FAIRmat-NFDI / nexus_definitions

Definitions of the NeXus Standard File Structure and Contents
https://manual.nexusformat.org/
Other
5 stars 8 forks source link

Bugs in NXmpes definition #230

Closed rettigl closed 2 days ago

rettigl commented 1 month ago

There seem to be several issues with both the NXmpes definitions, as well as with the parsing code in the latest version of pynxtools

rettigl commented 1 month ago

The second part is addressed in PR #128, so only the former remains (probably pynxtools-related?)

lukaspie commented 1 month ago

There seem to be several issues with both the NXmpes definitions, as well as with the parsing code in the latest version of pynxtools

  • The field "kinetic_energy" seems to have slipped out of the definitions again:
Field /ENTRY[entry]/INSTRUMENT[instrument]/ELECTRONANALYSER[electronanalyser]/ENERGYDISPERSION[energydispersion]/kinetic_energy written without documentation.
The unit, /ENTRY[entry]/INSTRUMENT[instrument]/ELECTRONANALYSER[electronanalyser]/ENERGYDISPERSION[energydispersion]/kinetic_energy/@units = eV, is being written but has no documentation
Field /ENTRY[entry]/INSTRUMENT[instrument]/ELECTRONANALYSER[electronanalyser]/ENERGYDISPERSION[energydispersion]/kinetic_energy/@units written without documentation. 

I think NXenergydispersion]/kinetic_energy is so far only in the NXmpes_arpes PR that has not been merged. I have created a separate PR that fixes it independently.

lukaspie commented 1 month ago

The second part is addressed in PR #128, so only the former remains (probably pynxtools-related?)

Sorry, I did not see your comment. Feel free to merge the separate PR first or we merge it with #128.

lukaspie commented 1 month ago
  • I get this line twice:
The data entry corresponding to /ENTRY[entry]/INSTRUMENT[instrument]/ELECTRONANALYSER[electronanalyser]/transformations/scheme is required and hasn't been supplied by the reader.
The data entry corresponding to /ENTRY[entry]/INSTRUMENT[instrument]/ELECTRONANALYSER[electronanalyser]/transformations/scheme is required and hasn't been supplied by the reader.

I don't really understand this error. There is no named field transformations in either NXmpes/NXinstrument/NXelectronanalyser or in the base class NXmpes. I guess you are working with NXmpes_arpes from #128? But even there, there is no scheme inside /ENTRY[entry]/INSTRUMENT[instrument]/ELECTRONANALYSER[electronanalyser]/transformations, so I don't even get where the error is coming from.

rettigl commented 1 month ago

I don't really understand this error. There is no named field transformations in either NXmpes/NXinstrument/NXelectronanalyser or in the base class NXmpes. I guess you are working with NXmpes_arpes from #128? But even there, there is no scheme inside /ENTRY[entry]/INSTRUMENT[instrument]/ELECTRONANALYSER[electronanalyser]/transformations, so I don't even get where the error is coming from.

Me neither. I just used the latest definitions installed by pynxtools. I will look into this when I find time.

rettigl commented 1 month ago

This bug must be somewhere in the validation code. However, I don't understand it well enough...

rettigl commented 1 month ago

The issue is somewhere here: https://github.com/FAIRmat-NFDI/pynxtools/blob/71020b564b88e33a8e50770315b6cfa63ffea31e/pynxtools/dataconverter/validation.py#L323 For grafik I get grafik which does not make sense to me. It appears it is due to this line: https://github.com/FAIRmat-NFDI/pynxtools/blame/71020b564b88e33a8e50770315b6cfa63ffea31e/pynxtools/dataconverter/validation.py#L201 The problem here is that transformations is defined as "transformations" and not as "TRANSFORMATIONS[transformations]", as it should be. Thus, it is not in "get_all_children_names", and is added even though get_nx_namefit returns 0. As I don't understand the logic here really, I cannot debug further.

So, it makes sense that the code complains (as lower_case transformations is not defined), but the error message is very unintuitive and misleading.

rettigl commented 1 month ago

If I try to use the inherited appdef NXmpes_arpes, things become really crazy:

The value at /ENTRY[entry]/INSTRUMENT[instrument]/ELECTRONANALYSER[electronanalyser]/ENERGYDISPERSION[energydispersion]/entrance_slit/shape should be on of the following strings: ['straight slit', 'curved slit']
The data entry corresponding to /ENTRY[entry]/INSTRUMENT[instrument]/energy_resolution/depends_on is required and hasn't been supplied by the reader.
The required group, /ENTRY[entry]/INSTRUMENT[instrument]/energy_resolution/transformations, hasn't been supplied.
The data entry corresponding to /ENTRY[entry]/INSTRUMENT[instrument]/pressure_gauge/depends_on is required and hasn't been supplied by the reader.
The required group, /ENTRY[entry]/INSTRUMENT[instrument]/pressure_gauge/transformations, hasn't been supplied.
Missing attribute: "/ENTRY[entry]/data/@energy_indices"
Missing attribute: "/ENTRY[entry]/data/@angular0_indices"
Missing attribute: "/ENTRY[entry]/data/@angular1_indices"
Missing attribute: "/ENTRY[entry]/data/@angular0_depends"
Missing attribute: "/ENTRY[entry]/data/@angular1_depends"
Missing attribute: "/ENTRY[entry]/data/@energy_depends"
The data entry corresponding to /ENTRY[entry]/data/energy is required and hasn't been supplied by the reader.
There were attributes set for the field /ENTRY[entry]/data/energy, but the field does not exist.
The data entry corresponding to /ENTRY[entry]/data/angular0 is required and hasn't been supplied by the reader.
The data entry corresponding to /ENTRY[entry]/data/angular1 is required and hasn't been supplied by the reader.
Field /ENTRY[entry]/INSTRUMENT[instrument]/energy_resolution/resolution written without documentation.
The unit, /ENTRY[entry]/INSTRUMENT[instrument]/energy_resolution/resolution/@units = meV, is being written but has no documentation
Field /ENTRY[entry]/INSTRUMENT[instrument]/energy_resolution/resolution/@units written without documentation.
Field /ENTRY[entry]/INSTRUMENT[instrument]/energy_resolution/physical_quantity written without documentation.
Field /ENTRY[entry]/INSTRUMENT[instrument]/energy_resolution/type written without documentation.
Field /ENTRY[entry]/INSTRUMENT[instrument]/pressure_gauge/name written without documentation.
Field /ENTRY[entry]/INSTRUMENT[instrument]/pressure_gauge/measurement written without documentation.
Field /ENTRY[entry]/INSTRUMENT[instrument]/pressure_gauge/value written without documentation.
The unit, /ENTRY[entry]/INSTRUMENT[instrument]/pressure_gauge/value/@units = mbar, is being written but has no documentation
Field /ENTRY[entry]/INSTRUMENT[instrument]/pressure_gauge/value/@units written without documentation.
Field /ENTRY[entry]/INSTRUMENT[instrument]/ELECTRONANALYSER[electronanalyser]/device_information/vendor written without documentation.
Field /ENTRY[entry]/INSTRUMENT[instrument]/ELECTRONANALYSER[electronanalyser]/device_information/model written without documentation.
Field /ENTRY[entry]/INSTRUMENT[instrument]/ELECTRONANALYSER[electronanalyser]/ENERGYDISPERSION[energydispersion]/kinetic_energy written without documentation.
The unit, /ENTRY[entry]/INSTRUMENT[instrument]/ELECTRONANALYSER[electronanalyser]/ENERGYDISPERSION[energydispersion]/kinetic_energy/@units = eV, is being written but has no documentation
Field /ENTRY[entry]/INSTRUMENT[instrument]/ELECTRONANALYSER[electronanalyser]/ENERGYDISPERSION[energydispersion]/kinetic_energy/@units written without documentation.
Field /ENTRY[entry]/INSTRUMENT[instrument]/ELECTRONANALYSER[electronanalyser]/ENERGYDISPERSION[energydispersion]/exit_slit/shape written without documentation.
Field /ENTRY[entry]/INSTRUMENT[instrument]/sourceTYPE[source_probe]/associated_beam written without documentation.
Field /ENTRY[entry]/INSTRUMENT[instrument]/beamTYPE[beam_probe]/distance written without documentation.
The unit, /ENTRY[entry]/INSTRUMENT[instrument]/beamTYPE[beam_probe]/distance/@units = mm, is being written but has no documentation
Field /ENTRY[entry]/INSTRUMENT[instrument]/beamTYPE[beam_probe]/distance/@units written without documentation.
Field /ENTRY[entry]/INSTRUMENT[instrument]/beamTYPE[beam_probe]/incident_energy written without documentation.
The unit, /ENTRY[entry]/INSTRUMENT[instrument]/beamTYPE[beam_probe]/incident_energy/@units = eV, is being written but has no documentation
Field /ENTRY[entry]/INSTRUMENT[instrument]/beamTYPE[beam_probe]/incident_energy/@units written without documentation.
Field /ENTRY[entry]/INSTRUMENT[instrument]/beamTYPE[beam_probe]/incident_energy_spread written without documentation.
The unit, /ENTRY[entry]/INSTRUMENT[instrument]/beamTYPE[beam_probe]/incident_energy_spread/@units = eV, is being written but has no documentation
Field /ENTRY[entry]/INSTRUMENT[instrument]/beamTYPE[beam_probe]/incident_energy_spread/@units written without documentation.
Field /ENTRY[entry]/INSTRUMENT[instrument]/beamTYPE[beam_probe]/pulse_duration written without documentation.
The unit, /ENTRY[entry]/INSTRUMENT[instrument]/beamTYPE[beam_probe]/pulse_duration/@units = fs, is being written but has no documentation
Field /ENTRY[entry]/INSTRUMENT[instrument]/beamTYPE[beam_probe]/pulse_duration/@units written without documentation.
Field /ENTRY[entry]/INSTRUMENT[instrument]/beamTYPE[beam_probe]/incident_polarization written without documentation.
The unit, /ENTRY[entry]/INSTRUMENT[instrument]/beamTYPE[beam_probe]/incident_polarization/@units = V^2/mm^2, is being written but has no documentation
Field /ENTRY[entry]/INSTRUMENT[instrument]/beamTYPE[beam_probe]/incident_polarization/@units written without documentation.
Field /ENTRY[entry]/INSTRUMENT[instrument]/beamTYPE[beam_probe]/extent written without documentation.
The unit, /ENTRY[entry]/INSTRUMENT[instrument]/beamTYPE[beam_probe]/extent/@units = µm, is being written but has no documentation
Field /ENTRY[entry]/INSTRUMENT[instrument]/beamTYPE[beam_probe]/extent/@units written without documentation.
Field /ENTRY[entry]/INSTRUMENT[instrument]/beamTYPE[beam_probe]/associated_source written without documentation.
Field /ENTRY[entry]/INSTRUMENT[instrument]/sourceTYPE[source_pump]/associated_beam written without documentation.
Field /ENTRY[entry]/INSTRUMENT[instrument]/beamTYPE[beam_pump]/distance written without documentation.
The unit, /ENTRY[entry]/INSTRUMENT[instrument]/beamTYPE[beam_pump]/distance/@units = mm, is being written but has no documentation
Field /ENTRY[entry]/INSTRUMENT[instrument]/beamTYPE[beam_pump]/distance/@units written without documentation.
Field /ENTRY[entry]/INSTRUMENT[instrument]/beamTYPE[beam_pump]/incident_energy written without documentation.
The unit, /ENTRY[entry]/INSTRUMENT[instrument]/beamTYPE[beam_pump]/incident_energy/@units = eV, is being written but has no documentation
Field /ENTRY[entry]/INSTRUMENT[instrument]/beamTYPE[beam_pump]/incident_energy/@units written without documentation.
Field /ENTRY[entry]/INSTRUMENT[instrument]/beamTYPE[beam_pump]/incident_energy_spread written without documentation.
The unit, /ENTRY[entry]/INSTRUMENT[instrument]/beamTYPE[beam_pump]/incident_energy_spread/@units = eV, is being written but has no documentation
Field /ENTRY[entry]/INSTRUMENT[instrument]/beamTYPE[beam_pump]/incident_energy_spread/@units written without documentation.
Field /ENTRY[entry]/INSTRUMENT[instrument]/beamTYPE[beam_pump]/incident_wavelength written without documentation.
The unit, /ENTRY[entry]/INSTRUMENT[instrument]/beamTYPE[beam_pump]/incident_wavelength/@units = nm, is being written but has no documentation
Field /ENTRY[entry]/INSTRUMENT[instrument]/beamTYPE[beam_pump]/incident_wavelength/@units written without documentation.
Field /ENTRY[entry]/INSTRUMENT[instrument]/beamTYPE[beam_pump]/pulse_duration written without documentation.
The unit, /ENTRY[entry]/INSTRUMENT[instrument]/beamTYPE[beam_pump]/pulse_duration/@units = fs, is being written but has no documentation
Field /ENTRY[entry]/INSTRUMENT[instrument]/beamTYPE[beam_pump]/pulse_duration/@units written without documentation.
Field /ENTRY[entry]/INSTRUMENT[instrument]/beamTYPE[beam_pump]/incident_polarization written without documentation.
The unit, /ENTRY[entry]/INSTRUMENT[instrument]/beamTYPE[beam_pump]/incident_polarization/@units = V^2/mm^2, is being written but has no documentation
Field /ENTRY[entry]/INSTRUMENT[instrument]/beamTYPE[beam_pump]/incident_polarization/@units written without documentation.
Field /ENTRY[entry]/INSTRUMENT[instrument]/beamTYPE[beam_pump]/pulse_energy written without documentation.
The unit, /ENTRY[entry]/INSTRUMENT[instrument]/beamTYPE[beam_pump]/pulse_energy/@units = µJ, is being written but has no documentation
Field /ENTRY[entry]/INSTRUMENT[instrument]/beamTYPE[beam_pump]/pulse_energy/@units written without documentation.
Field /ENTRY[entry]/INSTRUMENT[instrument]/beamTYPE[beam_pump]/average_power written without documentation.
The unit, /ENTRY[entry]/INSTRUMENT[instrument]/beamTYPE[beam_pump]/average_power/@units = mW, is being written but has no documentation
Field /ENTRY[entry]/INSTRUMENT[instrument]/beamTYPE[beam_pump]/average_power/@units written without documentation.
Field /ENTRY[entry]/INSTRUMENT[instrument]/beamTYPE[beam_pump]/extent written without documentation.
The unit, /ENTRY[entry]/INSTRUMENT[instrument]/beamTYPE[beam_pump]/extent/@units = µm, is being written but has no documentation
Field /ENTRY[entry]/INSTRUMENT[instrument]/beamTYPE[beam_pump]/extent/@units written without documentation.
Field /ENTRY[entry]/INSTRUMENT[instrument]/beamTYPE[beam_pump]/fluence written without documentation.
The unit, /ENTRY[entry]/INSTRUMENT[instrument]/beamTYPE[beam_pump]/fluence/@units = mJ/cm^2, is being written but has no documentation
Field /ENTRY[entry]/INSTRUMENT[instrument]/beamTYPE[beam_pump]/fluence/@units written without documentation.
Field /ENTRY[entry]/INSTRUMENT[instrument]/beamTYPE[beam_pump]/associated_source written without documentation.
Field /ENTRY[entry]/INSTRUMENT[instrument]/MANIPULATOR[manipulator]/temperature_sensor/name requires a unit in the unit category NX_TRANSFORMATION.
The value at /ENTRY[entry]/INSTRUMENT[instrument]/MANIPULATOR[manipulator]/temperature_sensor/name should be one of: (<class 'int'>, <class 'float'>, <class 'numpy.ndarray'>, <class 'numpy.signedinteger'>, <class 'numpy.unsignedinteger'>, <class 'numpy.floating'>, <class 'dict'>), as defined in the NXDL as NX_NUMBER.
Field /ENTRY[entry]/INSTRUMENT[instrument]/MANIPULATOR[manipulator]/temperature_sensor/measurement requires a unit in the unit category NX_TRANSFORMATION.
The value at /ENTRY[entry]/INSTRUMENT[instrument]/MANIPULATOR[manipulator]/temperature_sensor/measurement should be one of: (<class 'int'>, <class 'float'>, <class 'numpy.ndarray'>, <class 'numpy.signedinteger'>, <class 'numpy.unsignedinteger'>, <class 'numpy.floating'>, <class 'dict'>), as defined in the NXDL as NX_NUMBER.
Field /ENTRY[entry]/INSTRUMENT[instrument]/MANIPULATOR[manipulator]/sample_bias_voltmeter/name requires a unit in the unit category NX_TRANSFORMATION.
The value at /ENTRY[entry]/INSTRUMENT[instrument]/MANIPULATOR[manipulator]/sample_bias_voltmeter/name should be one of: (<class 'int'>, <class 'float'>, <class 'numpy.ndarray'>, <class 'numpy.signedinteger'>, <class 'numpy.unsignedinteger'>, <class 'numpy.floating'>, <class 'dict'>), as defined in the NXDL as NX_NUMBER.
Field /ENTRY[entry]/INSTRUMENT[instrument]/MANIPULATOR[manipulator]/sample_bias_voltmeter/measurement requires a unit in the unit category NX_TRANSFORMATION.
The value at /ENTRY[entry]/INSTRUMENT[instrument]/MANIPULATOR[manipulator]/sample_bias_voltmeter/measurement should be one of: (<class 'int'>, <class 'float'>, <class 'numpy.ndarray'>, <class 'numpy.signedinteger'>, <class 'numpy.unsignedinteger'>, <class 'numpy.floating'>, <class 'dict'>), as defined in the NXDL as NX_NUMBER.
Field /ENTRY[entry]/INSTRUMENT[instrument]/MANIPULATOR[manipulator]/drain_current_amperemeter/name requires a unit in the unit category NX_TRANSFORMATION.
The value at /ENTRY[entry]/INSTRUMENT[instrument]/MANIPULATOR[manipulator]/drain_current_amperemeter/name should be one of: (<class 'int'>, <class 'float'>, <class 'numpy.ndarray'>, <class 'numpy.signedinteger'>, <class 'numpy.unsignedinteger'>, <class 'numpy.floating'>, <class 'dict'>), as defined in the NXDL as NX_NUMBER.
Field /ENTRY[entry]/INSTRUMENT[instrument]/MANIPULATOR[manipulator]/drain_current_amperemeter/measurement requires a unit in the unit category NX_TRANSFORMATION.
The value at /ENTRY[entry]/INSTRUMENT[instrument]/MANIPULATOR[manipulator]/drain_current_amperemeter/measurement should be one of: (<class 'int'>, <class 'float'>, <class 'numpy.ndarray'>, <class 'numpy.signedinteger'>, <class 'numpy.unsignedinteger'>, <class 'numpy.floating'>, <class 'dict'>), as defined in the NXDL as NX_NUMBER.
Field /ENTRY[entry]/SAMPLE[sample]/temperature/temperature_sensor written without documentation.
Field /ENTRY[entry]/SAMPLE[sample]/gas_pressure/pressure_gauge written without documentation.

To me it appears this validation code has not been thoroughly tested enough...

domna commented 1 month ago

The problem here is that transformations is defined as "transformations" and not as "TRANSFORMATIONS[transformations]", as it should be. Thus, it is not in "get_all_children_names", and is added even though get_nx_namefit returns 0.

Indeed that is a problem with the algorithm. It goes through all required fields and tries to find a namefit the available names and they are only considered if they are not in all the children names (here including all inherited ones). This means that it also can namefit a single name twice if it matches to multiple categories (including the consequences of missing fields one level down). For this we need to resolve the ambiguity by stating it with the class as you correctly found. I think we can improve the algorithm for this in the future but for the first basic version we need to resolve the ambiguity by stating the class.

A return value of 0 for get_nx_namefit means that it fits the category but has nothing in common. -1 means no fit. Here is the docstring of the function explaining it in more detail: https://github.com/FAIRmat-NFDI/nexus_definitions/blob/ea6b7b2210c89bc1ee91b696af51991a779c20c8/dev_tools/utils/nxdl_utils.py#L112-L147

So, it makes sense that the code complains (as lower_case transformations is not defined), but the error message is very unintuitive and misleading.

I agree. Generally, all error messages and reporting for the new verification have room for improvement. I plan to do improvements with the cli #validation and hdf tree traversal: https://github.com/FAIRmat-NFDI/pynxtools/pull/333

To me it appears this validation code has not been thoroughly tested enough...

Indeed, we brought this quickly with only a small testing set on our examples. But the old algorithm was broken anyways, was slow and also did not verify everything (especially multiple groups belonging to the same concept) so we decided to bring it in already and then fix all the upcoming issues with the new algorithm.

Your long list of problems, I believe, arise from the fact that extends is not yet supported at all. Therefore, the algorithm just works on NXmpes_arpes without any fields inherited from the parent. I will implement support for this in the next days and test whether this runs through without any errors (probably there are also some other small issues then, especially NXdata specialities are still very hacky and untested).

domna commented 1 month ago

@rettigl I started to work on supporting the extends keyword here: https://github.com/FAIRmat-NFDI/pynxtools/pull/339

Which example did you use for the above tests? Then I'll use this for testing and debugging.

rettigl commented 1 month ago

Which example did you use for the above tests? Then I'll use this for testing and debugging.

This was the conversion of the phoibos scan to NXmpes_arpes, which I did on the central Nomad, with updated packages in the jupyterlab container

lukaspie commented 2 days ago

Closed as handled by https://github.com/FAIRmat-NFDI/pynxtools/pull/339