FAIRmat-NFDI / pynxtools

https://fairmat-nfdi.github.io/pynxtools/
Apache License 2.0
13 stars 8 forks source link

Optional link in multireader #472

Open lukaspie opened 1 week ago

lukaspie commented 1 week ago

This allows to write something like

"ENTRY/SAMPLE[sample]/temperature_env/"temperature_sensorsample_heater": "@link:!/entry/instrument/manipulator/sample_heater",`

in the config file. The ! before the link target tells the reader that this is an optional link to another concept within the file, i.e., if the path entry/instrument/manipulator/sample_heater (entry may be replaced on the way) does not exist, the link will be removed from the Template and there will not be a BrokenLink Validation error and no warning in the writer.

sherjeelshabih commented 1 week ago

Could we have this as default behavior without an extra annotation? What's stopping us from having this as default? If something cannot be linked, especially in the same file cuz this can be checked like you do here, then it should be removed anyways, no?

lukaspie commented 1 week ago

Could we have this as default behavior without an extra annotation? What's stopping us from having this as default? If something cannot be linked, especially in the same file cuz this can be checked like you do here, then it should be removed anyways, no?

Yeah, that makes total sense, I implemented that.

For links, there was however anyway still a problem still with the ! notation at the beginning of the value string. As a reminder, this notation is used to handle the situation that there is an optional NeXus group in an application definition, that (if implemented) requires some sub-element. So you can write

"ENTRY/INSTRUMENT[instrument]/energy_resolution": {
  "resolution": "@attrs:metadata/instrument/electronanalyser/energy_resolution",
  "resolution/@units": "meV",
  "physical_quantity": "energy"
}

and if the metadata/instrument/electronanalyser/energy_resolution path is not resolvable, then the whole energy_resolution group is removed.

However, this happens before all other keys are touched. For links, we would like to have it that it removes the group if no link is found after the dict has been filled (using e.g. "!@link:entry/instrument/name") because only then can we even check the link. I have implemented it like this here, but that unfortunately means we have to iterate the dict a second time after everything has been filled to remove those groups that had a link with "!" notation.

Also tagging here @rettigl since he is using that notation a lot.

RonHildebrandt commented 5 days ago

So a tradeoff would be, that we have to scan it a second time, to make it automatically. Via the "!" notation, this is simpler?

The option itself sounds useful.

Not sure, if you want to add it as default (without a potential future user being aware of that), as it add an additional "rule", you just have to know. Not sure if otherwise some people woul be wondering why some stuff is now disappearing? Is this only really the case for links?

If the ! notation is used extremely frequently, then one could make this automatically, otherwise not immedeately necessary?

Not sure, just my thaughts on this. As I just begin to work with all this, maybe my way to think does not make sense for this case.

lukaspie commented 5 days ago

@RonHildebrandt so the idea is here the following: 1) If a link is not resolvable, we immediately delete it from the template instead of checking later in the validation and writing stages (this is what @sherjeelshabih suggested). Note that this is different to what is suggested in the initial description of the PR, but for me it makes total sense. This also doesn't have to anything to do with the ! notation. 2) The ! notation already exists for other prefixes (like @attrs, @eln) and indicates that if this prefix is not resolvable (i.e., there is no value found for this field) and this is a required sub-element in an optional NeXus group, we remove the entire group. We initally implemented it such that the config values that used such a ! notation were checked at the beginning and then we didn't need to check the other element of that optional group if that field was not fillable. Note that here no second looping of the large dict is needed. 3) The new thing about the ! notation here is that it now also works for links. However, as links can obviously only be tested at the end (after all other keys have been filled), we need to iterate the whole template once again and check that such links (i.e., values in the config starting with !@link) can be resolved and if not, remove these groups.

I think this double looping over the template is fine, I doubt this !@link notation will be used a lot. I didn't see much performance degradation and I used it ~ 15 times in one config file.