FAIRmat-NFDI / nyaml

https://fairmat-nfdi.github.io/nyaml
https://pypi.org/project/nyaml/
Apache License 2.0
1 stars 0 forks source link

Dimensions in NXDL always get a rank #27

Open lukaspie opened 4 months ago

lukaspie commented 4 months ago
lukaspie commented 4 months ago

@domna @RubelMozumder I made some improvements to the conversion that I outlined above and enhanced the test cases. We should keep the tests in any case even if we don't merge the changes to the conversion itself.

This seems to be work now, except for one case (which I also added to the tests): https://github.com/FAIRmat-NFDI/nyaml/blob/938713dd1bc193d709ee278fd8c467b6883426c6/tests/data/dimensions.nxdl.xml#L78

<field name="my_dim_without_value" type="NX_NUMBER">
      <dimensions>
          <doc>
              Test with dimension, dim, and no value,
              as in NXsensor/value_deriv2.
          </doc>
          <dim index="1" ref="value"/>
      </dimensions>
  </field>

This is directly adapated from NXsensor, see here how it renders in the documentation.

I have now idea how we should convert this back and forth. I have also added a test case for the other direction: https://github.com/FAIRmat-NFDI/nyaml/blob/11ccb71e46218472cc5ef974d90572684cda72ce/tests/data/dim_keyword.yaml#L32

This is btw an example that you don't actually HAVE to write a rank inside dimensions.

Any thoughts/ideas?

domna commented 4 months ago

@domna @RubelMozumder I made some improvements to the conversion that I outlined above and enhanced the test cases. We should keep the tests in any case even if we don't merge the changes to the conversion itself.

...

Any thoughts/ideas?

I think we might need to go one step back and define what we actually want. I tried to dig into your code and wanted to look up what each of the nxdl attributes mean. I discovered that ref is deprecated and should actually not be used anymore, so we should not generate it. This is even stated as deprecated attributes in our code (which is fine for nxdl 2 nyaml but doesn't make sense for nyaml 2 nxdl): https://github.com/FAIRmat-NFDI/nyaml/blob/11ccb71e46218472cc5ef974d90572684cda72ce/nyaml/nyaml2nxdl.py#L502

Here is the xsd reference

I would argue that our generated nxdl should always look like this (ref):

<dimensions rank="2">
  <dim index="1" value="nsurf"/>
  <dim index="2" value="nwl"/>
</dimensions>

where the value can either be a symbolic link or a fixed (integer) length of the array. I would also argue that it's fine that we generate exactly this even if the original nxdl contains deprecated attributes (i.e. we don't reproduce deprecated attributes). We can also discuss whether we want to leave out the rank or not per default.

Also when I generate the field you mention I get

<field name="my_dim_without_value" type="NX_NUMBER">
            <dimensions>
                <doc>
                     Test with dimension, dim, and no value,
                     as in NXsensor/value_deriv2.
                </doc>
                <dim index="1" value="[1]" ref="value"/>
            </dimensions>
</field>

which should actually be

<field name="my_dim_without_value" type="NX_NUMBER">
            <dimensions rank="1">
                <doc>
                     Test with dimension, dim, and no value,
                     as in NXsensor/value_deriv2.
                </doc>
                <dim index="1" value="value" />
            </dimensions>
</field>

if we follow the example above (which is also different from the reference!).

RubelMozumder commented 4 months ago

I see the issue mainly comes make local which expects to have a rank in the dimension attribute along with the dim element. Which might be the new test requirement set on make local command.

@domna @RubelMozumder I made some improvements to the conversion that I outlined above and enhanced the test cases. We should keep the tests in any case even if we don't merge the changes to the conversion itself. ... Any thoughts/ideas?

I think we might need to go one step back and define what we actually want. I tried to dig into your code and wanted to look up what each of the nxdl attributes mean. I discovered that ref is deprecated and should actually not be used anymore, so we should not generate it. This is even stated as deprecated attributes in our code (which is fine for nxdl 2 nyaml but doesn't make sense for nyaml 2 nxdl):

https://github.com/FAIRmat-NFDI/nyaml/blob/11ccb71e46218472cc5ef974d90572684cda72ce/nyaml/nyaml2nxdl.py#L502

Here is the xsd reference

I would argue that our generated nxdl should always look like this (ref):

<dimensions rank="2">
  <dim index="1" value="nsurf"/>
  <dim index="2" value="nwl"/>
</dimensions>

where the value can either be a symbolic link or a fixed (integer) length of the array. I would also argue that it's fine that we generate exactly this even if the original nxdl contains deprecated attributes (i.e. we don't reproduce deprecated attributes). We can also discuss whether we want to leave out the rank or not per default.

Also when I generate the field you mention I get

<field name="my_dim_without_value" type="NX_NUMBER">
            <dimensions>
                <doc>
                     Test with dimension, dim, and no value,
                     as in NXsensor/value_deriv2.
                </doc>
                <dim index="1" value="[1]" ref="value"/>
            </dimensions>
</field>

which should actually be

<field name="my_dim_without_value" type="NX_NUMBER">
            <dimensions rank="1">
                <doc>
                     Test with dimension, dim, and no value,
                     as in NXsensor/value_deriv2.
                </doc>
                <dim index="1" value="value" />
            </dimensions>
</field>

if we follow the example above (which is also different from the reference!).

I think having rank in the dimension element by default is something added in the test for make local command within the last few months. As per my knowledge, it was completely up to application developer.

However, we must think about the nyaml version publication in PyPi. I suggest not to publish by default to check the nyaml on the nexus-definition repo up to local deployment then publication. Another way could be having develop branch and where all the PR will be merged there and for publishing one new version develop goes to the master branch.

lukaspie commented 4 months ago

@domna @RubelMozumder I made some improvements to the conversion that I outlined above and enhanced the test cases. We should keep the tests in any case even if we don't merge the changes to the conversion itself. ... Any thoughts/ideas?

I think we might need to go one step back and define what we actually want. I tried to dig into your code and wanted to look up what each of the nxdl attributes mean. I discovered that ref is deprecated and should actually not be used anymore, so we should not generate it. This is even stated as deprecated attributes in our code (which is fine for nxdl 2 nyaml but doesn't make sense for nyaml 2 nxdl):

https://github.com/FAIRmat-NFDI/nyaml/blob/11ccb71e46218472cc5ef974d90572684cda72ce/nyaml/nyaml2nxdl.py#L502

Here is the xsd reference

I would argue that our generated nxdl should always look like this (ref):

<dimensions rank="2">
  <dim index="1" value="nsurf"/>
  <dim index="2" value="nwl"/>
</dimensions>

where the value can either be a symbolic link or a fixed (integer) length of the array. I would also argue that it's fine that we generate exactly this even if the original nxdl contains deprecated attributes (i.e. we don't reproduce deprecated attributes). We can also discuss whether we want to leave out the rank or not per default.

I agree that this is the way that it should be done. However, a large motivation of Rubel's last PR was to not change the existing NXDL definitions (especially those in base/application) so that the NIAC-accepted files don't get overwritten. This is one of the reasons why we have such a mess right now in the code because we also try to cover all the edge cases, e.g. even the deprecated attributes.

Also when I generate the field you mention I get

<field name="my_dim_without_value" type="NX_NUMBER">
            <dimensions>
                <doc>
                     Test with dimension, dim, and no value,
                     as in NXsensor/value_deriv2.
                </doc>
                <dim index="1" value="[1]" ref="value"/>
            </dimensions>
</field>

which should actually be

<field name="my_dim_without_value" type="NX_NUMBER">
            <dimensions rank="1">
                <doc>
                     Test with dimension, dim, and no value,
                     as in NXsensor/value_deriv2.
                </doc>
                <dim index="1" value="value" />
            </dimensions>
</field>

if we follow the example above (which is also different from the reference!).

Exactly, but this is not due to the changes in this PR, but due to #24. In fact, when I backwards-convert the currently generated xml, i.e.,

<field name="my_dim_without_value" type="NX_NUMBER">
    <dimensions>
        <doc>
            Test with dimension, dim, and no value,
            as in NXsensor/value_deriv2.
        </doc>
        <dim index="1" value="[1]" ref="value"/>
    </dimensions>
</field>

I get

my_dim_without_value(NX_NUMBER):
  dimensions:
    doc: |
      Test with dimension, dim, and no value,
      as in NXsensor/value_deriv2.
    dim: ([1],)
    dim_parameters:
      ref: ['value']

which is completely different to what was in the original nyaml.

lukaspie commented 4 months ago

However, we must think about the nyaml version publication in PyPi. I suggest not to publish by default to check the nyaml on the nexus-definition repo up to local deployment then publication. Another way could be having develop branch and where all the PR will be merged there and for publishing one new version develop goes to the master branch.

I agree, we should always first check that the newest nyaml does not actually break our definitions before publishing to PyPi. If the test cases were all-encompassing, that wouldn't be a problem, but for the latest release, it clearly was. I shouldn't have published this package at that point, that was probably my fault.

lukaspie commented 4 months ago

My suggestion is that we start with well-defined test files that a) define what we want, b) cover edge cases, and c) don't break the existing definitions, and re-design some of the tool from there. I tried to do so with the test files that were changed in this PR:

These tests are also specifically designed to NOT break the existing definitions, especially the edge cases for NXdata and NXsensor.

Currently, the tests are breaking in the edge cases. nyaml2nxdl forward test in dim_keyword.yaml

nyaml2nxdl backward test in dimensions.nxdl.xml

Also tagging @sanbrock @mkuehbach so they are aware of the discussion here.