FAIRmat-NFDI / nyaml

https://fairmat-nfdi.github.io/nyaml
https://pypi.org/project/nyaml/
Apache License 2.0
1 stars 0 forks source link

nyaml supports both dimensions and dim #24

Closed RubelMozumder closed 4 months ago

RubelMozumder commented 4 months ago

Conversions from yaml to nxdl and vice-versa.

dim: (4,5)
<dimensions>
    <dim index=1 value=4/>
    <dim index=2 value=5/>
<dimensions/>

Note, the <dimension> does not have any rank which is intended. To keep the consistency (which was also part motivation of nyaml2nxdl tools) a of the nxdl.xml file content, e.g. if a nxdl.xml file dimension element does not have any rank key while reproducing the nxdl.xml the dimensions element will be as it was.

RubelMozumder commented 4 months ago

Apart from some:

  • [x] typos that should be cleaned plus
  • [x] some commented out code that should be removed

Nothing immediately critical in my opinion which should block the merging to get the writing of the nyaml publication supported. Thank you @RubelMozumder for having that missing piece added

Thanks, for pointing out two issues. I will check them.

lukaspie commented 4 months ago

I have a more general question about the conversion strategy.

Let's say we start in the nxdl with something like this:

<field name="kinetic_energy" type="NX_FLOAT">
    <dimensions>
        <dim index="1" value="n_transmission_function"/>
    </dimensions>
</field>
<field name="relative_intensity" type="NX_FLOAT">
    <dimensions rank="1">
        <dim index="1" value="n_transmission_function"/>
    </dimensions>
</field>

This gets translated to

kinetic_energy(NX_FLOAT):
  dim: (n_transmission_function,)
relative_intensity(NX_FLOAT):
  dimensions:
    rank: 1
    dim: [[1, n_transmission_function]]

I guess here it picks up that the <dimensions> in kinetic_energy don't have the rank specified and therefore the conversion defaults to the numpy-notation. I am curious about the second case (i.e., transmission_function). My idea in the issue (#22) was that we always use the short notation with the dim keyword when translating back to nyaml. But now, it decides based on what is there in the XML element. What is the strategy here?

RubelMozumder commented 4 months ago

I have a more general question about the conversion strategy.

Let's say we start in the nxdl with something like this:

<field name="kinetic_energy" type="NX_FLOAT">
    <dimensions>
        <dim index="1" value="n_transmission_function"/>
    </dimensions>
</field>
<field name="relative_intensity" type="NX_FLOAT">
    <dimensions rank="1">
        <dim index="1" value="n_transmission_function"/>
    </dimensions>
</field>

This gets translated to

kinetic_energy(NX_FLOAT):
  dim: (n_transmission_function,)
relative_intensity(NX_FLOAT):
  dimensions:
    rank: 1
    dim: [[1, n_transmission_function]]

I guess here it picks up that the <dimensions> in kinetic_energy don't have the rank specified and therefore the conversion defaults to the numpy-notation. I am curious about the second case (i.e., transmission_function). My idea in the issue (#22) was that we always use the short notation with the dim keyword when translating back to nyaml. But now, it decides based on what is there in the XML element. What is the strategy here?

The main strategy to keep all the version consistent for back and forth conversion which original decision came from the majority of area-b members. There are not only dim element, it also needs to consider dimensions element, its doc and its attributes.

lukaspie commented 4 months ago

I have a more general question about the conversion strategy. Let's say we start in the nxdl with something like this:

<field name="kinetic_energy" type="NX_FLOAT">
    <dimensions>
        <dim index="1" value="n_transmission_function"/>
    </dimensions>
</field>
<field name="relative_intensity" type="NX_FLOAT">
    <dimensions rank="1">
        <dim index="1" value="n_transmission_function"/>
    </dimensions>
</field>

This gets translated to

kinetic_energy(NX_FLOAT):
  dim: (n_transmission_function,)
relative_intensity(NX_FLOAT):
  dimensions:
    rank: 1
    dim: [[1, n_transmission_function]]

I guess here it picks up that the <dimensions> in kinetic_energy don't have the rank specified and therefore the conversion defaults to the numpy-notation. I am curious about the second case (i.e., transmission_function). My idea in the issue (#22) was that we always use the short notation with the dim keyword when translating back to nyaml. But now, it decides based on what is there in the XML element. What is the strategy here?

The main strategy to keep all the version consistent for back and forth conversion which original decision came from the majority of area-b members. There are not only dim element, it also needs to consider dimensions element, its doc and its attributes.

Ok, I must have missed that discussion about keeping old nyamls as they are now and not remake them with the new notation. If that's the majority decision, I am fine with the changes proposed here.

I still say that it would be nice if dimensions and dim would have the same capabilities, i.e., you could also set doc and other attributes with the dim notation but maybe this can be its own issue later.

sherjeelshabih commented 4 months ago

The main strategy to keep all the version consistent for back and forth conversion which original decision came from the majority of area-b members. There are not only dim element, it also needs to consider dimensions element, its doc and its attributes.

Ok, I must have missed that discussion about keeping old nyamls as they are now and not remake them with the new notation. If that's the majority decision, I am fine with the changes proposed here.

I still say that it would be nice if dimensions and dim would have the same capabilities, i.e., you could also set doc and other attributes with the dim notation but maybe this can be its own issue later.

I will also support having same functionality. It is less confusing. Maybe this can be improved here already. The only consistency we need to maintain is that we don't change the old NXDLs. We shouldn't have any problems changing our old yamls.

Are we still running nyaml2nxdl over all the files on commits? Or is that being done manually? If it's manual, then we should unify the conversion to what makes more sense.

lukaspie commented 4 months ago

I will also support having same functionality. It is less confusing. Maybe this can be improved here already. The only consistency we need to maintain is that we don't change the old NXDLs. We shouldn't have any problems changing our old yamls.

Are we still running nyaml2nxdl over all the files on commits? Or is that being done manually? If it's manual, then we should unify the conversion to what makes more sense.

It's done manually, but there is a check in the CI that the nyamls and nxdls are consistent.

For me, it's also totallly fine to change the nyamls, as of now they are literally just used for our own development and not part of the NeXus standard anyway.

RubelMozumder commented 4 months ago

I will also support having same functionality. It is less confusing. Maybe this can be improved here already. The only consistency we need to maintain is that we don't change the old NXDLs. We shouldn't have any problems changing our old yamls. Are we still running nyaml2nxdl over all the files on commits? Or is that being done manually? If it's manual, then we should unify the conversion to what makes more sense.

It's done manually, but there is a check in the CI that the nyamls and nxdls are consistent.

For me, it's also totallly fine to change the nyamls, as of now they are literally just used for our own development and not part of the NeXus standard anyway.

Thanks for sharing the link. I see the consistency only check nxdl files not for the nyaml file. Sorry, I missed that. Now I added the nxdl2yaml conversion for

dimensions:
   dim: (4,5,7)

and the corresponding test is also added.

lukaspie commented 4 months ago

Aside from the tests, you basically changed that nxdl2nyaml defaults to

dimensions:
 rank: 1
 dim: (n_transmission_function,)

Right? In that case, why don't we just write

dim: (n_transmission_function,)

immediately? That is even more concise.

My idea was to only default to the longer dimensions in the nyaml files if there is anything user than rank and <dim> in the nxdl. Like, if there is an adittional doc, then it should default to dimensions. What was your idea @sherjeelshabih?

lukaspie commented 4 months ago

Please also merge in the main branch, we fixed some issues in the config lint in #25 and it is probably not correctly checked in this PR as of yet.

sherjeelshabih commented 4 months ago

Aside from the tests, you basically changed that nxdl2nyaml defaults to

dimensions:
 rank: 1
 dim: (n_transmission_function,)

Right? In that case, why don't we just write

dim: (n_transmission_function,)

immediately? That is even more concise.

My idea was to only default to the longer dimensions in the nyaml files if there is anything user than rank and <dim> in the nxdl. Like, if there is an adittional doc, then it should default to dimensions. What was your idea @sherjeelshabih?

I was just agreeing with your idea. It's to use the shorter version wherever possible. Just as you said. I'm honestly not aware of the exact syntax.

RubelMozumder commented 4 months ago

How question

Aside from the tests, you basically changed that nxdl2nyaml defaults to

dimensions:
 rank: 1
 dim: (n_transmission_function,)

Right? In that case, why don't we just write

dim: (n_transmission_function,)

immediately? That is even more concise. My idea was to only default to the longer dimensions in the nyaml files if there is anything user than rank and <dim> in the nxdl. Like, if there is an adittional doc, then it should default to dimensions. What was your idea @sherjeelshabih?

I was just agreeing with your idea. It's to use the shorter version wherever possible. Just as you said. I'm honestly not aware of the exact syntax.

We all agreed to have the shorter version. This is already implemented, I will add the test @sherjeelshabih.

RubelMozumder commented 4 months ago

How question

Aside from the tests, you basically changed that nxdl2nyaml defaults to

dimensions:
 rank: 1
 dim: (n_transmission_function,)

Right? In that case, why don't we just write

dim: (n_transmission_function,)

immediately? That is even more concise. My idea was to only default to the longer dimensions in the nyaml files if there is anything user than rank and <dim> in the nxdl. Like, if there is an adittional doc, then it should default to dimensions. What was your idea @sherjeelshabih?

I was just agreeing with your idea. It's to use the shorter version wherever possible. Just as you said. I'm honestly not aware of the exact syntax.

We all agreed to have the shorter version. This is already implemented, I will add the test @sherjeelshabih.

@sherjeelshabih test for both the new (shorter one) and old elaborated one added.

mkuehbach commented 4 months ago

My idea was to only default to the longer dimensions in the nyaml files if there is anything user than rank and in the nxdl

This is exactly why I proposed to use the short-hand dim! dim: (as it is now implemented) should be the default. It is at least as of now also the most frequent case in base classes and appdefs where we are more explicit about dimensions than stating it can be anything

RubelMozumder commented 4 months ago

My idea was to only default to the longer dimensions in the nyaml files if there is anything user than rank and in the nxdl

This is exactly why I proposed to use the short-hand dim! Dim: should be the default. It is at least as of now also the most frequent case in base classes and appdefs where we are more explicit about dimensions than stating it can be anything

@mkuehbach, We are on the same phase. nyaml2nxdl conversion in following ways

# This convention is also valid from yaml2nxdl conversion
dimensions:
   rank: 3
   dim: [[1,2], [2,4], [3,5]]
   dim_parameters:
      doc: ["index_1","index_2", ,"index_3"]
# default from nxdl2yaml conversion
dimensions:
   rank: 3
   dim: (2, 4, 5)
   dim_parameters:
      doc: ["index_1","index_2", ,"index_3"]

If no further information on the dimensions and dim other than value, then it takes shorter notation

dim: (2,4,5)

note that the following old conversion is still valid from yaml2nxdl

# This convention is also valid on yaml2nxdl conversion dimensions: rank: 3 dim: [[1,2], [2,4], [3,5]] dim_parameters: doc: ["index_1","index_2", ,"index_3"]

RubelMozumder commented 4 months ago

Looks good now and works as intended.

I still think it would be nice to also use the shorthand dim notation even with the other attributes, but maybe that can happen later.

If we have smarter syntax then why not.