FAIRmat-NFDI / nyaml

https://fairmat-nfdi.github.io/nyaml
https://pypi.org/project/nyaml/
Apache License 2.0

Incorrect handling of multiple xref-related docs during nxdl2nyaml #8

Closed lukaspie closed 5 months ago

lukaspie commented 5 months ago

With the current version of nyaml, the backward conversion (i.e. NXDL to nyaml) fails if a docstring in the NXDL file contains more than one xref-related part.

Error message:

pielsticker@pccec0853 MINGW64 ~/lukas/fairmat/code/nexus_definitions (mpes-refactor)
$ nyaml2nxdl contributed_definitions/NXenergydispersion.nxdl.xml
Traceback (most recent call last):
  File "C:\Users\pielsticker\Anaconda3\envs\nexus-defs\lib\runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "C:\Users\pielsticker\Anaconda3\envs\nexus-defs\lib\runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "C:\Users\pielsticker\Anaconda3\envs\nexus-defs\Scripts\nyaml2nxdl.exe\__main__.py", line 7, in <module>
  File "C:\Users\pielsticker\Anaconda3\envs\nexus-defs\lib\site-packages\click\core.py", line 1157, in __call__
    return self.main(*args, **kwargs)
  File "C:\Users\pielsticker\Anaconda3\envs\nexus-defs\lib\site-packages\click\core.py", line 1078, in main
    rv = self.invoke(ctx)
  File "C:\Users\pielsticker\Anaconda3\envs\nexus-defs\lib\site-packages\click\core.py", line 1434, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "C:\Users\pielsticker\Anaconda3\envs\nexus-defs\lib\site-packages\click\core.py", line 783, in invoke
    return __callback(*args, **kwargs)
  File "C:\Users\pielsticker\Lukas\FAIRMat\code\nyaml\nyaml\cli.py", line 141, in launch_tool
    converter.print_yml(input_file, yaml_out_file, verbose)
  File "C:\Users\pielsticker\Lukas\FAIRMat\code\nyaml\nyaml\nxdl2nyaml.py", line 184, in print_yml
    self.xmlparse(output_yml, xml_tree, depth, verbose)
  File "C:\Users\pielsticker\Lukas\FAIRMat\code\nyaml\nyaml\nxdl2nyaml.py", line 1022, in xmlparse
    self.recursion_in_xml_tree(depth, xml_tree, output_yml, verbose)
  File "C:\Users\pielsticker\Lukas\FAIRMat\code\nyaml\nyaml\nxdl2nyaml.py", line 948, in recursion_in_xml_tree
    self.xmlparse(output_yml, xml_tree_children, depth, verbose)
  File "C:\Users\pielsticker\Lukas\FAIRMat\code\nyaml\nyaml\nxdl2nyaml.py", line 1022, in xmlparse
    self.recursion_in_xml_tree(depth, xml_tree, output_yml, verbose)
  File "C:\Users\pielsticker\Lukas\FAIRMat\code\nyaml\nyaml\nxdl2nyaml.py", line 948, in recursion_in_xml_tree
    self.xmlparse(output_yml, xml_tree_children, depth, verbose)
  File "C:\Users\pielsticker\Lukas\FAIRMat\code\nyaml\nyaml\nxdl2nyaml.py", line 999, in xmlparse
    self.handle_not_root_level_doc(
  File "C:\Users\pielsticker\Lukas\FAIRMat\code\nyaml\nyaml\nxdl2nyaml.py", line 402, in handle_not_root_level_doc
    parts[i - 1] += doc
IndexError: list index out of range

Problematic input in NXenergydispersion.nxdl.xml:

<field name="energy_scan_mode" type="NX_CHAR">
    <doc>
         Way of scanning the energy axis (fixed or sweep).

             This concept is related to term `12.65`_ of the ISO 18115-1:2023 standard.

         .. _12.65: https://www.iso.org/obp/ui/en/#iso:std:iso:18115:-1:ed-3:v1:en:term:12.65

             This concept is related to term `12.66`_ of the ISO 18115-1:2023 standard.

         .. _12.66: https://www.iso.org/obp/ui/en/#iso:std:iso:18115:-1:ed-3:v1:en:term:12.66
    </doc>

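The problem is in the handle_not_root_level_doc function in nyaml\nxdl2nyaml.py. The link-merging loop indexes parts with i - 1, but i counts entries of docs, and parts becomes shorter than docs as soon as one link has been merged: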

    def handle_not_root_level_doc(self, depth, text, tag="doc", file_out=None):
        """Handle docs field of group and field but not root.

        Handle docs field along the yaml file. In this function we also tried to keep
        the track of indentation. E.g. the below doc block.
            * Topic name
                Description of topic
        """
        if "}" in tag:
            tag = remove_namespace_from_tag(tag)
        indent = depth * DEPTH_SIZE
        text = self.clean_and_organise_text(text, depth)  # starts with '\n'
        docs = re.split(r"\n\s*\n", text)
        parts = []

        # Add links to previous docstring
        for i, doc in enumerate(docs):
            link_match = re.match(r"\s*\.\. _.*", doc)
            if link_match is not None:
                parts[i - 1] += doc
            else:
                parts.append(doc)
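To make the failure mode concrete, here is a minimal sketch that reproduces the IndexError outside of nyaml. It copies only the link-merging loop from the function; the sample text is a simplified stand-in for the output of clean_and_organise_text (an assumption, with placeholder URLs), and `merge_links_by_index` is a hypothetical name used only for this illustration.

```python
import re

# Simplified stand-in for the cleaned doc text (assumption: the real text
# comes from clean_and_organise_text; URLs are shortened placeholders).
TEXT = """
Way of scanning the energy axis (fixed or sweep).

    This concept is related to term `12.65`_ of the ISO 18115-1:2023 standard.

.. _12.65: https://example.org/term-12.65

    This concept is related to term `12.66`_ of the ISO 18115-1:2023 standard.

.. _12.66: https://example.org/term-12.66
"""


def merge_links_by_index(text):
    """Hypothetical copy of the loop from the issue, isolated for testing."""
    docs = re.split(r"\n\s*\n", text)
    parts = []
    for i, doc in enumerate(docs):
        if re.match(r"\s*\.\. _.*", doc) is not None:
            # i is an index into docs, not into parts.
            parts[i - 1] += doc
        else:
            parts.append(doc)
    return parts


if __name__ == "__main__":
    try:
        merge_links_by_index(TEXT)
    except IndexError as err:
        print(f"IndexError: {err}")
```

With a single link target the loop works by coincidence, because i - 1 still equals the last index of parts. Every merged link leaves parts one element shorter than docs, so at the second link target (i = 4) parts holds only three elements and parts[3] is out of range.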

Any ideas @domna?

domna commented 5 months ago

Yes, I added this recently. I needed to adapt the parsing here because we split the paragraphs and handle them separately, so I just appended each link to the previous part. Obviously there is still something wrong with this approach. I'll look into it.
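One possible way to make the "append to the previous part" idea robust is to address the last element of parts directly instead of deriving its position from the loop index. The sketch below is an illustration under assumptions, not the change that was made in nyaml: `merge_links_into_previous` and its `sep` parameter are hypothetical, and the default separator is a guess (the loop in the issue concatenates without one, which corresponds to `sep=""`).

```python
import re

# Matches a paragraph that starts with an RST hyperlink target (".. _name: url").
LINK_TARGET = re.compile(r"\s*\.\. _")


def merge_links_into_previous(docs, sep="\n\n"):
    """Attach each link-target paragraph to the most recently kept paragraph.

    Hypothetical helper: `sep` is an assumption, pass sep="" to mirror the
    direct concatenation used by the loop in the issue.
    """
    parts = []
    for doc in docs:
        if LINK_TARGET.match(doc) and parts:
            # parts[-1] is always the last kept paragraph, no matter how
            # many link targets have already been merged.
            parts[-1] += sep + doc
        else:
            # Regular paragraph, or a link target with nothing before it.
            parts.append(doc)
    return parts


if __name__ == "__main__":
    docs = [
        "Way of scanning the energy axis (fixed or sweep).",
        "    This concept is related to term `12.65`_ ...",
        ".. _12.65: https://example.org/term-12.65",
        "    This concept is related to term `12.66`_ ...",
        ".. _12.66: https://example.org/term-12.66",
    ]
    for part in merge_links_into_previous(docs):
        print(repr(part))
```

The `and parts` guard also covers a doc that begins with a link target, where there is no previous paragraph to attach it to. In that case the target is kept as its own part.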