Goddard-Fortran-Ecosystem / yaFyaml

Yet Another Fortran YAML
Apache License 2.0

Possible bug or oversight with list values and indentation #41

Open bena-nasa opened 2 years ago

bena-nasa commented 2 years ago

I've been working on a script to generate a YAML input file for a project from an older version in a different format. I ended up writing a Python script that builds a dictionary and writes it to a file with the Python yaml module's "dump" function, which takes a dictionary and emits it as YAML text. My first attempt generated a YAML file that looked like this:

Version: 3

Enabled:
- geosgcm_prog

Grids:
  PC96x49-DC:
     grid_type: LatLon
     im_world: 360
     jm_world: 180
     lm: 72
     dateline: DE
     pole: PE

Groups:
  geosgcm_prog_group:
    fields:
      AGCM:
        PHIS: {}

Collections:
  geosgcm_prog:
    frequency: '060000'
    groups:
    - geosgcm_prog_group
    output_grid: PC96x49-DC
    template: '%y4%m2%d2_%h2%n2z.nc4'

When yaFyaml ingested this file, however, it crashed with this error:

illegal token encountered C

I stared at the file for quite a while and could not see what was wrong. Finally, after much staring and comparing against a file I knew worked, the only difference I could see is that the YAML file above has keys whose values are lists that are not indented (the Enabled: key, for example, and likewise groups under Collections).

So I indented the lists like so:

Version: 3

Enabled:
  - geosgcm_prog

Grids:
  PC96x49-DC:
     grid_type: LatLon
     im_world: 360
     jm_world: 180
     lm: 72
     dateline: DE
     pole: PE

Groups:
  geosgcm_prog_group:
    fields:
      AGCM:
        PHIS: {}

Collections:
  geosgcm_prog:
    frequency: '060000'
    groups:
      - geosgcm_prog_group
    output_grid: PC96x49-DC
    template: '%y4%m2%d2_%h2%n2z.nc4'

and yaFyaml was happy. As far as Python's yaml implementation is concerned, though, the first form must be valid YAML, since that is what its "dump" function produced. So I'm guessing (still looking) that the first is valid YAML that is not supported by yaFyaml?

bena-nasa commented 2 years ago

Looks like a dash counts as indentation. I'll try my hand at patching this in yaFyaml. https://stackoverflow.com/questions/17014460/yaml-indentation-for-array-in-hash#:~:text=The%20dash%20in%20a%20sequence,without%20needing%20spaces%20as%20indentation.
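One way to confirm that the first file is valid YAML is to load both variants with PyYAML, the library that produced them, and compare the results. This is a minimal sketch, assuming PyYAML is installed; the two strings are trimmed excerpts of the files above.

```python
import yaml  # PyYAML (third-party: pip install pyyaml)

# List values written at the same column as their parent key.
compact = """\
Enabled:
- geosgcm_prog
Collections:
  geosgcm_prog:
    groups:
    - geosgcm_prog_group
    output_grid: PC96x49-DC
"""

# The same lists indented under their keys.
indented = """\
Enabled:
  - geosgcm_prog
Collections:
  geosgcm_prog:
    groups:
      - geosgcm_prog_group
    output_grid: PC96x49-DC
"""

a = yaml.safe_load(compact)
b = yaml.safe_load(indented)
print(a == b)  # True: both spellings describe the same data
```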

tclune commented 2 years ago

Yup, this needs a unit test as well. The parsing code is an ugly conversion from Python, so it is probably far from obvious what needs to change. We may have to live with this and/or write a converter script that adds spaces before leading hyphens in your scenario.
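A converter script of that kind could look roughly like the sketch below. It is stdlib-only Python; the function name `indent_block_sequences` is hypothetical, and it assumes PyYAML-style block output (one key or list item per line, no comments, no multi-line scalars).

```python
def indent_block_sequences(text: str, step: int = 2) -> str:
    """Indent list items that sit at the same column as their parent key."""

    def column(line):
        return len(line) - len(line.lstrip(" "))

    def is_entry(s):
        # A block sequence entry: "- item" or a bare "-".
        return s == "-" or s.startswith("- ")

    lines = text.splitlines()
    out = []
    stack = []  # (dash column, total extra shift) per open indentless list

    for i, line in enumerate(lines):
        if not line.strip():
            out.append(line)
            continue
        col, stripped = column(line), line.lstrip(" ")
        # Leave every list this line no longer belongs to.
        while stack and not (col > stack[-1][0]
                             or (col == stack[-1][0] and is_entry(stripped))):
            stack.pop()
        shift = stack[-1][1] if stack else 0
        out.append(" " * shift + line)

        # Locate this line's key, skipping any leading "- " markers.
        content, key_col = stripped, col
        while content.startswith("- "):
            content, key_col = content[2:], key_col + 2
        nxt = next((l for l in lines[i + 1:] if l.strip()), None)
        # "key:" followed by a dash at the key's own column opens a list.
        if (content.rstrip().endswith(":") and nxt is not None
                and column(nxt) == key_col and is_entry(nxt.lstrip(" "))):
            stack.append((key_col, shift + step))

    return "\n".join(out) + ("\n" if text.endswith("\n") else "")


before = "Enabled:\n- geosgcm_prog\n"
print(indent_block_sequences(before), end="")
```

Lists that are already indented are left unchanged, so running the script twice is harmless, and the stack handles indentless lists nested inside other list items.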

bena-nasa commented 2 years ago

Yeah, I tried to fix it for a little while but just ended up breaking other things and gave up. For now I can write a script to add the spaces. I also tried, in vain, to find out whether the Python yaml emitter has any formatting option to do this.
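PyYAML has no documented dump option for this, but a commonly used workaround is to subclass the dumper and override its emitter's `increase_indent` hook so that list values are never emitted "indentless". This is a sketch, assuming PyYAML 5.1 or later (for `sort_keys`); `IndentedListDumper` is a hypothetical name and the data is a trimmed copy of the file above.

```python
import yaml  # PyYAML (third-party: pip install pyyaml)


class IndentedListDumper(yaml.Dumper):
    """Dumper that indents block sequences under their parent key."""

    def increase_indent(self, flow=False, indentless=False):
        # The emitter requests an indentless sequence when a list is a
        # mapping value; forcing indentless=False indents the dashes instead.
        return super().increase_indent(flow, False)


data = {
    "Version": 3,
    "Enabled": ["geosgcm_prog"],
    "Collections": {
        "geosgcm_prog": {
            "frequency": "060000",
            "groups": ["geosgcm_prog_group"],
        }
    },
}

text = yaml.dump(data, Dumper=IndentedListDumper, sort_keys=False)
print(text)
```

Because this hooks into an emitter internal rather than a public option, it is worth re-checking after PyYAML upgrades.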

tclune commented 2 years ago

By comparing with the Python source for managing indentation, I did spot that the problem may be with this function: https://github.com/Goddard-Fortran-Ecosystem/yaFyaml/blob/6b9a50eb7000a600c9816c63abc8c84075771a33/src/Lexer.F90#L568-L572

You could try playing with the 1 and 0 to see whether they produce the desired behavior. It might break some unit tests, but it is just as possible that the unit tests are wrong, so just see what you get.
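For reference, PyYAML's scanner tracks indentation with a stack. Its `add_indent(column)` opens a new block only when `column` is strictly greater than the current indent, so a "-" at the parent key's column yields a block entry with no block-sequence-start token. The parser then accepts it as an "indentless sequence" (PyYAML's parser has a dedicated `parse_indentless_sequence_entry` state). The toy below mimics only that comparison for a tiny subset (one `key:` or `- item` per line). `ToyScanner` and its token names are illustrative rather than yaFyaml's, and whether the linked Fortran lines correspond to this comparison is an assumption to check against the source.

```python
class ToyScanner:
    """Toy model of PyYAML-style indentation tracking (illustrative only)."""

    def __init__(self):
        self.indent = -1   # column of the innermost open block
        self.indents = []  # saved columns of the enclosing blocks
        self.tokens = []

    def unwind_indent(self, column):
        # Close every block that is indented deeper than `column`.
        while self.indent > column:
            self.indent = self.indents.pop()
            self.tokens.append("BLOCK-END")

    def add_indent(self, column):
        # Open a new block only if `column` is strictly deeper; a "-" at
        # the parent key's column therefore opens nothing.
        if self.indent < column:
            self.indents.append(self.indent)
            self.indent = column
            return True
        return False

    def scan(self, text):
        for line in text.splitlines():
            stripped = line.lstrip(" ")
            if not stripped:
                continue
            column = len(line) - len(stripped)
            self.unwind_indent(column)
            if stripped.startswith("- "):
                if self.add_indent(column):
                    self.tokens.append("BLOCK-SEQUENCE-START")
                self.tokens += ["BLOCK-ENTRY", "SCALAR"]
            else:
                if self.add_indent(column):
                    self.tokens.append("BLOCK-MAPPING-START")
                self.tokens.append("KEY")
                if not stripped.rstrip().endswith(":"):
                    self.tokens.append("SCALAR")  # inline value
        self.unwind_indent(-1)
        return self.tokens


print(ToyScanner().scan("Enabled:\n- geosgcm_prog\n"))
# ['BLOCK-MAPPING-START', 'KEY', 'BLOCK-ENTRY', 'SCALAR', 'BLOCK-END']
print(ToyScanner().scan("Enabled:\n  - geosgcm_prog\n"))
# ['BLOCK-MAPPING-START', 'KEY', 'BLOCK-SEQUENCE-START', 'BLOCK-ENTRY',
#  'SCALAR', 'BLOCK-END', 'BLOCK-END']
```

In the compact form the entry arrives without a sequence-start token, so a port that lacks the indentless-sequence path in either the scanner or the parser would reject the first file.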

tclune commented 2 years ago

@bena-nasa Did you ever attempt the change suggested above? I'm about to do some other work in this layer, so it would be a good time to get the fix in (if we know what the fix is).