Issue with yadg yaml recipe file

NukP commented 7 months ago

Hi @PeterKraus, as promised, I am now retrying the upgrade of the default yadg version for the catalysis lab from yadg 4.2 to yadg 5.0.

I first created a new conda env and installed the latest version of yadg using pip. The installed library versions are as follows:
yadg 5.0.2
pydantic 2.6.4
dgbowl-schemas 116
dgpost 2.1.1

I first tried running yadg/dgpost using the example files and commands you provided last summer (yadg-5.0a5-pipeline.zip). The script works fine: the NetCDF file was created and dgpost worked correctly.

However, we no longer use drycal's software to measure the flow, in order to avoid the issue with the piston getting stuck during the measurement. I wrote a script to control the flow meter, and we now use this script exclusively during measurements. The introduction of the multiplex system (running 8 cells at the same time) also requires some pre-processing of the flow, pressure and temperature data before processing with yadg/dgpost.

I have written a script that pre-processes the flow, pressure and temperature data before processing with yadg/dgpost. The pre-processed files are called 'flow_for_yadg.csv', 'pressure_for_yadg.csv' and 'temperature_for_yadg.csv', respectively. You can find the data after the pre-processing step here.

I have tried to modify the yadg yaml recipe file for these files (yadg.preset.francesco_v5-EDLC_mod1.yaml.zip), but when I ran yadg on the pre-processed data using this modified yaml, I got the error below. It seems that an extractor (a new feature introduced in yadg 5) is required, but I am not quite sure how this works. I think the issue might stem from how I made the yaml file. Could you please have a look into this? Thank you.


```
  File "C:\Users\plnu\Anaconda3\envs\test_yadg5\lib\runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "C:\Users\plnu\Anaconda3\envs\test_yadg5\lib\runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "C:\Users\plnu\Anaconda3\envs\test_yadg5\Scripts\yadg.exe\__main__.py", line 7, in <module>
  File "C:\Users\plnu\Anaconda3\envs\test_yadg5\lib\site-packages\yadg\main.py", line 201, in run_with_arguments
    args.func(args)
  File "C:\Users\plnu\Anaconda3\envs\test_yadg5\lib\site-packages\yadg\subcommands.py", line 144, in preset
    schema = to_dataschema(**preset)
  File "C:\Users\plnu\Anaconda3\envs\test_yadg5\lib\site-packages\dgbowl_schemas\yadg\__init__.py", line 40, in to_dataschema
    schema = Model(**kwargs)
  File "C:\Users\plnu\Anaconda3\envs\test_yadg5\lib\site-packages\pydantic\main.py", line 171, in __init__
    self.__pydantic_validator__.validate_python(data, self_instance=self)
pydantic_core._pydantic_core.ValidationError: 56 validation errors for DataSchema
steps.10.Dummy.parser
  Input should be 'dummy' [type=literal_error, input_value='chromdata', input_type=str]
    For further information visit https://errors.pydantic.dev/2.6/v/literal_error
steps.10.BasicCSV.parser
  Input should be 'basiccsv' [type=literal_error, input_value='chromdata', input_type=str]
    For further information visit https://errors.pydantic.dev/2.6/v/literal_error
steps.10.BasicCSV.parameters.filetype
  Extra inputs are not permitted [type=extra_forbidden, input_value='fusion.zip', input_type=str]
    For further information visit https://errors.pydantic.dev/2.6/v/extra_forbidden
steps.10.MeasCSV.parser
  Input should be 'meascsv' [type=literal_error, input_value='chromdata', input_type=str]
    For further information visit https://errors.pydantic.dev/2.6/v/literal_error
steps.10.MeasCSV.parameters.filetype
  Extra inputs are not permitted [type=extra_forbidden, input_value='fusion.zip', input_type=str]
    For further information visit https://errors.pydantic.dev/2.6/v/extra_forbidden
steps.10.FlowData.parser
  Input should be 'flowdata' [type=literal_error, input_value='chromdata', input_type=str]
    For further information visit https://errors.pydantic.dev/2.6/v/literal_error
steps.10.FlowData.extractor
  Field required [type=missing, input_value={'parser': 'chromdata', '...sion.zip'}, 'tag': 'GC'}, input_type=dict]
    For further information visit https://errors.pydantic.dev/2.6/v/missing
steps.10.FlowData.parameters.filetype
  Extra inputs are not permitted [type=extra_forbidden, input_value='fusion.zip', input_type=str]
    For further information visit https://errors.pydantic.dev/2.6/v/extra_forbidden
steps.10.ElectroChem.parser
  Input should be 'electrochem' [type=literal_error, input_value='chromdata', input_type=str]
    For further information visit https://errors.pydantic.dev/2.6/v/literal_error
steps.10.ElectroChem.extractor
  Field required [type=missing, input_value={'parser': 'chromdata', '...sion.zip'}, 'tag': 'GC'}, input_type=dict]
    For further information visit https://errors.pydantic.dev/2.6/v/missing
steps.10.ElectroChem.parameters.filetype
  Extra inputs are not permitted [type=extra_forbidden, input_value='fusion.zip', input_type=str]
    For further information visit https://errors.pydantic.dev/2.6/v/extra_forbidden
steps.10.ChromTrace.parser
  Input should be 'chromtrace' [type=literal_error, input_value='chromdata', input_type=str]
    For further information visit https://errors.pydantic.dev/2.6/v/literal_error
steps.10.ChromTrace.extractor
  Field required [type=missing, input_value={'parser': 'chromdata', '...sion.zip'}, 'tag': 'GC'}, input_type=dict]
    For further information visit https://errors.pydantic.dev/2.6/v/missing
steps.10.ChromTrace.parameters.filetype
  Extra inputs are not permitted [type=extra_forbidden, input_value='fusion.zip', input_type=str]
    For further information visit https://errors.pydantic.dev/2.6/v/extra_forbidden
steps.10.ChromData.extractor
  Field required [type=missing, input_value={'parser': 'chromdata', '...sion.zip'}, 'tag': 'GC'}, input_type=dict]
    For further information visit https://errors.pydantic.dev/2.6/v/missing
steps.10.ChromData.parameters.filetype
  Extra inputs are not permitted [type=extra_forbidden, input_value='fusion.zip', input_type=str]
    For further information visit https://errors.pydantic.dev/2.6/v/extra_forbidden
steps.10.MassTrace.parser
  Input should be 'masstrace' [type=literal_error, input_value='chromdata', input_type=str]
    For further information visit https://errors.pydantic.dev/2.6/v/literal_error
steps.10.MassTrace.extractor
  Field required [type=missing, input_value={'parser': 'chromdata', '...sion.zip'}, 'tag': 'GC'}, input_type=dict]
    For further information visit https://errors.pydantic.dev/2.6/v/missing
steps.10.MassTrace.parameters.filetype
  Extra inputs are not permitted [type=extra_forbidden, input_value='fusion.zip', input_type=str]
    For further information visit https://errors.pydantic.dev/2.6/v/extra_forbidden
steps.10.QFTrace.parser
  Input should be 'qftrace' [type=literal_error, input_value='chromdata', input_type=str]
    For further information visit https://errors.pydantic.dev/2.6/v/literal_error
steps.10.QFTrace.extractor
  Field required [type=missing, input_value={'parser': 'chromdata', '...sion.zip'}, 'tag': 'GC'}, input_type=dict]
    For further information visit https://errors.pydantic.dev/2.6/v/missing
steps.10.QFTrace.parameters.filetype
  Extra inputs are not permitted [type=extra_forbidden, input_value='fusion.zip', input_type=str]
    For further information visit https://errors.pydantic.dev/2.6/v/extra_forbidden
steps.10.XPSTrace.parser
  Input should be 'xpstrace' [type=literal_error, input_value='chromdata', input_type=str]
    For further information visit https://errors.pydantic.dev/2.6/v/literal_error
steps.10.XPSTrace.extractor
  Field required [type=missing, input_value={'parser': 'chromdata', '...sion.zip'}, 'tag': 'GC'}, input_type=dict]
    For further information visit https://errors.pydantic.dev/2.6/v/missing
steps.10.XPSTrace.parameters.filetype
  Extra inputs are not permitted [type=extra_forbidden, input_value='fusion.zip', input_type=str]
    For further information visit https://errors.pydantic.dev/2.6/v/extra_forbidden
steps.10.XRDTrace.parser
  Input should be 'xrdtrace' [type=literal_error, input_value='chromdata', input_type=str]
    For further information visit https://errors.pydantic.dev/2.6/v/literal_error
steps.10.XRDTrace.extractor
  Field required [type=missing, input_value={'parser': 'chromdata', '...sion.zip'}, 'tag': 'GC'}, input_type=dict]
    For further information visit https://errors.pydantic.dev/2.6/v/missing
steps.10.XRDTrace.parameters.filetype
  Extra inputs are not permitted [type=extra_forbidden, input_value='fusion.zip', input_type=str]
    For further information visit https://errors.pydantic.dev/2.6/v/extra_forbidden
steps.11.Dummy.parser
  Input should be 'dummy' [type=literal_error, input_value='chromdata', input_type=str]
    For further information visit https://errors.pydantic.dev/2.6/v/literal_error
steps.11.BasicCSV.parser
  Input should be 'basiccsv' [type=literal_error, input_value='chromdata', input_type=str]
    For further information visit https://errors.pydantic.dev/2.6/v/literal_error
steps.11.BasicCSV.parameters.filetype
  Extra inputs are not permitted [type=extra_forbidden, input_value='empalc.xlsx', input_type=str]
    For further information visit https://errors.pydantic.dev/2.6/v/extra_forbidden
steps.11.MeasCSV.parser
  Input should be 'meascsv' [type=literal_error, input_value='chromdata', input_type=str]
    For further information visit https://errors.pydantic.dev/2.6/v/literal_error
steps.11.MeasCSV.parameters.filetype
  Extra inputs are not permitted [type=extra_forbidden, input_value='empalc.xlsx', input_type=str]
    For further information visit https://errors.pydantic.dev/2.6/v/extra_forbidden
steps.11.FlowData.parser
  Input should be 'flowdata' [type=literal_error, input_value='chromdata', input_type=str]
    For further information visit https://errors.pydantic.dev/2.6/v/literal_error
steps.11.FlowData.extractor
  Field required [type=missing, input_value={'parser': 'chromdata', '...alc.xlsx'}, 'tag': 'LC'}, input_type=dict]
    For further information visit https://errors.pydantic.dev/2.6/v/missing
steps.11.FlowData.parameters.filetype
  Extra inputs are not permitted [type=extra_forbidden, input_value='empalc.xlsx', input_type=str]
    For further information visit https://errors.pydantic.dev/2.6/v/extra_forbidden
steps.11.ElectroChem.parser
  Input should be 'electrochem' [type=literal_error, input_value='chromdata', input_type=str]
    For further information visit https://errors.pydantic.dev/2.6/v/literal_error
steps.11.ElectroChem.extractor
  Field required [type=missing, input_value={'parser': 'chromdata', '...alc.xlsx'}, 'tag': 'LC'}, input_type=dict]
    For further information visit https://errors.pydantic.dev/2.6/v/missing
steps.11.ElectroChem.parameters.filetype
  Extra inputs are not permitted [type=extra_forbidden, input_value='empalc.xlsx', input_type=str]
    For further information visit https://errors.pydantic.dev/2.6/v/extra_forbidden
steps.11.ChromTrace.parser
  Input should be 'chromtrace' [type=literal_error, input_value='chromdata', input_type=str]
    For further information visit https://errors.pydantic.dev/2.6/v/literal_error
steps.11.ChromTrace.extractor
  Field required [type=missing, input_value={'parser': 'chromdata', '...alc.xlsx'}, 'tag': 'LC'}, input_type=dict]
    For further information visit https://errors.pydantic.dev/2.6/v/missing
steps.11.ChromTrace.parameters.filetype
  Extra inputs are not permitted [type=extra_forbidden, input_value='empalc.xlsx', input_type=str]
    For further information visit https://errors.pydantic.dev/2.6/v/extra_forbidden
steps.11.ChromData.extractor
  Field required [type=missing, input_value={'parser': 'chromdata', '...alc.xlsx'}, 'tag': 'LC'}, input_type=dict]
    For further information visit https://errors.pydantic.dev/2.6/v/missing
steps.11.ChromData.parameters.filetype
  Extra inputs are not permitted [type=extra_forbidden, input_value='empalc.xlsx', input_type=str]
    For further information visit https://errors.pydantic.dev/2.6/v/extra_forbidden
steps.11.MassTrace.parser
  Input should be 'masstrace' [type=literal_error, input_value='chromdata', input_type=str]
    For further information visit https://errors.pydantic.dev/2.6/v/literal_error
steps.11.MassTrace.extractor
  Field required [type=missing, input_value={'parser': 'chromdata', '...alc.xlsx'}, 'tag': 'LC'}, input_type=dict]
    For further information visit https://errors.pydantic.dev/2.6/v/missing
steps.11.MassTrace.parameters.filetype
  Extra inputs are not permitted [type=extra_forbidden, input_value='empalc.xlsx', input_type=str]
    For further information visit https://errors.pydantic.dev/2.6/v/extra_forbidden
steps.11.QFTrace.parser
  Input should be 'qftrace' [type=literal_error, input_value='chromdata', input_type=str]
    For further information visit https://errors.pydantic.dev/2.6/v/literal_error
steps.11.QFTrace.extractor
  Field required [type=missing, input_value={'parser': 'chromdata', '...alc.xlsx'}, 'tag': 'LC'}, input_type=dict]
    For further information visit https://errors.pydantic.dev/2.6/v/missing
steps.11.QFTrace.parameters.filetype
  Extra inputs are not permitted [type=extra_forbidden, input_value='empalc.xlsx', input_type=str]
    For further information visit https://errors.pydantic.dev/2.6/v/extra_forbidden
steps.11.XPSTrace.parser
  Input should be 'xpstrace' [type=literal_error, input_value='chromdata', input_type=str]
    For further information visit https://errors.pydantic.dev/2.6/v/literal_error
steps.11.XPSTrace.extractor
  Field required [type=missing, input_value={'parser': 'chromdata', '...alc.xlsx'}, 'tag': 'LC'}, input_type=dict]
    For further information visit https://errors.pydantic.dev/2.6/v/missing
steps.11.XPSTrace.parameters.filetype
  Extra inputs are not permitted [type=extra_forbidden, input_value='empalc.xlsx', input_type=str]
    For further information visit https://errors.pydantic.dev/2.6/v/extra_forbidden
steps.11.XRDTrace.parser
  Input should be 'xrdtrace' [type=literal_error, input_value='chromdata', input_type=str]
    For further information visit https://errors.pydantic.dev/2.6/v/literal_error
steps.11.XRDTrace.extractor
  Field required [type=missing, input_value={'parser': 'chromdata', '...alc.xlsx'}, 'tag': 'LC'}, input_type=dict]
    For further information visit https://errors.pydantic.dev/2.6/v/missing
steps.11.XRDTrace.parameters.filetype
  Extra inputs are not permitted [type=extra_forbidden, input_value='empalc.xlsx', input_type=str]
    For further information visit https://errors.pydantic.dev/2.6/v/extra_forbidden
```
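
Most of the 56 entries above appear to come from pydantic trying steps 10 and 11 against every parser model in turn, so the entries under ChromData (the model whose name matches the 'chromdata' parser) are probably the ones that matter. A minimal sketch of grouping such output by step and model, using a few lines copied from the log above; the `summarise` helper and the two regular expressions are illustrative and are not part of yadg or pydantic:

```python
import re

# A few entries copied verbatim from the validation output above.
ERROR_TEXT = """\
steps.10.BasicCSV.parser
  Input should be 'basiccsv' [type=literal_error, input_value='chromdata', input_type=str]
steps.10.ChromData.extractor
  Field required [type=missing, input_value={'parser': 'chromdata', '...sion.zip'}, 'tag': 'GC'}, input_type=dict]
steps.10.ChromData.parameters.filetype
  Extra inputs are not permitted [type=extra_forbidden, input_value='fusion.zip', input_type=str]
steps.11.ChromData.extractor
  Field required [type=missing, input_value={'parser': 'chromdata', '...alc.xlsx'}, 'tag': 'LC'}, input_type=dict]
steps.11.ChromData.parameters.filetype
  Extra inputs are not permitted [type=extra_forbidden, input_value='empalc.xlsx', input_type=str]
"""

# Location lines look like "steps.<index>.<Model>.<field path>".
LOCATION = re.compile(r"^steps\.(\d+)\.(\w+)\.([\w.]+)$")
# Message lines carry the error type as "[type=<name>, ...".
ERROR_TYPE = re.compile(r"\[type=(\w+),")


def summarise(text):
    """Return (step, model, field, error_type) tuples found in the error text."""
    entries = []
    location = None
    for line in text.splitlines():
        stripped = line.strip()
        match = LOCATION.match(stripped)
        if match:
            location = (int(match.group(1)), match.group(2), match.group(3))
            continue
        kind = ERROR_TYPE.search(stripped)
        if kind and location is not None:
            entries.append(location + (kind.group(1),))
            location = None
    return entries


entries = summarise(ERROR_TEXT)
# Keep only the model that corresponds to the parser named in the step.
relevant = [entry for entry in entries if entry[1] == "ChromData"]
for step, model, field, kind in relevant:
    print(f"step {step}: {field} -> {kind}")
```

Read this way, each of the two steps reports only a missing `extractor` and a forbidden `parameters.filetype`.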

PeterKraus commented 7 months ago

Hi @NukP, two options:

1. You should be able to use the dataschema (i.e. the yaml file) that you used for yadg version 4.x with yadg version 5.x, as that should get upgraded by yadg automatically. If you have a dataschema file that works with yadg-4.x and does not work with yadg-5.x, please make a new issue on GitHub and attach the dataschema (and ideally also the raw files).
2. If you want to write a new dataschema, as of yadg-5.0 (and DataSchema-5.0), you need to specify an extractor. I think the below should work for the first basiccsv section:
```
- parser: basiccsv
  input:
      folders: ["."]
      suffix: "csv"
      contains: "flow_for_yadg"
  parameters:
      timestamp:
          uts:
              index: 0
      strip: \"
      units:
          Flow (nml/min): smL/min
  tag: outlet
  extractor:
      filetype: "None"
```
Here, I have added the last two lines. You will need to modify all sections accordingly. I suggest you do it section by section, and once each works individually, combine them into one large dataschema.
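
A rough sketch of applying that change to every step of an already loaded dataschema, treating each step as a plain Python dict. The `add_extractor` helper is hypothetical and not part of yadg. Moving an existing `parameters.filetype` into `extractor.filetype` is an assumption inferred from the validation errors earlier in this thread (`extractor` missing, `parameters.filetype` not permitted), so check the result against the DataSchema-5.0 documentation before relying on it:

```python
import copy


def add_extractor(step, default_filetype="None"):
    """Return a copy of a step with an `extractor` section added.

    ASSUMPTION: a `filetype` found under `parameters` belongs under
    `extractor` in DataSchema-5.0. This is inferred from the error
    messages in this thread, not taken from the yadg documentation.
    """
    new_step = copy.deepcopy(step)
    if "extractor" in new_step:
        # Already in the new layout; leave it alone.
        return new_step
    parameters = new_step.get("parameters", {})
    filetype = parameters.pop("filetype", default_filetype)
    new_step["extractor"] = {"filetype": filetype}
    return new_step


# The basiccsv step from the comment above, without its last two lines.
flow_step = {
    "parser": "basiccsv",
    "input": {"folders": ["."], "suffix": "csv", "contains": "flow_for_yadg"},
    "parameters": {
        "timestamp": {"uts": {"index": 0}},
        "units": {"Flow (nml/min)": "smL/min"},
    },
    "tag": "outlet",
}
# Only the keys visible in the error output are reproduced for the GC step.
gc_step = {"parser": "chromdata", "parameters": {"filetype": "fusion.zip"}, "tag": "GC"}

new_flow = add_extractor(flow_step)
new_gc = add_extractor(gc_step)
print(new_flow["extractor"])
print(new_gc["extractor"], new_gc["parameters"])
```

The helper works on copies, so the original steps stay available for comparison while each section is tested individually.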

I should note that the syntax for yadg-5.1, which I'm working on, will be a little simpler than the current yadg-5.0 syntax. That is also why I recommend using the dataschemas written for yadg-4.x.

NukP commented 7 months ago

Hi @PeterKraus, thanks for this. I tried using exactly the yaml file for yadg 4.2 that I have been using so far, and it works (at first glance). Upon closer inspection, I found that there is an issue with the flow data. The column for the flow data in flow_for_yadg.csv is called 'Flow (nml/min)'. The resulting NetCDF file registers this column as 'Flow (nml_min)' and does not contain the unit, despite the unit being specified in the yaml file. The temperature and pressure data, which use the same csv parser, have no such issue. The absence of the unit causes problems down the road in dgpost.recipe3.data_extract-GC, during the transformation using electrochemistry.fe, where the pint library raises an error regarding the unit.

I suspected that there might be something wrong with how yadg 5 reads '/'. So, I changed the column name in flow_for_yadg.csv from 'Flow (nml/min)' to 'Flow (nmlmin)' and re-ran yadg. This time, the resulting NetCDF file registers the flow data as 'Flow (nmlmin)' and the unit is read correctly. I used this new NetCDF file with dgpost and everything went well (I had to change the column name in dgpost to 'Flow (nmlmin)'). No errors were observed.

I could change my pre-processing script so that the column header is 'Flow (nmlmin)' instead of 'Flow (nml/min)', but this could become a problem down the road if we later have to use column names which include '/'. So, it would be best if you could have a look into this.

PeterKraus commented 7 months ago

The substitution of / with _ in yadg-5 is as designed, as it's not possible to store the / character in the "column names" when using datatree.DataTree (the NetCDF format and/or HDF5+ doesn't support it either, I think, as it's used to separate "groups"). Still, the units should get processed appropriately, before the substitution occurs, so I opened an issue above.
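
To illustrate why the order matters, here is a small sketch with hypothetical helpers. It is not yadg's actual code, and the real cause of the missing unit may differ. The `units` mapping mirrors the `units` section of the dataschema, which is keyed by the original column name:

```python
def sanitise(name):
    """Replace '/' so the name can be used as a variable name in the output."""
    return name.replace("/", "_")


def units_then_rename(columns, units):
    """Look each unit up by the ORIGINAL column name, then rename the column."""
    return {sanitise(column): units.get(column) for column in columns}


def rename_then_units(columns, units):
    """Rename first; the lookup then misses any key that contained '/'."""
    return {name: units.get(name) for name in map(sanitise, columns)}


columns = ["Flow (nml/min)"]
units = {"Flow (nml/min)": "smL/min"}  # as in the dataschema `units` section

good = units_then_rename(columns, units)
bad = rename_then_units(columns, units)
print(good)
print(bad)
```

In the second variant the renamed column no longer matches any key in the `units` mapping, which would be consistent with the behaviour reported above for 'Flow (nml_min)'.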

However, a more reasonable alternative would be having your CSV file structured with the first line being the column names and the second one units, i.e.:

Timestamp, Flow, Temperature
, nml/min, degC

Finally, a third option would be to "rename" the Flow (nml/min) column to just Flow, but you have to also amend your units section in the dataschema.
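
For the layout with a separate units row, the pre-processing script could split headers such as 'Flow (nml/min)' into a name and a unit before writing the file. A minimal sketch using only the Python standard library; the helper names and the sample data row are made up for illustration, and the unit strings are copied through unchanged:

```python
import csv
import io
import re

# Matches headers of the form "<name> (<unit>)".
HEADER = re.compile(r"^\s*(?P<name>.*?)\s*\((?P<unit>[^()]*)\)\s*$")


def split_header(header):
    """Split 'Flow (nml/min)' into ('Flow', 'nml/min'); no unit gives ''."""
    match = HEADER.match(header)
    if match is None:
        return header.strip(), ""
    return match.group("name"), match.group("unit")


def add_units_row(text):
    """Rewrite CSV text so line 1 holds the names and line 2 the units."""
    rows = list(csv.reader(io.StringIO(text)))
    names, units = zip(*(split_header(header) for header in rows[0]))
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(names)
    writer.writerow(units)
    writer.writerows(rows[1:])
    return out.getvalue()


# Hypothetical sample content; the real files are not shown in this thread.
raw = "Timestamp,Flow (nml/min)\n1700000000.0,10.2\n"
converted = add_units_row(raw)
print(converted)
```

The same `split_header` helper also gives the bare column name needed for the rename option, in which case the unit goes into the `units` section of the dataschema instead.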

PeterKraus commented 5 months ago

@NukP: Anything to do here, or can we close this?

NukP commented 5 months ago

@PeterKraus Yes, we can close it. I decided to rename the column. The script now runs fine.

dgbowl / yadg

Issue with yadg yaml recipe file #142