Closed NukP closed 5 months ago
Hi @NukP, two options:
1. If your dataschema works with yadg-4.x and does not work with yadg-5.x, please make a new issue on github and attach the dataschema (and ideally also the raw files).
2. If you want to use yadg-5.0 (and DataSchema-5.0), you need to specify an extractor. I think the below should work for the first basiccsv section:
- parser: basiccsv
  input:
    folders: ["."]
    suffix: "csv"
    contains: "flow_for_yadg"
  parameters:
    timestamp:
      uts:
        index: 0
    strip: \"
    units:
      Flow (nml/min): smL/min
  tag: outlet
  extractor:
    filetype: "None"
Here, I have added the last two lines. You will need to modify all sections accordingly. I suggest you do it section by section, and once each works individually, combine them into one large dataschema.
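If the dataschema has many sections, the same edit can be applied programmatically. The sketch below is only an illustration: it assumes the sections have already been loaded into a list of plain dicts, and `add_extractors` and its `filetypes` mapping are hypothetical names, not part of yadg. The only filetype taken from this thread is "None" for basiccsv; the right value for any other parser has to be looked up separately.

```python
import copy


def add_extractors(steps, filetypes):
    """Return a copy of `steps` in which every step has an `extractor` block.

    `steps` is a list of dicts (one per dataschema section) and `filetypes`
    maps a parser name to the filetype string to use. Both are assumptions
    made for this sketch; a missing parser raises KeyError on purpose, so
    that no filetype is guessed silently.
    """
    updated = copy.deepcopy(steps)
    for step in updated:
        if "extractor" not in step:
            step["extractor"] = {"filetype": filetypes[step["parser"]]}
    return updated


steps = [{"parser": "basiccsv", "tag": "outlet"}]
new_steps = add_extractors(steps, {"basiccsv": "None"})
```

Working on a copy keeps the original sections untouched, which makes it easier to test each section individually before combining them.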
I should note that the syntax for version yadg-5.1 that I'm working on will be a little bit simpler than it is now for yadg-5.0, which is also why I recommend using the dataschemas written for yadg-4.x.
Hi @PeterKraus, thanks for this. I tried using exactly the yaml file for yadg4.2 that I am currently using, and it works (at first glance)! Upon closer inspection, I found an issue with the flow data. The column for the flow data in flow_for_yadg.csv is called 'Flow (nml/min)'. The resulting netcdf file registers this column as 'Flow (nml_min)' and does not extract the unit, even though the unit is specified in the yaml file. The temperature and pressure data, which use the same csv parser, do not have this issue. The missing unit causes problems further down the line in dgpost.recipe3.data_extract-GC, during the transformation using electrochemistry.fe, where the pint library raises an error regarding the unit.
I suspected that there might be something wrong with how yadg5 reads '/'. So I changed the column name in flow_for_yadg.csv from 'Flow (nml/min)' to 'Flow (nmlmin)' and re-ran yadg. This time, the resulting netcdf file registers the flow data as 'Flow (nmlmin)' and reads the unit correctly. I used this new netcdf file with dgpost and everything went well (I had to change the column name in dgpost to 'Flow (nmlmin)'). No errors were observed.
I could modify my pre-processing script to change the column header from 'Flow (nml/min)' to 'Flow (nmlmin)', but this could become a problem later if we have to use column names which include '/'. So it would be best if you could have a look into this.
The / substitution to _ in yadg-5 is as designed, as it's not possible to store the / character in the "column names" when using datatree.Datatree (the NetCDF format and/or HDF5+ doesn't support it either, I think, as it's used to separate "groups"). Still, the units should get processed appropriately, before the substitution occurs, so I opened an issue above.
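To illustrate why the order of the two operations matters, here is a minimal sketch. It is not yadg's actual code; the function names are made up for this example, and it only shows one plausible way a unit can get lost when the / to _ renaming happens before the unit lookup.

```python
def safe_name(column):
    """Replace '/' so the name can be stored as a variable name."""
    return column.replace("/", "_")


def units_then_rename(columns, units):
    """Look each unit up under the raw header, then store it under the safe name."""
    return {safe_name(col): units.get(col) for col in columns}


def rename_then_units(columns, units):
    """For contrast: renaming first makes the lookup miss headers containing '/'."""
    return {safe_name(col): units.get(safe_name(col)) for col in columns}


columns = ["Flow (nml/min)", "Temperature"]
units = {"Flow (nml/min)": "smL/min", "Temperature": "degC"}

good = units_then_rename(columns, units)
bad = rename_then_units(columns, units)
```

In the first variant the flow column keeps its unit under the renamed key; in the second, only the column without a / keeps it, which matches the symptom reported above.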
However, a more reasonable alternative would be having your CSV file structured with the first line being the column names and the second one units, i.e.:
Timestamp, Flow, Temperature
, nml/min, degC
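A pre-processing script could produce this layout with the standard csv module. The sketch below is an assumption about how such a script might look (the function name and the sample values are made up); how the units row is then picked up by the basiccsv parser should be checked against the yadg documentation.

```python
import csv
import io


def to_csv_with_units(columns, units, rows):
    """Write column names on line 1, units on line 2, then the data rows.

    Columns without a unit (such as the timestamp) get an empty cell.
    """
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(columns)
    writer.writerow([units.get(col, "") for col in columns])
    writer.writerows(rows)
    return buffer.getvalue()


text = to_csv_with_units(
    ["Timestamp", "Flow", "Temperature"],
    {"Flow": "nml/min", "Temperature": "degC"},
    [[0, 1.5, 25.0]],  # made-up sample row
)
```

With this layout the / only ever appears in the units row, so the column names stay free of characters that cannot be stored.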
Finally, a third option would be to "rename" the Flow (nml/min) column to just Flow, but you have to also amend your units section in the dataschema.
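The renaming itself can be done in the pre-processing step. A minimal sketch, assuming the header is on the first line of the CSV file (`rename_header` is a hypothetical helper, not something provided by yadg):

```python
import csv
import io


def rename_header(csv_text, old, new):
    """Rename one column in the header row and leave the data rows alone."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    if not rows:
        return csv_text
    rows[0] = [new if cell.strip() == old else cell for cell in rows[0]]
    out = io.StringIO()
    csv.writer(out, lineterminator="\n").writerows(rows)
    return out.getvalue()


renamed = rename_header("uts,Flow (nml/min)\n0,1.5\n", "Flow (nml/min)", "Flow")
```

After this, the key in the units section of the dataschema has to be Flow instead of Flow (nml/min), otherwise the unit will not be matched to the column.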
@NukP: Anything to do here, or can we close this?
@PeterKraus Yes, we can close it here. I decided to rename the column. The script now runs fine.
Hi @PeterKraus, as promised, I am now trying again to upgrade the default yadg version for the catalysis lab from yadg4.2 to yadg 5.0.
I first created a new conda env and installed the latest version of yadg using pip. The currently installed versions of the libraries are as follows:
yadg 5.0.2
pydantic 2.6.4
dgbowl-schemas 116
dgpost 2.1.1
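For reference, a list like this can be collected with the standard library, which is handy when reporting an issue. This is just a small sketch; the package names are the ones listed above and `installed_versions` is a made-up helper name.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages):
    """Return {package: version}, with 'not installed' for missing packages."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = "not installed"
    return found


if __name__ == "__main__":
    names = ["yadg", "pydantic", "dgbowl-schemas", "dgpost"]
    for name, ver in installed_versions(names).items():
        print(f"{name} {ver}")
```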
I first tried running yadg/dgpost using the example files and command you provided last summer (yadg-5.0a5-pipeline.zip). The script works fine: the netcdf file was created and dgpost works correctly.
However, we no longer use drycal's software to measure the flow, to alleviate the issue of the piston getting stuck during the measurement. I wrote a script to control the flow meter, and we now use this script exclusively during measurements. The introduction of the multiplex system (running 8 cells at the same time) also requires some pre-processing of the flow, pressure, and temperature data before processing with yadg/dgpost.
I have written a script that pre-processes the flow, pressure, and temperature data before processing with yadg/dgpost. The pre-processed files are called 'flow_for_yadg.csv', 'pressure_for_yadg.csv' and 'temperature_for_yadg.csv', respectively. You can find the data after the pre-processing step here. I have tried to modify the yadg yaml recipe file for these files (yadg.preset.francesco_v5-EDLC_mod1.yaml.zip), but when I ran yadg on the pre-processed data using this modified yaml, I got the error below:
It seems that an extractor (a new feature introduced in yadg5) is required, but I am not quite sure how this works. I think the issue might stem from how I made the yaml file. Could you please have a look into this? Thank you.