Closed cchwala closed 1 year ago
Something like this should work to set the correct time encoding:
`ds.time.attrs['unit'] = 'seconds since 1970-01-01'`
Seems like it should be set in the encoding, like this: `ds.time.encoding['units'] = "seconds since 1970-01-01 00:00:00"`
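A minimal sketch of the encoding-based approach, assuming a toy dataset with a datetime `time` coordinate. The `rsl` values and the output file name are made up for illustration. The point is that the unit string goes into `encoding`, which xarray consults when serializing, rather than into `attrs`:

```python
import numpy as np
import pandas as pd
import xarray as xr

# Hypothetical toy dataset; the values are invented for illustration only.
time = pd.date_range("2021-01-01", periods=3, freq="min")
ds = xr.Dataset(
    {"rsl": ("time", np.array([-45.0, -45.2, -46.1]))},
    coords={"time": time},
)

# Encoding describes how the values should be stored on disk.
# It is kept separately from the user-facing attrs.
ds.time.encoding["units"] = "seconds since 1970-01-01 00:00:00"

# Writing the file (not executed here) would then store time in these units:
# ds.to_netcdf("example.nc")
```

In memory the coordinate stays `datetime64`; the units only take effect when the dataset is written.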
There is a PR for a preliminary implementation. It includes an adjustment of variable names that are present in the example datasets and which I can clearly identify (`pmin` and `pmax` are not yet changed).
However, not all datasets fulfill the requirements. E.g. `tsl` and `rsl` are missing in some datasets. These only include total loss (I now used `tl` as the variable name), which is not in the table of variables in the white paper. Moreover, `pmin` and `pmax` are not in the list of possible variables.
The updated code also assigns attributes to the datasets. However, the dictionary for attributes of the data variables is not yet complete. I.e. part of the table of the white paper that defines variable conventions still needs to be implemented.
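As a rough illustration of what assigning attributes from a dictionary could look like: the dictionary contents below are hypothetical placeholders, not the entries from the white paper, and `assign_variable_attrs` is a name made up for this sketch.

```python
import numpy as np
import xarray as xr

# Hypothetical attribute table. The real entries come from the white paper
# and are NOT reproduced here.
VARIABLE_ATTRS = {
    "tsl": {"units": "dBm", "long_name": "transmitted_signal_level"},
    "rsl": {"units": "dBm", "long_name": "received_signal_level"},
}


def assign_variable_attrs(ds, attrs_by_name):
    """Return a shallow copy of ds with attrs set for every variable that exists."""
    ds = ds.copy()
    for name, attrs in attrs_by_name.items():
        if name in ds.variables:  # skip variables the dataset does not contain
            ds[name].attrs.update(attrs)
    return ds


# Toy dataset that only contains rsl (values invented for illustration).
ds_example = xr.Dataset({"rsl": ("time", np.array([-45.0, -45.2]))})
ds_with_attrs = assign_variable_attrs(ds_example, VARIABLE_ATTRS)
```

Variables listed in the dictionary but absent from the dataset are skipped, so the same table can be applied to datasets that only provide a subset of the variables.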
PR https://github.com/OpenSenseAction/OPENSENSE_sandbox/pull/37 has been merged. The WG1 data format for CMLs is mostly implemented. The variables `pmin` and `pmax` are now also renamed to `rsl_min` and `rsl_max`, respectively, and the dictionary defining attributes is complete.
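The renaming described here can be sketched with `xarray.Dataset.rename`. This is not the code from the merged PR; `RENAME_MAP` and `rename_present` are names invented for this example, and the data values are made up.

```python
import numpy as np
import xarray as xr

# Old name -> new name, as stated in the comment above.
RENAME_MAP = {"pmin": "rsl_min", "pmax": "rsl_max"}


def rename_present(ds, mapping):
    """Rename only those variables from mapping that exist in ds."""
    present = {old: new for old, new in mapping.items() if old in ds.variables}
    return ds.rename(present) if present else ds


# Toy dataset with the old variable names (values invented for illustration).
ds_old = xr.Dataset(
    {
        "pmin": ("time", np.array([-50.0, -51.0])),
        "pmax": ("time", np.array([-44.0, -43.5])),
    }
)
ds_new = rename_present(ds_old, RENAME_MAP)
```

Filtering the mapping first avoids an error for datasets that contain only one of the two variables, or neither.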
Still missing is the following:
* `cml_id` in the OpenMRG dataset (so far only `sublink_id`)
* `tsl` and `rsl` in the Czech data sets (now only `tl`)

Thanks for the summary.
We will keep this issue open to track what still has to be done.
Still missing is the following:
* `cml_id` in the OpenMRG dataset (so far only `sublink_id`)
* required variables `tsl` and `rsl` in the Czech data sets (now only `tl`)
The first point will be solved by #51.
@fenclmar: If I understand correctly, the second point about missing `tsl` and `rsl` in the Czech CML datasets cannot be fixed since the variables are missing in the raw data. Correct? If so, we leave it like that and maybe just raise a warning when transforming the data, or maybe just silently ignore this issue since it will be apparent from the resulting `xarray.Dataset` that only `tl` is there. Please comment.
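One way the warning option could look, as a sketch only: `REQUIRED_VARIABLES` and `check_required_variables` are hypothetical names, and the required set is taken from this thread rather than from the full white paper.

```python
import warnings

import numpy as np
import xarray as xr

# Required variables as named in this thread (assumption: not the full list).
REQUIRED_VARIABLES = ("tsl", "rsl")


def check_required_variables(ds, required=REQUIRED_VARIABLES):
    """Warn about required variables missing from ds and return their names."""
    missing = [name for name in required if name not in ds.variables]
    if missing:
        warnings.warn(
            f"Dataset is missing required variables: {missing}",
            UserWarning,
            stacklevel=2,
        )
    return missing


# Toy dataset that, like the Czech data, only provides total loss `tl`.
ds_tl_only = xr.Dataset({"tl": ("time", np.array([60.1, 60.4]))})

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    missing_vars = check_required_variables(ds_tl_only)
```

Returning the list of missing names lets the caller decide whether to continue, while the warning keeps the gap visible during transformation.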
@fenclmar Two other points
There was a lot of work on this in #62 and we have everything working for our most important CML dataset, the OpenMRG data.
I will rename this issue to be more specific and then close it.
We should open a new issue to discuss the next steps regarding data format, e.g. which PWS data shall be transformed as an example, how and with which code. Maybe this should be done in a separate repo then after doing #7.
Based on the final decision of the data format for CML, PWS and SML, the existing data transformation code has to be adjusted.
EDIT: The current work was very much focused on instantaneous CML data from the OpenMRG dataset, hence, this issue was renamed to have a narrower focus and finally closed.