HERMES-SOC / hermes_core

A central Python Package for common functionality across all hermes instruments
Other
2 stars 6 forks source link

Create Demo Jupyter Notebook for CDFWriter Examples #48

Closed Alrobbertz closed 1 year ago

Alrobbertz commented 1 year ago

I also an't upload a notebook directly to the issue so I created a GIST here: CDFWriter_Demo

ehsteve commented 1 year ago

The following list is my stream of conscientious comments as I looked through the notebook. I think it looks good! (regardless of how negative my comments may appear!)

See my comment on issue #47 for the attribute names we should use.

ehsteve commented 1 year ago

We definitely need a nicer __str__ function but what you have now is fine.

ehsteve commented 1 year ago

Let's not allow for an empty version to be created. Let's require at least one time and data column. I'm on the fence on whether we require the global and var metadata on creation as well as that is a pain.

ehsteve commented 1 year ago

Make global_attr_schema a private attribute. There is no reason a user should need to see that. Same for writer.variable_attr_schema.

ehsteve commented 1 year ago

With time required on creation, if we require a particular time scale in Time object array, could we not generate all of the time meta automatically? That would make it much simpler.

ehsteve commented 1 year ago

The variable data should be given as a Quantity array which would enable us to fill out the unit attributes in the metadata.

ehsteve commented 1 year ago

How is the writer.add_attributes_from_dict different from the dictionary update()?

ehsteve commented 1 year ago

I'm still not a fan of the two functions to_cdf and save_cdf. Again, most users will just want to write to a file with one command.

Alrobbertz commented 1 year ago

Let's not allow for an empty version to be created. Let's require at least one time and data column. I'm on the fence on whether we require the global and var metadata on creation as well as that is a pain.

Are you suggesting that we don't allow users to instantiate a CDFWriter without passing in a TimeSeries object that has at least two columns for time and a data column?

Alrobbertz commented 1 year ago

How is the writer.add_attributes_from_dict different from the dictionary update()?

This function does a debug log when you override an existing attribute, otherwise there is really no difference. Do you think it would be easier for users or users would prefer to call writer.data.meta.update(dict) over writer.add_attributes_from_dict(dict)? Do we want to force users to do all their work through the data.meta attribute?

Alrobbertz commented 1 year ago

We definitely need a nicer __str__ function but what you have now is fine.

What would you ideally like output in the __str__ function? I updated it now to have:

CDFWriter() Object:
Global Attrs:
    Discipline: Space Physics>Magnetospheric Science
    Mission_group: HERMES
    Project: STP>Solar-Terrestrial Physics
    Source_name: HERMES>Heliophysics Environmental an Radiation Measurement Experiment Suite
Variable Data:
        time       
-------------------
1970-01-01 00:00:00
1970-01-01 00:00:01
1970-01-01 00:00:02
1970-01-01 00:00:03
1970-01-01 00:00:04
1970-01-01 00:00:05
1970-01-01 00:00:06
1970-01-01 00:00:07
1970-01-01 00:00:08
1970-01-01 00:00:09
Variable Attributes:
    Var: time
        CATDESC: TT2000 time tags
        DEPEND_0: Epoch
        FORMAT: I16
        FIELDNAM: Epoch
        FILLVAL: 9999-12-31 23:59:59.999999
        VAR_TYPE: support_data
        TIME_BASE: J2000
        UNITS: UT
        RESOLUTION: 1s
        TIME_SCALE: Terrestrial Time (TT)
        REFERENCE_POSITION: rotating Earth geoid
        SI_CONVERSION: 1e-9>s
        VALIDMIN: 1970-01-01 00:00:00
        VALIDMAX: 1970-01-01 00:00:50
ehsteve commented 1 year ago

Let's not allow for an empty version to be created. Let's require at least one time and data column. I'm on the fence on whether we require the global and var metadata on creation as well as that is a pain.

Are you suggesting that we don't allow users to instantiate a CDFWriter without passing in a TimeSeries object that has at least two columns for time and a data column?

yes, i am.

ehsteve commented 1 year ago

How is the writer.add_attributes_from_dict different from the dictionary update()?

This function does a debug log when you override an existing attribute, otherwise there is really no difference. Do you think it would be easier for users or users would prefer to call writer.data.meta.update(dict) over writer.add_attributes_from_dict(dict)? Do we want to force users to do all their work through the data.meta attribute?

As we discussed, maybe it makes sense to have a meta object which inherits from dict. You could then override update to do logging as well. In practice, I don't think we will often have to overwrite existing attributes. The pythonic approach is usually to just create a new object if you want to have different contents, not edit one.

ehsteve commented 1 year ago

I think that str is fine for now. A good approach might be to summarize the global meta, provide the data column names, the time range of the data, and maybe some variable metadata. I haven't thought it through more than that.

Alrobbertz commented 1 year ago

@ehsteve I updated the CDFWriter PR #45 with metadata derivations for:

and time metadata:

and I think I addressed all your comments here.

ehsteve commented 1 year ago

I think it's time to add a user guide to the sphinx docs. That would help everyone see how this class should work and perhaps generate discussion and feedback.