ess-dive-community / essdive-file-level-metadata

READY TO USE. Reporting format for File Level Metadata uploaded to the ESS-DIVE repository
https://ess-dive.gitbook.io/file-level-metadata-reporting-format/
Creative Commons Attribution 4.0 International

Suggested addition: add list of suggested data type terms and their definitions #6

Open regnans opened 3 years ago

regnans commented 3 years ago

Submitter: Kim Ely

I suggest the following changes: on the page https://github.com/ess-dive-community/essdive-file-level-metadata/blob/master/CSV_dd/CSV_dd_quick_guide.md#data-type

It would be really useful to have some defined terms here. Terms like "text" and "string", or "numeric" and "integer" (and other number types), are often confused and (incorrectly) used interchangeably, so a list of terms with definitions would help. Some other types to consider could be logical, percent, fraction.

regnans commented 3 years ago

Some specific questions/examples:
What is the unit for a Date field? Is it the "format" of the date, i.e. YYYY-MM-DD, or is it the smallest unit, i.e. day? Or something else? Should a numeric ratio have a unit or N/A?

robcrystalornelas commented 3 years ago

@tvelliquette Wanted to bring your attention to this. @regnans and Terri, do you think it would be helpful to have these definitions within the metadata "element" called data-type, or elsewhere?

regnans commented 3 years ago

I think the simplest solution would be to have this information within the "Standard definition" row of the Data type table. Or it could be added as another table immediately below. (I realize that this request is not a quick fix, but as a data contributor I would find it really useful, as I struggle with using data type terms consistently.)

kristinboye commented 3 years ago

Looking through the currently uploaded reporting formats (in various stages), there is no consistency in, e.g., the date and time format (even though each reporting format requires a specific way of reporting date and time), or in the ways of constructing "terminology files/data dictionary files", etc. So, building on Kim's and others' previous comments, I think we need to harmonize the terminology, formatting (when appropriate), and requirements across the ESS-DIVE reporting formats to simplify things for the average data producer/archiver.

regnans commented 3 years ago

An update on "data type" terms. At BNL we have settled (for now) on assigning the following data types in our data dictionaries: integer, floating point, string, date, time, date-time.
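For anyone who wants to check a CSV column against these six terms, here is a rough Python sketch. It is only an illustration, not part of the reporting format: the function names `classify_value` and `classify_column` are made up for this example, and the date and time patterns (YYYY-MM-DD, HH:MM:SS) are assumptions, since the accepted formats are still an open question in this thread.

```python
import math
import re
from datetime import datetime

# Assumed formats for illustration only; this thread has not settled them.
DATE_FORMAT = "%Y-%m-%d"
TIME_FORMAT = "%H:%M:%S"
DATETIME_FORMATS = ("%Y-%m-%d %H:%M:%S", "%Y-%m-%dT%H:%M:%S")

# Optional sign followed by ASCII digits only.
_INTEGER = re.compile(r"[+-]?[0-9]+")


def _matches(value, fmt):
    """Return True if value parses completely with the given strptime format."""
    try:
        datetime.strptime(value, fmt)
        return True
    except ValueError:
        return False


def classify_value(value):
    """Map one raw CSV cell (a str) to one of the six data type terms."""
    text = value.strip()
    if _INTEGER.fullmatch(text):
        return "integer"
    try:
        # Only finite numbers count; "nan" and "inf" fall through to string.
        if math.isfinite(float(text)):
            return "floating point"
    except ValueError:
        pass
    if any(_matches(text, fmt) for fmt in DATETIME_FORMATS):
        return "date-time"
    if _matches(text, DATE_FORMAT):
        return "date"
    if _matches(text, TIME_FORMAT):
        return "time"
    return "string"


def classify_column(values):
    """Pick a single term for a column, falling back to string when mixed."""
    kinds = {classify_value(v) for v in values}
    if len(kinds) == 1:
        return kinds.pop()
    if kinds == {"integer", "floating point"}:
        return "floating point"
    return "string"
```

Note that a compact date such as 20210308 would be labelled integer by this sketch, which is one reason a written definition for each term would still be needed.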

robcrystalornelas commented 3 years ago

Sounds good, Kim. Thanks for posting the BNL updates to data types here.

On Mon, Mar 8, 2021 at 10:50 AM Kim Ely notifications@github.com wrote:

An update on "data type" terms. At BNL we have settled (for now) on assigning the following data types in our data dictionaries: integer, floating point, string, date, time, date-time.

-- Rob Crystal-Ornelas, PhD Postdoctoral Scholar Lawrence Berkeley National Lab | ESS-DIVE Pronouns: he/him/his

regnans commented 3 years ago

To clarify, my post was an FYI, not necessarily how it should be done. Very happy to take advice from data scientists here! I have a lot to learn about the meanings and implications of using different sorts of data types.

Also, with our current usage of these terms we are not using "numeric" or "text". (Although perhaps "text" is appropriate for a field that included multiple sentences of text.)

robcrystalornelas commented 3 years ago

@vchendrix As Terri and the ORNL team are finishing up the File-level metadata reporting format, Terri sent over a question about whether or not having a place for users of this format to report which data type they are providing in their data sheets would be useful.

Notes from Terri:

regnans commented 2 months ago

This issue is still relevant (see publication recommendations made for EDDOI-9773). Please include a list of standard Data_Type terms in the GitHub reporting format documentation.