Closed by mobb 5 years ago
The naming convention proposed by @cgries makes sense to me (i.e. studyName_ecocomDPTableName). To accommodate versioning I propose we use _vNUMBER (e.g. NTL_RS_observation_summary_v15.csv for version fifteen). What do you think?
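To make the proposed pattern concrete, here is a minimal Python sketch of how such a name could be assembled. The function name `versioned_table_filename` and its parameters are hypothetical and are not part of ecocomDP.

```python
def versioned_table_filename(study_name: str, table_name: str,
                             version: int, ext: str = "csv") -> str:
    """Build a name of the form studyName_tableName_vNUMBER.ext (hypothetical helper)."""
    if version < 1:
        raise ValueError("version must be a positive integer")
    return f"{study_name}_{table_name}_v{version}.{ext}"


# Reproduces the example from the comment above.
print(versioned_table_filename("NTL_RS", "observation_summary", 15))
# NTL_RS_observation_summary_v15.csv
```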
Another option is to use the source packageId, as in edi_5_2_summary.csv.
This is a good option as well. It seems the naming convention can be flexible but must include the ecocomDP table names. The L1 aggregator function should have little trouble identifying which of the 7 ecocomDP tables it is working with.
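As a rough illustration of how a table could be recognized from a flexible file name, here is a Python sketch. It is not the aggregator's actual logic: the helper `identify_table` is hypothetical, and the list of seven table names is an assumption that should be checked against the ecocomDP specification.

```python
import re
from pathlib import Path
from typing import Optional

# Assumed names of the seven ecocomDP tables; verify against the spec.
ECOCOMDP_TABLES = (
    "observation", "observation_ancillary",
    "location", "location_ancillary",
    "taxon", "taxon_ancillary",
    "dataset_summary",
)


def identify_table(filename: str) -> Optional[str]:
    """Return the ecocomDP table name embedded in a file name, or None."""
    stem = Path(filename).stem.lower()
    # Try longer names first so "observation_ancillary" is not read as "observation".
    for table in sorted(ECOCOMDP_TABLES, key=len, reverse=True):
        # The table name must be delimited by underscores or the ends of the stem.
        if re.search(rf"(?:^|_){table}(?:_|$)", stem):
            return table
    return None


print(identify_table("NTL_RS_observation_summary_v15.csv"))  # observation
```

Matching on underscore-delimited tokens lets prefixes (study name) and suffixes (version, "raw", "summary") vary without confusing the lookup.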
Reread Corinna's original comment. Her problem is different: e.g., the dataset has 2 tables that each could be considered primary observations, and she is suggesting they could remain so. We need the dataset id (@cgries, please comment).
It's an early comment and I probably need to overhaul this whole dataset now: https://portal.edirepository.org/nis/mapbrowse?packageid=knb-lter-ntl.338.1
Hi @mobb and @cgries. Any progress on this front? I've created /documentation/practices/naming_tables.md to convey the recommendation once it's formulated.
There are a number of recommendations out there already that we could just adopt: https://daac.ornl.gov/datamanagement/#descriptive_filenames : File names should reflect the contents of the file and uniquely identify the data file. File names may contain information such as project acronym, study title, location, investigator, year(s) of study, data type, version number, and file type.
File names should be constructed to contain only lower-case letters, numbers, and underscores – no spaces or special characters – for easy management by various data systems and to decrease software and platform dependency. Similar logic is useful when designing directory structures and names.
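The quoted rule is easy to check mechanically. Below is a small Python sketch; both helper names are hypothetical, and allowing a single dot-separated extension is my own reading of the recommendation rather than something it states.

```python
import re

# Stem of lower-case letters, digits, and underscores, plus an optional
# single lower-case extension (the extension allowance is an assumption).
_PORTABLE = re.compile(r"[a-z0-9_]+(\.[a-z0-9]+)?")


def is_portable_filename(filename: str) -> bool:
    """True if the whole file name follows the lower-case/underscore rule."""
    return _PORTABLE.fullmatch(filename) is not None


def to_portable_filename(filename: str) -> str:
    """Lower-case the name and replace runs of other characters with '_'."""
    stem, dot, ext = filename.rpartition(".")
    if not dot:  # no extension present
        stem, ext = filename, ""
    stem = re.sub(r"[^a-z0-9]+", "_", stem.lower()).strip("_")
    return f"{stem}.{ext.lower()}" if ext else stem


print(is_portable_filename("NTL_RS_observation_summary_v15.csv"))  # False
print(to_portable_filename("NTL_RS_observation_summary_v15.csv"))
# ntl_rs_observation_summary_v15.csv
```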
The implemented practice is a file name consisting of the ecocomDP table name (e.g. "observation"). Example data package in the EDI Data Repository.
The above practice doesn't guarantee globally unique names, but globally unique names are not needed until the reuse/aggregation step, which is taken care of by the aggregate_ecocomDP() function.
Corinna's comment:
When I use the ecocom_dp for any new incoming community datasets, I can’t call the files all the same name. I.e., I can’t just use ‘observation’, ‘event’, etc. over and over again. I am already stumbling on this one dataset because they gave me raw observations (several per lake) and then summarized for each lake. I do want to archive both as people probably want both. So, what I have done for the file name now is prefix it with the study and then postfix it with raw and summary, i.e., NTL_RS_Macrophytes_observation_raw.csv and NTL_RS_Macrophytes_observation_summary.csv. Of course, they could go into one, but I am sure that would make it very difficult to use.