aodn / content

Tracks AODN Portal content and configuration issues

author global attributes in NetCDFs #461

Closed lbesnard closed 2 years ago

lbesnard commented 4 years ago

Thomas Schroeder, leader of the Bio Optical database (BODBAW) sub-facility, raised some concerns about the various global attributes in the NetCDF files we are creating for him.

Currently, we only have two global attributes referencing persons: author and principal_investigator (see definitions in the IMOS NetCDF conventions).

In the case of the BODBAW data, the PI, referenced in our NetCDFs is the person responsible for the voyage, as stipulated in the IMOS convention. This means Thomas' name does not appear within the NetCDF file. Is this something we should be concerned about or not?

Thomas proposed a series of new attributes:

“I fully agree the AODN "Author" definition is ill-defined. I think we could overcome the current confusion by adopting a more detailed and precise metadata description such as the International DataCite Metadata Scheme:

http://schema.datacite.org/meta/kernel-4.3/ http://schema.datacite.org/meta/kernel-4.3/doc/DataCite-MetadataKernel_v4.3.pdf

It allows the definition of different roles under "Contributor" - see examples below that would apply to us and details in the Appendix 1 page 21-24.

"DataCurator" would be you and/or Lesley, "DataManager" would be Laurent for example and "Distributor" would be the AODN, while my role would be "ProjectManager". There is probably no need to have an additional "DataCollector" field. The researchers that produced the data would be acknowledged under "Creator", while we could still have a voyage PI listed.

PI: Voyage Principal Investigator
Creator: The main researchers involved in producing the data, or the authors of the publication, in priority order.
Contributor:
    DataCollector: Person/institution responsible for finding, gathering/collecting data under the guidelines of the author(s) or Principal Investigator (PI)
    DataCurator: Person tasked with reviewing, enhancing, cleaning, or standardizing metadata and the associated data submitted for storage, use, and maintenance within a data center or repository 
    DataManager: Person (or organization with a staff of data managers, such as a data centre) responsible for maintaining the finished resource.
    Distributor: Institution tasked with responsibility to generate/disseminate copies of the resource in either electronic or print form.
    ProjectManager: Person officially designated as manager of a project. Project may consist of one or many project teams and sub‐teams.”

What do people think? @aodn/data-ops @ggalibert

mhidas commented 4 years ago

I agree that the two fields defined in our current (IMOS-1.4) conventions are not very clearly defined, and not necessarily sufficient for personally acknowledging everyone who contributed to the content of the file.

By the way, our convention defines author as "Name of the person responsible for the creation of the dataset." To me it's not even clear whether that means the person who deployed/retrieved the instrument, processed/QC'd the data, or just wrote the NetCDF file that we published. These could all be different people.

This could certainly be improved on in the next version of our conventions (whenever that might happen). In the meantime, nothing stops people from including additional global attributes to clarify who did what. However, it would be helpful to try and do this in a somewhat standard way.

To me, the obvious convention to follow would be the Attribute Convention for Data Discovery (ACDD, v1.3), which suggests the global attributes contributor_name and contributor_role.

For the moorings products, we used these to aggregate all the (unique) author and principal_investigator values from the input files, separating them with semi-colons. We also added contributor_email. So we have something like:

contributor_name: Jane Author; Tom PI
contributor_role: author; principal_investigator
contributor_email: jane.author@email; tom.pi@email

Would that work for the BODBAW data? Rather than creating a new bunch of role-specific attributes, roles like "DataCurator" and "ProjectManager" would just be used as values in the standard contributor_role attribute.
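The aggregation described above could be sketched roughly as follows. This is only an illustration in plain Python, not the code used for the moorings products: the function name aggregate_contributors is hypothetical, the input is assumed to be one dictionary of global attributes per input file, and the author_email and principal_investigator_email source attributes are an assumption about where the email addresses come from.

```python
def aggregate_contributors(input_attrs):
    """Collect unique (name, role) pairs from per-file global attributes.

    input_attrs: list of dicts, one per input file (assumed layout).
    Returns the three semicolon-separated contributor_* attributes.
    """
    # Assumed mapping from role to the attributes holding name and email.
    role_sources = {
        "author": ("author", "author_email"),
        "principal_investigator": (
            "principal_investigator",
            "principal_investigator_email",
        ),
    }
    seen = set()
    names, roles, emails = [], [], []
    for attrs in input_attrs:
        for role, (name_key, email_key) in role_sources.items():
            name = attrs.get(name_key, "").strip()
            if not name or (name, role) in seen:
                continue  # skip missing names and repeats
            seen.add((name, role))
            names.append(name)
            roles.append(role)
            # A missing email leaves an empty slot so the lists stay aligned.
            emails.append(attrs.get(email_key, "").strip())
    return {
        "contributor_name": "; ".join(names),
        "contributor_role": "; ".join(roles),
        "contributor_email": "; ".join(emails),
    }


# Two input files naming the same two people (placeholder values).
files = [
    {
        "author": "Jane Author",
        "author_email": "jane.author@email",
        "principal_investigator": "Tom PI",
        "principal_investigator_email": "tom.pi@email",
    },
    {"author": "Jane Author", "principal_investigator": "Tom PI"},
]
for key, value in aggregate_contributors(files).items():
    print(f"{key}: {value}")
```

Keeping the three lists the same length, even when an email is missing, is what lets a reader match the n-th name to the n-th role and email.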

mhidas commented 4 years ago

By the way, ACDD 1.3 also lists the recommended attributes creator_name and publisher_name (along with the related creator_* and publisher_* attributes).

Not sure why we don't use these - they are pretty similar to our author and data_centre. Possibly we could specify creator as the person who measured/processed/QC'd the data, and author as the person who wrote (the code that wrote) the NetCDF file? Something to think about for the next version of the IMOS conventions...

ggalibert commented 4 years ago

I agree too that the current global attributes for doing this in the IMOS Conventions are not great and I don't mind adding new ones (ideally it would be easier if we could still keep author and data_centre). Like Marty I would rather stick to ACDD which has already influenced IMOS Conventions and is used in other marine data centres like NODC (see https://www.nodc.noaa.gov/data/formats/netcdf/v2.0/).

lbesnard commented 3 years ago

Thanks. I find the ACDD attributes can be hard to read as soon as there are more than 2-3 names to add with various roles. I will create the following attributes as requested:

For the moorings products, we used these to aggregate all the (unique) author and principal_investigator values from the input files, separating them with semi-colons. We also added contributor_email. So we have something like:

contributor_name: Jane Author; Tom PI
contributor_role: author; principal_investigator
contributor_email: jane.author@email; tom.pi@email

What, then, is the recommendation on the values to be used within contributor_role? If we use them, there is a need for a proper definition; otherwise I believe this could upset data providers. The ACDD convention seems to be lacking precision on the matter: https://wiki.esipfed.org/Attribute_Convention_for_Data_Discovery_1-3#contributor_role
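One way to make the role values explicit would be to check them against an agreed list before publishing. The sketch below is only an illustration: ALLOWED_ROLES is a hypothetical vocabulary that combines the two role names from the moorings example with the DataCite contributor types quoted earlier in this thread. It is not an agreed IMOS or ACDD list, and the function names are made up for this example.

```python
# Hypothetical vocabulary; the real list would need to be agreed and documented.
ALLOWED_ROLES = {
    "author",
    "principal_investigator",
    "DataCollector",
    "DataCurator",
    "DataManager",
    "Distributor",
    "ProjectManager",
}


def split_values(value):
    """Split a semicolon-separated attribute into trimmed entries."""
    return [part.strip() for part in value.split(";")]


def check_contributors(attrs, allowed_roles=ALLOWED_ROLES):
    """Return a list of problems found in the contributor_* attributes."""
    names = split_values(attrs.get("contributor_name", ""))
    roles = split_values(attrs.get("contributor_role", ""))
    problems = []
    # Names and roles are matched by position, so the counts must agree.
    if len(names) != len(roles):
        problems.append(f"{len(names)} names but {len(roles)} roles")
    for role in roles:
        if role not in allowed_roles:
            problems.append(f"unknown role: {role!r}")
    return problems


good = {
    "contributor_name": "Jane Author; Tom PI",
    "contributor_role": "author; principal_investigator",
}
bad = {
    "contributor_name": "Jane Author; Tom PI",
    "contributor_role": "DataCurrator",  # misspelt, and one role short
}
print(check_contributors(good))
print(check_contributors(bad))
```

A check like this would also catch spelling variants of the same role before they reach a published file.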

ggalibert commented 3 years ago

In ACDD the contributor_*** attributes are Suggested while the creator_*** and publisher_*** are Recommended. The latter are actually the ones used by NODC (see https://www.nodc.noaa.gov/data/formats/netcdf/v2.0/timeSeriesOrthogonal.cdl), so I think we should use these ones in priority.

Creator would be the person responsible for creating the data and Publisher could be us.

If we need to mention anyone else then we can use contributor_*** attributes, but would that be the case?

Looks like the different roles for contributors are the following: 'author', 'principalInvestigator' or 'originator' (source https://wiki.esipfed.org/Data_Discovery_(ACDD)).

'originator' is very much what creator_*** is, while 'author' would be the person who produced the dataset. Then there is 'principalInvestigator', which we could use if we also needed to acknowledge this person. Anyway, maybe you can check with Thomas to see if he would be happy to use any of these options from ACDD?
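As a rough sketch of this option, the attributes could be assembled as below. Everything here is illustrative: the function name attribution_attrs, the input dictionaries and the names and email addresses are placeholders, and nothing in it reflects what was finally agreed with Thomas. Writing the result into a file (for example with the netCDF4 library's Dataset.setncatts) is left out to keep the example free of dependencies.

```python
def attribution_attrs(creator, publisher, contributors=()):
    """Build ACDD-style creator_*, publisher_* and contributor_* attributes.

    creator, publisher: dicts with "name" and "email" keys (assumed layout).
    contributors: optional sequence of dicts with "name" and "role" keys.
    """
    attrs = {
        "creator_name": creator["name"],
        "creator_email": creator["email"],
        "publisher_name": publisher["name"],
        "publisher_email": publisher["email"],
    }
    # contributor_* is only added when someone else needs acknowledging.
    if contributors:
        attrs["contributor_name"] = "; ".join(c["name"] for c in contributors)
        attrs["contributor_role"] = "; ".join(c["role"] for c in contributors)
    return attrs


# Placeholder people and addresses, for illustration only.
example = attribution_attrs(
    creator={"name": "Jane Creator", "email": "jane.creator@email"},
    publisher={"name": "Data Centre", "email": "data.centre@email"},
    contributors=[{"name": "Tom PI", "role": "principalInvestigator"}],
)
for key, value in example.items():
    print(f"{key}: {value}")
```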

ggalibert commented 3 years ago

@lbesnard please summarise what has been agreed with Thomas and close this since I think it's been resolved.