jagoldstein opened this issue 6 years ago (status: Open)
This is a feature we plan to support in future versions.
We spend a lot of time on the data team making sure the `physical` sections are correct. This is definitely something we emphasize heavily in our training and daily work with the team. Moving forward with this version of the editor, are we expected to add these `physical` sections using R? Or would we just accept that some datasets won't have them? Having to add them would negate some of the efficiencies we are happily anticipating by switching to curating datasets with the editor instead of R. I welcome feedback from @mbjones, @csjx, and @amoeba here.
From an operational perspective, the only reason to have `physical` sections was to support the old web form (Registry script), because it didn't understand what a Resource Map was. These sections are no longer used by the Editor. @laurenwalker, does the PROV editor integration work off the `physical` section or just the Entity?
From the perspective of authoring best-practices EML, the `physical` section is a strong requirement, as it's the glue that makes any assertion about what data the EML documents hold up (URL, checksum).
From an Editor release perspective, this is not a dealbreaker, because AFAIK the Editor will still work fine without the `physical` sections (operationally). But I feel we would then want to go back over packages created with Editor 2.0 and align their `physical` sections with the entities described in the Resource Maps. This is an increase in work for the Data Team if we choose to do this.
Weighing the cost of delaying the Editor's release against a possible increase in work for the Data Team, I'd go with not delaying the Editor release for this feature but making sure we have a solid plan to deal with the fallout.
Thanks for the clarification, Bryce. The points you make about best-practices EML are why we emphasize this section so much.
Jesse and I would definitely like to have a plan for this going forward, and part of the reason why I made that comment was to make sure that our path forward is clear. We could probably write a helper function that would automatically add in physical sections for all the objects in a data package and update the EML, and incorporate that as part of our workflow in curating packages that come in.
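A minimal sketch of what such a helper could look like, written in Python for illustration rather than the R the data team actually uses. It assumes the system metadata values (identifier, file name, size, checksum, format ID) are already in hand and that each entity is matched to its object by `entityName` equal to the file name; `SysMeta`, `build_physical`, `add_physical_sections`, and the `resolve_base` URL prefix are all hypothetical names.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from urllib.parse import quote

# EML entity types that can carry <physical> sections.
ENTITY_TAGS = {"dataTable", "spatialRaster", "spatialVector",
               "storedProcedure", "view", "otherEntity"}


@dataclass
class SysMeta:
    """Hypothetical subset of DataONE system metadata for one data object."""
    identifier: str
    file_name: str
    size: int
    checksum: str
    checksum_algorithm: str
    format_id: str


def build_physical(sm: SysMeta, resolve_base: str) -> ET.Element:
    """Build an EML <physical> element from system metadata.

    Children follow the EML PhysicalType order used here:
    objectName, size, authentication, dataFormat, distribution.
    """
    physical = ET.Element("physical")
    ET.SubElement(physical, "objectName").text = sm.file_name
    ET.SubElement(physical, "size", unit="bytes").text = str(sm.size)
    ET.SubElement(physical, "authentication",
                  method=sm.checksum_algorithm).text = sm.checksum
    fmt = ET.SubElement(ET.SubElement(physical, "dataFormat"),
                        "externallyDefinedFormat")
    ET.SubElement(fmt, "formatName").text = sm.format_id
    online = ET.SubElement(ET.SubElement(physical, "distribution"), "online")
    # Percent-encode the identifier so values like urn:uuid:... are URL-safe.
    ET.SubElement(online, "url", function="download").text = (
        resolve_base + quote(sm.identifier, safe=""))
    return physical


def add_physical_sections(eml_root: ET.Element, sysmetas: list[SysMeta],
                          resolve_base: str) -> int:
    """Add <physical> to every entity lacking one; return how many were added.

    Assumption: an entity matches an object when entityName == file name.
    """
    by_name = {sm.file_name: sm for sm in sysmetas}
    added = 0
    for entity in eml_root.find("dataset"):
        if entity.tag not in ENTITY_TAGS or entity.find("physical") is not None:
            continue
        sm = by_name.get(entity.findtext("entityName"))
        if sm is None:
            continue
        # <physical> goes right after the entity's identifying elements.
        children = list(entity)
        pos = max((i for i, c in enumerate(children)
                   if c.tag in ("alternateIdentifier", "entityName",
                                "entityDescription")), default=-1)
        entity.insert(pos + 1, build_physical(sm, resolve_base))
        added += 1
    return added
```

Entities that already have a `physical` section are skipped, so the helper could be rerun safely as part of routine curation; a real workflow would still validate the resulting EML against the schema before updating the package.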
@amoeba - The Prov editor can work without the physical section, as long as there is an entity section in the EML.
@lauren, what are the EML elements, in order of precedence, that are evaluated to link a package member (in the table listing) with an entity section in the metadata view, i.e., to enable the "more info" link?
@gothub - This is the code that matches up the entity section with the DataONE object:
https://github.com/NCEAS/metacatui/blob/master/src/js/views/MetadataView.js#L1378-L1476
OK, thanks. I'm thinking it would be useful for the data team to have a list of EML elements to include or check so that this connection between the entity section and the D1 object can be ensured (otherwise prov may not display). From my first-pass reading of the code, it looks like the order of precedence is
Is this accurate?
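As a rough aid for that kind of checklist, here is a Python sketch that flags package members with no corresponding entity section. The precedence MetacatUI actually applies lives in the MetadataView.js code linked above and is not reproduced here; the matching rule below (entity `id` equal to the identifier, or `entityName` or `physical/objectName` equal to the file name) is an assumption, and `unmatched_members` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

ENTITY_TAGS = {"dataTable", "spatialRaster", "spatialVector",
               "storedProcedure", "view", "otherEntity"}


def unmatched_members(eml_xml: str, members: dict[str, str]) -> list[str]:
    """Return identifiers of package members that no EML entity points to.

    `members` maps each object identifier to its file name. A member counts
    as matched (simplified, assumed rule) if an entity's id attribute equals
    the identifier, or its entityName or physical/objectName equals the file name.
    """
    # EML child elements are unqualified, so "dataset" is found even when
    # the root is a namespaced <eml:eml>.
    dataset = ET.fromstring(eml_xml).find("dataset")
    ids, names = set(), set()
    for entity in dataset:
        if entity.tag not in ENTITY_TAGS:
            continue
        ids.add(entity.get("id"))
        names.add(entity.findtext("entityName"))
        for obj_name in entity.iterfind("physical/objectName"):
            names.add(obj_name.text)
    return [pid for pid, fname in members.items()
            if pid not in ids and fname not in names]
```

Any identifier this returns is a member whose "more info" link (and possibly prov display) would be at risk under the assumed rule.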
With the new editor, this shouldn't be an issue, since it creates the entity sections automatically. @gothub, does the data team regularly create EML manually? I thought they primarily used the old registry or the new editor.
Yep, good point - maybe this is a non-issue.
I actually think that while the new editor doesn't currently produce `physical` sections, it ought to and will in the future. While `objectName` and `authentication` are useful, those are now more universally supported in the `SystemMetadata`. That said, the `physical/dataFormat` tree is quite important for programmatically parsing fixed-width, simple delimited, and complex delimited text files, as well as binary rasters. So we plan on parsing delimited files like `.tsv`, `.csv`, etc. in the editor, and providing the `physical` metadata needed to load and parse these types of files into a preview-like display. So, we'll get there, and thanks for pointing this out, @jagoldstein. These are all things Morpho does internally, and we want to match that on the MetacatUI side.
Not to derail this Issue, but @csjx I take strong issue with this:
While objectName and authentication are useful, those are now more universally supported in the SystemMetadata
Authoring DataONE System Metadata and authoring EML are orthogonal concerns: the former is to make DataONE happy, and the latter is to document the dataset. Omitting the checksum and `objectName` from the EML because the System Metadata already contains them would be a mistake for long-term preservation of the scientific metadata. I wonder whether we really disagree on this, though.
@amoeba Ah, yes, point well taken. I agree that these things should be populated in the EML; I was just trying to make the point that the `physical/dataFormat` tree is quite important, and that we shouldn't forgo populating `dataset/physical` in MetacatUI. In fact, if we populate it at all, `objectName` is required. So yeah, complete EML descriptions are key, and repeating the basic information in `SystemMetadata` makes us more interoperable. Thanks for pointing this out. 😄
At the ADC team meeting today, we decided that this is a medium-high priority issue. We need to create `physical` sections and update them each time an object is updated (for now, that would only be the file name, since there is no "replace file" function in the editor yet).
This came up again during ADC discussions, since the data team still routinely adds physical entity metadata to each submission. We should be able to add this metadata automatically from what we already know about the files at upload time and from the system metadata.
Generation of the physical sections should be configurable in the AppModel so some repos can turn that off if they choose to.
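A minimal sketch of the update-on-rename behavior described above, gated by a per-repository switch. `EditorConfig`, `generate_physical`, and `sync_object_name` are hypothetical stand-ins, not MetacatUI's actual AppModel attribute or function names, and the sketch is in Python rather than MetacatUI's JavaScript.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass


@dataclass
class EditorConfig:
    """Hypothetical stand-in for a per-repository AppModel setting."""
    generate_physical: bool = True


def sync_object_name(entity: ET.Element, new_file_name: str,
                     config: EditorConfig) -> bool:
    """Update physical/objectName after a file rename; return True if changed.

    Does nothing when the repository has turned generation off. Entities
    without a <physical> section are left alone (creating one is a separate,
    upload-time step).
    """
    if not config.generate_physical:
        return False
    changed = False
    for name in entity.iterfind("physical/objectName"):
        if name.text != new_file_name:
            name.text = new_file_name
            changed = True
    return changed
```

Returning whether anything changed lets the caller skip re-saving the EML when a rename leaves the metadata untouched.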
When uploading data objects with a package and submitting, no `physical` section is created in the EML. Typically, this section would provide the file size, checksum, and online distribution URL. Examples:
https://test.arcticdata.io/#view/urn:uuid:d71babe1-3fe3-430c-913d-56eef00124b6
https://test.arcticdata.io/#view/urn:uuid:6de80a8a-f674-4b5d-b1ff-e9bbcea52634