BLE-LTER / MetaEgress

R package to create Ecological Metadata Language documents from an instance of LTER-core-metabase database schema
https://BLE-LTER.github.io/MetaEgress/
need option of textFormat #64

Open atn38 opened 3 years ago

atn38 commented 3 years ago

@gremau

On Fri, Apr 2, 2021, 10:58 Gregory E. Maurer <gmaurer@nmsu.edu> wrote:

Hi An,

I'm not sure if this is a metabase or a MetaEgress issue; perhaps it's a limitation of MetaEgress. The issue occurred when I used MetaEgress to make EML for a data package that includes an otherEntity that is a text file. The resulting EML wouldn't validate, giving this error:

[1] FALSE
attr(,"errors")
[1] "Element 'dataFormat': Missing child element(s). Expected is one of ( textFormat, externallyDefinedFormat, binaryRasterFormat )."
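For anyone scripting around this, here is a minimal sketch of how a result of that shape could be checked programmatically. The `validation` object is mocked to match the output above (a logical value with an "errors" attribute); in practice it would come from whatever validation call produced that output, which is not shown here.

```r
# Mocked validation result with the same shape as the output above:
# a logical value carrying an "errors" attribute.
validation <- structure(
  FALSE,
  errors = "Element 'dataFormat': Missing child element(s). Expected is one of ( textFormat, externallyDefinedFormat, binaryRasterFormat )."
)

# Extract the error messages and keep the ones that concern dataFormat.
errors <- attr(validation, "errors")
data_format_errors <- grep("Element 'dataFormat'", errors,
                           fixed = TRUE, value = TRUE)

# Report only when validation failed and a dataFormat error is present.
if (!isTRUE(as.logical(validation)) && length(data_format_errors) > 0) {
  message("dataFormat problem(s) found: ", length(data_format_errors))
}
```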

I was able to work around this by putting 'textFormat', or any other string, into the output of create_entity_all (tables_pkg), like this:

tables_pkg$other_entities[[1]]$physical$dataFormat$externallyDefinedFormat$formatName <- 'textFormat'

Or by adding 'textFormat' to JRN Metabase in EMLFileTypes.externallyDefinedFormat_formatName.
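The first workaround can be generalized to every otherEntity in the list. The sketch below is only an illustration: `tables_pkg` is mocked with made-up entities (only the nesting follows the assignment above), `fill_format_name()` is a hypothetical helper that is not part of MetaEgress, and the "text/plain" default is an assumption.

```r
# Mock of the relevant part of the create_entity_all() output.
# Entity names and values are hypothetical; only the nesting is from the issue.
tables_pkg <- list(
  other_entities = list(
    list(
      entityName = "field_notes.txt",
      physical = list(dataFormat = list(
        externallyDefinedFormat = list(formatName = NA_character_)
      ))
    ),
    list(
      entityName = "site_report.pdf",
      physical = list(dataFormat = list(
        externallyDefinedFormat = list(formatName = "application/pdf")
      ))
    )
  )
)

# Hypothetical helper: set formatName only when it is NULL, NA, or empty.
fill_format_name <- function(entity, default = "text/plain") {
  current <- entity$physical$dataFormat$externallyDefinedFormat$formatName
  missing_name <- is.null(current) || length(current) == 0 ||
    is.na(current[1]) || !nzchar(current[1])
  if (missing_name) {
    entity$physical$dataFormat$externallyDefinedFormat$formatName <- default
  }
  entity
}

# Apply the helper to every otherEntity before building the EML.
tables_pkg$other_entities <- lapply(tables_pkg$other_entities, fill_format_name)
```

Entities that already carry a formatName are left untouched, so the patch is safe to run on a mixed list.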

I guess the root of the problem is that when an otherEntity has a dataFormat of "externallyDefinedFormat", EML validation expects a value for "formatName". However, the otherEntity I am adding is just a free text file, and I described this type of entity in the EMLFileTypes table without any externallyDefinedFormat_formatName because I didn't think it really was an externallyDefinedFormat. MetaEgress seems to classify all otherEntities as externallyDefinedFormat instead of the other options, textFormat or binaryRasterFormat (if I'm reading your code and the EML schema right). So I needed to either assign EMLFileTypes.externallyDefinedFormat_formatName for my custom text file type in my metabase, or insert that value into the list manually before making EML.

Not sure if this will make sense without looking things over. I'm happy to meet with you, and if you think this means an enhancement is needed in MetaEgress, I could help with that. I could also be misreading the code and the EML schema a bit; let me know if you think that is the case. It's kind of tricky to sort out the mapping between the EML schema, a metabase, and the EML datatypes in MetaEgress.

Greg

gremau commented 3 years ago

This issue points out that MetaEgress is not using the complete set of dataFormats that the EML schema specifies for otherEntities. See the EML schema documentation for the otherEntity elements and for the physical/dataFormat elements. All otherEntities in EML created by MetaEgress are automatically given the "externallyDefinedFormat" physical type, and the formatName within this type comes from the metabase value at "EMLFileTypes"."externallyDefinedFormat_formatName".

Having MetaEgress allow placement of additional dataFormats (like textFormat and binaryRasterFormat) into otherEntity/physical/dataFormat is, I guess, the most complete and accurate way to represent the EML schema. But the reality for most use cases, and for other EML-making tools like EMLassemblyline, is that otherEntities default to externallyDefinedFormat. I think that is perfectly acceptable, and a nice workaround is to place the MIME type for the otherEntity file into metabase at "EMLFileTypes"."externallyDefinedFormat_formatName". That way, otherEntities that are text or other file types can be given a formatName that is appropriate and recognizable (and is actually externally defined).
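As a rough illustration of the MIME type idea, here is a small sketch that derives a formatName from a file extension. The lookup table, the `guess_format_name()` helper, and the "application/octet-stream" fallback are all assumptions made for this example; none of them come from MetaEgress or metabase, and the resulting value would still need to be entered in "EMLFileTypes"."externallyDefinedFormat_formatName".

```r
# Hypothetical lookup from file extension to MIME type (deliberately short).
mime_types <- c(
  txt  = "text/plain",
  csv  = "text/csv",
  json = "application/json",
  pdf  = "application/pdf",
  zip  = "application/zip"
)

# Hypothetical helper: return a MIME type for each file name, falling back
# to a generic binary type when the extension is missing or unknown.
guess_format_name <- function(file_name,
                              fallback = "application/octet-stream") {
  ext <- tolower(tools::file_ext(file_name))
  mime <- unname(mime_types[ext])
  mime[is.na(mime)] <- fallback
  mime
}

guess_format_name(c("field_notes.txt", "site_photos.zip", "README"))
# [1] "text/plain"  "application/zip"  "application/octet-stream"
```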

So... this might be a worthwhile improvement to MetaEgress but I would rank the issue as low priority since otherEntities can be described with a sensible formatName once you know where to put the value in metabase.