cboettig / eml2

:package: A list-based rewrite of higher-level functions from EML

additionalMetadata/custom units #15

Closed scelmendorf closed 6 years ago

scelmendorf commented 6 years ago

I can't seem to make a valid EML file that has additionalMetadata. Everything works until I introduce that element. (Recall that I got stuck on setTextType in EML, so I tried to make the same file in eml2. Possibly neither package works start to finish and I'm stuck using a hybrid, but I thought I'd ask.) Once I put in additionalMetadata for custom units, I hit the error: "Error in ns_lookup(parent$doc, parent$node, parts[[1]]) : No namespace with prefix eml found"

I kludged together a workaround by trial and error, adding @type and @context, that successfully generates a file, but this is probably not the correct thing to do. Thoughts? Example below:

> rm(list=ls())
> me <- list(individualName = list(givenName = "Carl", surName = "Boettiger"))
> my_eml <- list(dataset = list(
+   title = "A Minimal Valid EML Dataset",
+   creator = me,
+   contact = me)
+ )
> 
> #oddly the validation behaves differently if you save or not first
> eml2::eml_validate(my_eml)
[1] FALSE
attr(,"errors")
[1] "Element '{eml://ecoinformatics.org/eml-2.1.1}eml': The attribute 'packageId' is required but missing." "Element '{eml://ecoinformatics.org/eml-2.1.1}eml': The attribute 'system' is required but missing."   
> eml2::write_eml(my_eml, "ex.xml")
> eml2::eml_validate("ex.xml")
[1] TRUE
attr(,"errors")
character(0)
> eml2::eml_validate(my_eml)
[1] FALSE
attr(,"errors")
[1] "Element '{eml://ecoinformatics.org/eml-2.1.1}eml': The attribute 'packageId' is required but missing." "Element '{eml://ecoinformatics.org/eml-2.1.1}eml': The attribute 'system' is required but missing."   
> 
> #just make a super simple example
> #because I wasn't sure all the custom units
> #set_unitList functionality was working
> my_eml$additionalMetadata=list(
+   metadata=list(
+     unitList=list(
+       unit=list(
+         id='number',
+         name='number',
+         unitType='dimensionless',
+         description='a number'
+       )
+     )
+   )
+ )
> 
> #now get errors when try to validate
> eml2::eml_validate(my_eml)
Error in ns_lookup(parent$doc, parent$node, parts[[1]]) : 
  No namespace with prefix `eml` found
> #now get errors when try to write
> eml2::write_eml(my_eml, "ex.xml")
Error in ns_lookup(parent$doc, parent$node, parts[[1]]) : 
  No namespace with prefix `eml` found
> 
> #copied in some things that are from a dataset 
> #generated using the EML package
> my_eml$`@type`='EML'
> my_eml$`@context`=list(
+ `@vocab`= "eml://ecoinformatics.org/eml-2.1.1/",
+ eml= "eml://ecoinformatics.org/eml-2.1.1/",
+ xsi="http://www.w3.org/2001/XMLSchema-instance/",
+ id="@id",
+ stmml="http://www.xml-cml.org/schema/stmml-1.1/")
> 
> #now it works again
> eml2::eml_validate(my_eml)
[1] FALSE
attr(,"errors")
[1] "Element '{eml://ecoinformatics.org/eml-2.1.1}eml': The attribute 'packageId' is required but missing." "Element '{eml://ecoinformatics.org/eml-2.1.1}eml': The attribute 'system' is required but missing."   
> 
> #address the errors
> my_eml$packageId='edi.12345'
> my_eml$system='edi'
> 
> #write it out now works
> eml2::write_eml(my_eml, "ex.xml")
> eml2::eml_validate("ex.xml")
[1] TRUE
attr(,"errors")
character(0)

cboettig commented 6 years ago

Thanks for the full report, I'll take a look!

P.S. I put your example code inside a code fence so it's a bit more readable; see Styling with Markdown for GitHub issues. :)

cboettig commented 6 years ago

@scelmendorf Thanks very much for the bug report. Looks like this problem was due to an issue in emld, the library that eml2 depends on, which I've just fixed. If you do:

devtools::install_github("cboettig/emld")

and restart R, then units should work. In particular, here's a minimal example of a custom unit:

library(eml2)

# Define one custom unit; each row of the data frame describes a unit.
custom_units <- 
  data.frame(id = "speciesPerSquareMeter", 
             unitType = "arealDensity", 
             parentSI = "numberPerSquareMeter", 
             multiplierToSI = 1, 
             description = "number of species per square meter")

unitList <- set_unitList(custom_units)

me <- list(individualName = list(givenName = "Carl", surName = "Boettiger"))
my_eml <- list(
  dataset = list(
    title = "A Minimal Valid EML Dataset",
    creator = me,
    contact = me),
  additionalMetadata = list(
    metadata = list(
      unitList = unitList)))

# Write the document to XML, then validate the file on disk.
write_eml(my_eml, "eml-with-units.xml")
eml_validate("eml-with-units.xml")

That whole additionalMetadata = list(metadata = list(unitList = unitList)) nesting is pretty ugly, though; we should probably figure out a cleaner syntax to help create it.
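As a rough idea of what a cleaner syntax could look like, here is a minimal sketch of a wrapper that hides the nesting. The name add_unit_list is hypothetical and not part of eml2, and the placeholder list only stands in for the output of set_unitList so the sketch runs in base R.

```r
# Hypothetical helper (not part of eml2): attach a unit list to an EML list
# under additionalMetadata$metadata. Note that it replaces any existing
# additionalMetadata element rather than appending to it.
add_unit_list <- function(eml, unit_list) {
  eml$additionalMetadata <- list(metadata = list(unitList = unit_list))
  eml
}

me <- list(individualName = list(givenName = "Carl", surName = "Boettiger"))
my_eml <- list(dataset = list(
  title = "A Minimal Valid EML Dataset",
  creator = me,
  contact = me))

# A plain list stands in for the result of set_unitList() here, so the
# sketch runs without eml2 installed; its shape is an assumption.
placeholder_units <- list(unit = list(id = "speciesPerSquareMeter"))
my_eml <- add_unit_list(my_eml, placeholder_units)
```

With eml2 loaded, the same call would presumably take the unitList object from the example above in place of the placeholder.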

Please confirm if this does or doesn't work for you!

P.S. Thanks for pointing out the weirdness in validating without saving. This happens because write_eml automatically adds a packageId and system to your EML document, using a uuid, if it does not already have a packageId. A uuid might not be a great choice in many cases, though (e.g. maybe you want a reserved DOI here, or something else more official). I add it automatically only when you write out to XML, just for convenience of validation, but maybe we shouldn't do that. If you call eml_validate directly on the list version, there's no intermediate function that can magically add an id to your document. What do you think?
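One way to make the two validation paths agree would be to set the identifiers on the list explicitly before validating or writing. The sketch below assumes a hypothetical helper, ensure_package_id, which is not part of eml2; the values "edi.12345" and "edi" are the ones used in the report above.

```r
# Hypothetical helper (not part of eml2): set packageId and system on the
# list only when they are absent, so user-supplied values are never
# overwritten. `[[` is used instead of `$` to avoid partial name matching.
ensure_package_id <- function(eml, package_id, system) {
  if (is.null(eml[["packageId"]])) eml[["packageId"]] <- package_id
  if (is.null(eml[["system"]])) eml[["system"]] <- system
  eml
}

my_eml <- list(dataset = list(title = "A Minimal Valid EML Dataset"))
my_eml <- ensure_package_id(my_eml, package_id = "edi.12345", system = "edi")

# An existing packageId and system are left untouched.
kept <- ensure_package_id(list(packageId = "x", system = "y"), "a", "b")
```

Because the identifiers then live on the list itself, validating the list and validating the written file should see the same packageId and system.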

scelmendorf commented 6 years ago

Thanks - works! I personally find it unintuitive that write_eml silently adds a packageId and system if they're not in there, though I can see where this is useful for testing as you build up a file. I think it would be simpler to add an option to eml_validate to ignore packageId (assuming most people will get those as the last step but want a mostly complete, functioning EML file before they set or request one), which would strip those errors from the output if requested.
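Something along these lines is what I have in mind, as a rough sketch only. The function name ignore_missing_attributes is hypothetical and not part of eml2; it assumes the validation result is a logical with an "errors" character attribute, as in the output I pasted above, and the stand-in result is built by hand so the sketch runs in base R.

```r
# Hypothetical helper (not part of eml2): drop "required but missing" errors
# for the given attributes from a validation result shaped like the output of
# eml_validate(), i.e. a logical with an "errors" character attribute.
ignore_missing_attributes <- function(result,
                                      attributes = c("packageId", "system")) {
  errors <- attr(result, "errors")
  pattern <- paste0("The attribute '(", paste(attributes, collapse = "|"),
                    ")' is required but missing")
  remaining <- errors[!grepl(pattern, errors)]
  # The result counts as valid only if no other errors remain.
  structure(length(remaining) == 0, errors = remaining)
}

# Stand-in for eml_validate(my_eml), using the messages from the report above.
result <- structure(FALSE, errors = c(
  "Element '{eml://ecoinformatics.org/eml-2.1.1}eml': The attribute 'packageId' is required but missing.",
  "Element '{eml://ecoinformatics.org/eml-2.1.1}eml': The attribute 'system' is required but missing."))

filtered <- ignore_missing_attributes(result)
```

Any error that does not match the pattern is kept, so genuine schema problems would still be reported.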