NCEAS / arcticdatautils

Utility functions in R for processing data for the Arctic Data Center
https://nceas.github.io/arcticdatautils/
Apache License 2.0

Update package for EML 2.2.0 #141

Closed amoeba closed 4 years ago

amoeba commented 5 years ago

Hey @jeanetteclark, @dmullen17: There are a couple of places where arcticdatautils is hard-coded to EML 2.1.1. Now that EML 2.2.0 is out in the wild, I think it's safe to update arcticdatautils to act like EML 2.2.0 is the latest version.

Most importantly, publish_update assumes a format_id of EML 2.1.1, and this can't be overridden:

https://github.com/NCEAS/arcticdatautils/blob/854eccb4e4f093dffa5cbb0684e741d11bae01a6/R/editing.R#L461-L463

which makes me think we'll see problems soon, since we have a mix of EML 2.1.1 and EML 2.2.0 records on arcticdata.io: we don't wanna create/update an EML 2.2.0 XML record as a 2.1.1 object (which would happen in the above case). On a related note, EML 2.2.0 is backwards compatible with EML 2.1.1, which means we can take any EML 2.1.1 doc, change the schema and format ID to 2.2.0, and it will still validate.
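For illustration, the namespace swap described above could be sketched in base R roughly as follows. The helper name `migrate_eml_text` and the sample document are hypothetical and not part of arcticdatautils. The sketch does a plain text substitution and does not validate the result; a real migration would more likely go through an XML parser and schema validation.

```r
# Namespace URIs for each EML version. These same strings also serve as the
# DataONE format IDs for EML 2.1.1 and EML 2.2.0.
eml_211 <- "eml://ecoinformatics.org/eml-2.1.1"
eml_220 <- "https://eml.ecoinformatics.org/eml-2.2.0"

# Hypothetical helper: replace every occurrence of the 2.1.1 namespace URI
# with the 2.2.0 one. fixed = TRUE treats the URI as a literal string.
migrate_eml_text <- function(xml_text) {
  gsub(eml_211, eml_220, xml_text, fixed = TRUE)
}

# Made-up minimal document, for demonstration only.
doc <- '<eml:eml xmlns:eml="eml://ecoinformatics.org/eml-2.1.1" packageId="example" system="example"></eml:eml>'
migrated <- migrate_eml_text(doc)
```

After the text is rewritten, the object would also need to be uploaded with the 2.2.0 format ID so the record and its system metadata agree.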

I think this is pretty straightforward, but I could use feedback on one thing: During publish_update, do we wanna automigrate 2.1.1 docs to 2.2.0 (change the namespace in the XML record, change the formatID), or do we wanna autodetect the format and let EML 2.2.0 records trickle in via other mechanisms?

Sub-tasks:

Sound good?

jeanetteclark commented 5 years ago

Hey Bryce, this sounds good to me. I think we should automigrate, but we need to wait until support is fully implemented in MetacatUI; otherwise, users won't be able to update their datasets. We mentioned this on the call last week but you may have dropped off already.

amoeba commented 5 years ago

Ah, right. So it sounds like arcticdatautils should generally stick to 2.1.1 unless the user is already working with an EML 2.2.0 record. For example, if I run publish_update on a package with an EML 2.2.0 record, it should stay EML 2.2.0. If I run publish_update on a package with an EML 2.1.1 record, it should also stay EML 2.1.1.
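A minimal sketch of that "keep whatever version the record already has" behavior, assuming the version can be inferred from the namespace URI present in the document text. The function `detect_eml_format_id` is hypothetical and is not how publish_update is necessarily implemented.

```r
# Hypothetical helper: return the format ID matching the EML namespace found
# in the document, so an update can reuse it instead of a hard-coded value.
detect_eml_format_id <- function(xml_text) {
  known <- c(
    "eml://ecoinformatics.org/eml-2.1.1",
    "https://eml.ecoinformatics.org/eml-2.2.0"
  )
  # Check which known namespace URIs appear literally in the text.
  present <- vapply(
    known,
    function(ns) grepl(ns, xml_text, fixed = TRUE),
    logical(1)
  )
  found <- known[present]
  if (length(found) != 1) {
    stop("Could not determine a single EML version from the document.")
  }
  found
}

# Made-up minimal documents, for demonstration only.
doc_211 <- '<eml:eml xmlns:eml="eml://ecoinformatics.org/eml-2.1.1"></eml:eml>'
doc_220 <- '<eml:eml xmlns:eml="https://eml.ecoinformatics.org/eml-2.2.0"></eml:eml>'
```

With this approach a 2.1.1 record is re-uploaded as 2.1.1 and a 2.2.0 record as 2.2.0, and a document with no recognizable namespace fails loudly rather than defaulting to either version.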

jeanetteclark commented 5 years ago

I think once EML 2.2.0 is fully integrated, then we can start auto-migrating

jeanetteclark commented 5 years ago

started a new branch for this https://github.com/NCEAS/arcticdatautils/tree/eml2_support

none of this is going to be helpful until the namespace issue in ropensci/emld is resolved, however

amoeba commented 5 years ago

Alright, the tweak that lets publish_update work with both the EML 2.1.1 and EML 2.2.0 formats is done in 1208216e20a58364d031d3920a04a97c6cce7068. This can be merged whenever you think it's appropriate, @jeanetteclark, and the rest can come once we're ready to fully migrate to EML 2.2.0 as the default.

jeanetteclark commented 5 years ago

Hey @amoeba - I made some minor changes in 0bf5fcc. I'd prefer to have just one helper function for the formatId, which takes a version argument, as opposed to two functions. Right now there is no default for the version argument, but once we are fully integrated we can set it to 2.2.
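To make the single-helper idea concrete, here is a rough sketch of what such a function might look like. The name `format_eml_sketch` and the accepted version strings are assumptions for illustration; the actual helper in commit 0bf5fcc may differ.

```r
# Hypothetical single helper: map an EML version to its format ID.
# There is deliberately no default for `version`, so callers must choose.
format_eml_sketch <- function(version) {
  version <- as.character(version)
  switch(
    version,
    # An empty right-hand side falls through to the next value.
    "2.1" = ,
    "2.1.1" = "eml://ecoinformatics.org/eml-2.1.1",
    "2.2" = ,
    "2.2.0" = "https://eml.ecoinformatics.org/eml-2.2.0",
    stop("Unsupported EML version: ", version)
  )
}

id_211 <- format_eml_sketch("2.1")
id_220 <- format_eml_sketch("2.2")
```

Switching the default later would then be a one-line change to the function signature, for example `function(version = "2.2")`, rather than a change at every call site.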

amoeba commented 5 years ago

That sounds like the best way. Good catch.

jeanetteclark commented 4 years ago

@kristenpeach I wonder if you would be interested in tackling the last two items on this list so that instead of creating EML 2.1.1 documents they create EML 2.2.0 documents by default.

I'm not 100% sure what will be involved but I imagine it will be a simple fix

jeanetteclark commented 4 years ago

Closed in PR #177