Open mdietze opened 9 years ago
related to ropensci/traits#16
@mdietze should I start working with the EML package (https://github.com/ropensci/EML) to read EML for meta2format.EML? It's not on CRAN yet, but according to their milestone they are 95% of the way to submitting, and the only open issue seems relatively minor.
A few notes:
Happy to help?
traits
Where are docs for that?
@sckott thanks for your quick reply! Here are my thoughts:
updating the BetyDB interface in traits ...
We did not break the API that the traits package uses, so updating the package definitely falls under 'nice to have' and is on the list.
Although the updated API endpoint is under api/beta, I am fairly confident that the 'get' endpoints are stable (@gsrohde does that sound reasonable - that the responses from the api/beta/tablename/id endpoints are as stable as the underlying database schema?).
Where are docs for that?
Docs for the new API are incomplete but available here: https://www.betydb.org/api/docs.
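For reference, a request URL following the `api/beta/tablename/id` pattern mentioned above could be built roughly like this. This is only a sketch: the host is taken from the docs link, while the helper name `beta_url`, the example table and id, and the `key` query parameter are assumptions to check against the docs.

```python
from urllib.parse import urlencode

# Host taken from the docs link above; the "/api/beta" prefix is from the thread.
BASE_URL = "https://www.betydb.org/api/beta"


def beta_url(table, record_id, api_key=None):
    """Build a URL of the form api/beta/<tablename>/<id>.

    `api_key` is sent as a `key` query parameter. That parameter name is an
    assumption for illustration and is not confirmed in this thread.
    """
    url = f"{BASE_URL}/{table}/{int(record_id)}"
    if api_key is not None:
        url += "?" + urlencode({"key": api_key})
    return url


# Hypothetical table name and id, for illustration only.
print(beta_url("traits", 1))
```

The URL is only constructed here, not fetched, so the sketch stays independent of whichever HTTP client ends up being used.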
How do you want feedback and on what specifically?
Here, I mostly wanted to make sure that as @jam2767 develops an interface that translates between BETYdb and EML, it is not redundant with the implementation of the API-response-to-EML conversion that was suggested in the referenced ropensci/traits#16 issue.
Some options for implementing an EML outputter would be:
To me, 4 would be the preferred option.
Maybe a quick call to discuss would help ... or could we do this during one of our weekly PEcAn teleconferences?
Maybe we should split this into two issues? @jam2767 is primarily interested in writing meta2format (i.e. a function to insert file format metadata into the formats table of the database) and @dlebauer seems primarily interested in exporting EML via format2meta.
@mdietze I was referring to using the API for both importing and exporting data that is provided with EML meta-data.
We don't have a put/post endpoint for formats yet, but using an API was proposed to avoid security issues we have had using SQL connectors (#395). The new API has 'put/post' endpoints for traits and other tables, and we could add one for formats. In general, I suspect that using similar solutions for the import and export of EML <--> BETYdb would make implementation and maintenance easier.
@mdietze I've been playing with the EML package and can extract variable names and other info from example EML files from DataONE, but I have questions about what is actually going into meta2format. Is some other function, or BD DTS, handing meta2format the EML file? For inputting a new formats record, all that is required is an id, mime type, name, header, skip and notes, correct? So is meta2format just getting that info and passing it off to another function to be inserted into the DB, since you said last week that meta2format is separate from the DB insert?
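To make the extraction step concrete, here is a minimal sketch of pulling variable names out of an EML document. It uses Python's standard library rather than the R EML package discussed here, and the sample document is a hand-written fragment, not a real DataONE file. The function name `extract_variables` is hypothetical.

```python
import xml.etree.ElementTree as ET

# Tiny hand-written EML-like fragment for illustration only.
SAMPLE_EML = """
<eml:eml xmlns:eml="eml://ecoinformatics.org/eml-2.1.1">
  <dataset>
    <dataTable>
      <entityName>yields.csv</entityName>
      <attributeList>
        <attribute>
          <attributeName>site</attributeName>
          <storageType>string</storageType>
        </attribute>
        <attribute>
          <attributeName>yield</attributeName>
          <storageType>float</storageType>
        </attribute>
      </attributeList>
    </dataTable>
  </dataset>
</eml:eml>
"""


def extract_variables(eml_text):
    """Return one dict per <attribute> element found in the EML text."""
    root = ET.fromstring(eml_text.strip())
    variables = []
    # Elements below the root are not namespace-qualified in this sample,
    # so plain tag names can be used for the lookup.
    for attr in root.iter("attribute"):
        variables.append({
            "name": attr.findtext("attributeName"),
            # findtext returns None when the element is absent
            "storage_type": attr.findtext("storageType"),
        })
    return variables


for variable in extract_variables(SAMPLE_EML):
    print(variable["name"], variable["storage_type"])
```

Each extracted entry would correspond to one candidate formats_variables row; mapping names and units onto existing database variables is a separate step not shown here.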
What I said was that there are two steps: 1) extracting the required metadata from the file and 2) inserting that data into the database. I also said that BD only has to do the first, and the second happens purely within PEcAn (i.e. BD doesn't insert anything into the database). That said, I'd envisioned that meta2format.EML would contain both these steps -- it would contain a call to Brown Dog and a database insert. It doesn't matter what the BD extraction module is named.
Next, you're missing a whole lot in terms of what goes into a new format. Yes, what you listed is the FORMATS table row itself, but this function also has to handle all the formats_variables rows as well.
Addendum: it's probably worthwhile to create a generic function for inserting database formats and formats-variables as you'll need it in every meta2format function and for manually inserting records.
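A generic insert helper of the kind suggested above might look roughly like the following. This is a sketch under stated assumptions: it uses an in-memory SQLite database as a stand-in for the real database, the formats columns follow the fields listed earlier in the thread, and the formats_variables columns (`format_id`, `name`, `unit`) as well as the function name `insert_format` are hypothetical.

```python
import sqlite3


def insert_format(conn, fmt, variables):
    """Insert one formats row plus its formats_variables rows.

    Both inserts run in a single transaction, so a failure while writing
    the variables does not leave an orphaned formats row behind.
    Returns the id of the new formats row.
    """
    with conn:  # commits on success, rolls back on exception
        cur = conn.execute(
            "INSERT INTO formats (name, mime_type, header, skip, notes) "
            "VALUES (?, ?, ?, ?, ?)",
            (fmt["name"], fmt["mime_type"], fmt["header"],
             fmt["skip"], fmt["notes"]),
        )
        format_id = cur.lastrowid
        conn.executemany(
            "INSERT INTO formats_variables (format_id, name, unit) "
            "VALUES (?, ?, ?)",
            [(format_id, v["name"], v.get("unit")) for v in variables],
        )
    return format_id


# Simplified stand-in schema; the real tables have more columns.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE formats (
    id INTEGER PRIMARY KEY, name TEXT, mime_type TEXT,
    header INTEGER, skip INTEGER, notes TEXT);
CREATE TABLE formats_variables (
    id INTEGER PRIMARY KEY, format_id INTEGER REFERENCES formats(id),
    name TEXT, unit TEXT);
""")

new_id = insert_format(
    conn,
    {"name": "example format", "mime_type": "text/csv",
     "header": 1, "skip": 0, "notes": "illustration only"},
    [{"name": "site"}, {"name": "yield", "unit": "Mg/ha"}],
)
print(new_id)
```

Keeping the insert separate from any particular metadata parser is what would let every meta2format function, and manual record entry, share it.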
Here is a sample data package dlebauer.3.2.zip that I generated from Morpho a few years ago. It includes:
The EML file (easiest to view by opening metadata.html) defines the fields in the CSV. The fields in the CSV map closely to the traits_and_yields_view table that is queried by the search endpoint.
This issue is stale because it has been open 365 days with no activity.
This issue proposes the creation of a set of functions for creating formats and formats_variables records from standard metadata formats, such as EML (e.g. meta2format.EML), and a second set of functions which does the opposite, exporting metadata records out of PEcAn into standard formats.
The arguments to the first would be the input file and the new Formats.name.
The arguments to the second would be the Format.ID and the output file.
Not only would these functions be handy unto themselves for making it easier to import new datasets that HAVE metadata, but the hope is that they could be paired with the Brown Dog DTS to insert the metadata it infers into the database as well.
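The function families described above, one function per metadata standard with a shared argument list, can be sketched as a small dispatch table. This is an illustration in Python rather than the project's R code; the registry, the decorator, and the placeholder return value are all hypothetical, and only the argument lists (input file plus new format name) come from the proposal. A format2meta family taking a format id and an output file could mirror the same pattern.

```python
# Registry mapping a metadata standard name to its import function.
META2FORMAT = {}


def register(standard):
    """Decorator that registers an import function for one metadata standard."""
    def decorator(func):
        META2FORMAT[standard] = func
        return func
    return decorator


@register("EML")
def meta2format_eml(input_file, format_name):
    # Placeholder body: a real version would parse the file and
    # create the formats and formats_variables records.
    return {"standard": "EML", "source": input_file, "name": format_name}


def meta2format(standard, input_file, format_name):
    """Dispatch to the handler registered for the given metadata standard."""
    try:
        handler = META2FORMAT[standard]
    except KeyError:
        raise ValueError(f"no meta2format handler for {standard!r}") from None
    return handler(input_file, format_name)


print(meta2format("EML", "metadata.xml", "example format"))
```

Adding support for another metadata standard would then only require registering one more function, leaving callers unchanged.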