Open cboettig opened 6 years ago
@cboettig
Hi! I started using the eml2schema.jq map on other EML files and ran into some compatibility issues. When I apply it directly to the document citation-sbclter-bibliography.51, it doesn't work, and nothing is shown when I knit it. After some trials, I think the problem originates from the format of the creator element. In the hf205 file, everything in the creator element is nested in [ ], while in the citation-sbclter-bibliography.51 file there is no [ ], so each case currently has to be handled separately.
As you can see in lines 21 and 46 of the eml2schema.jq map, if I put [] after .creator, the script works for hf205 but not for citation-sbclter-bibliography.51; if I don't put [], the opposite happens. I am still trying to figure out how to solve this problem, and I wonder if you have any suggestions.
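One possible way around the [ ] mismatch is to normalize creator to a list before reading it, so that a single creator and several creators go through the same code path. The sketch below is in Python rather than jq so it can be run on its own. The helper names and the sample records are made up for illustration, and the JSON layout (individualName, surName) is an assumption based on EML element names.

```python
def as_list(value):
    """Return a list whether the input is missing, a single object, or already a list."""
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


def creator_surnames(dataset):
    # Assumed layout: creator -> individualName -> surName.
    return [
        c.get("individualName", {}).get("surName")
        for c in as_list(dataset.get("creator"))
    ]


# Hypothetical sample data: one file with a bare object, one with a list.
single = {"creator": {"individualName": {"surName": "Smith"}}}
many = {"creator": [{"individualName": {"surName": "Lee"}},
                    {"individualName": {"surName": "Chen"}}]}

print(creator_surnames(single))
print(creator_surnames(many))
```

In jq, the same idea could probably be expressed with a type check such as `.creator | if type == "array" then . else [.] end`, after which `[]` can be applied in both cases.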
Besides that, I have written some more for the citation EML files, and those additions are included in the eml_to_schema2.jq file that I uploaded (I didn't include other elements here). It is knitted in the .Rmd file as well, using the file citation-sbclter-bibliography.50 as an example. Please take a look at your convenience and let me know what should be improved.
Thank you very much! And Happy Chinese New Year (which was yesterday)!
@cboettig
I am also trying to simplify the creator code a little bit (not using a bunch of conditional statements), but there are still some bugs that I need to fix. I will upload that as well when I finish.
Thank you!
Sounds good, thanks for the update!
@cboettig
Greetings! I added a few more elements to the eml_to_schema.jq file, based on the EML files that I translated before. However, there are many things in those files for which I can't find a corresponding category at http://schema.org/Dataset. For instance, there is a distribution category in many of those files, but it doesn't really match anything on the schema.org website. Many files also have attribute or attributelist elements that I don't know what to do with. Please let me know if you have any suggestions!
Thank you very much!
@AlexLi0104 Thanks, good questions.
You should map the EML notion of distribution to the schema.org distribution property, which you will see takes an object of class DataDownload: http://schema.org/distribution. Check out https://developers.google.com/search/docs/data-types/dataset for a detailed example. Both of these are used to describe "where to get the data".
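As a rough illustration of that mapping, the Python sketch below turns one EML-style distribution record into a schema.org DataDownload object. It assumes the download link sits under online and then url as a plain string, which may not hold for every EML file; the function name and the example URL are hypothetical.

```python
def to_data_download(distribution):
    """Map one EML-style distribution record to a schema.org DataDownload."""
    # Assumption: the link lives at distribution -> online -> url as a string.
    url = distribution.get("online", {}).get("url")
    if url is None:
        return None  # nothing to point at, so emit no DataDownload
    return {"@type": "DataDownload", "contentUrl": url}


# Hypothetical input record.
example = {"online": {"url": "https://example.org/data.csv"}}
print(to_data_download(example))
```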
For attribute, this is basically a description of what each column of data in a csv file or spreadsheet contains (e.g. column 1 has "species names", column 2 has "weight of organism in grams", but expressed in markup). You'll want to map these to the schema property variableMeasured, like you see in this example here: https://github.com/boettiger-lab/eml2schema/blob/master/examples/earthcube.json#L196-L214 . (This one may be a bit tricky and require more thinking about what each field actually means.)
@cboettig
Greetings! I have added DataDownload and variableMeasured to the script. I noticed that distribution is often located in different parts of the file (most occurrences are in dataset, but some are nested deeper). I tried to write a single piece of code that takes care of all occurrences of distribution, but it didn't work. So for now I have put some typical locations into an array, and if distribution doesn't occur in any of them, the script just gives a null.
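An alternative to listing typical locations is a recursive search that collects distribution wherever it appears in the parsed document. The Python sketch below shows the idea; the function name and the sample document are hypothetical, and the nesting shown is only an assumed example of where distribution might sit.

```python
def find_key(node, key):
    """Yield every value stored under `key`, at any depth of nested dicts and lists."""
    if isinstance(node, dict):
        for k, v in node.items():
            if k == key:
                yield v
            # Keep descending, since the key may also occur further down.
            yield from find_key(v, key)
    elif isinstance(node, list):
        for item in node:
            yield from find_key(item, key)


# Hypothetical document with one top-level and one nested distribution.
doc = {
    "dataset": {
        "distribution": {"online": {"url": "https://example.org/a"}},
        "dataTable": {
            "physical": {
                "distribution": {"online": {"url": "https://example.org/b"}}
            }
        },
    }
}

found = list(find_key(doc, "distribution"))
print(len(found))
```

jq has a recursive descent operator (`..`) that could likely play the same role, for example by selecting every object that has a distribution key.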
For variableMeasured, I am not sure whether I mapped the fields correctly, and for some reason the output only contains the first set of values. Please feel free to take a look (right now I use the eml-dataset file in examples/eml as the test file), and I will keep working on this.
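If the output stops after the first attribute, one thing worth checking is whether the mapping iterates over every attribute or only reads one. The Python sketch below builds one variableMeasured entry per attribute. The field choices (attributeName to name, attributeDefinition to description) are an assumed mapping rather than a confirmed one, and the sample attributes are made up.

```python
def as_list(value):
    """Return a list whether the input is missing, a single object, or already a list."""
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


def variables_measured(attribute_list):
    """Build one schema.org PropertyValue per EML-style attribute."""
    out = []
    for attr in as_list(attribute_list.get("attribute")):
        out.append({
            "@type": "PropertyValue",
            # Assumed field mapping; adjust once the meaning of each field is settled.
            "name": attr.get("attributeName"),
            "description": attr.get("attributeDefinition"),
        })
    return out


# Hypothetical attribute list with two columns.
sample = {"attribute": [
    {"attributeName": "species", "attributeDefinition": "species names"},
    {"attributeName": "weight", "attributeDefinition": "weight of organism in grams"},
]}

result = variables_measured(sample)
print(len(result))
```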
I am truly sorry that I have not done much in recent weeks (there's a lot going on in all my classes). I will try to do more and wrap up before finals, so please let me know if there are other things to do!
Thank you very much!
@cboettig
Greetings! I made some additional changes to DataDownload and variableMeasured over the weekend, and they should be a bit more comprehensive than last week. I also included a delpaths for some categories so that the nulls don't show up when knitted. Please feel free to take a look!
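For comparison with the delpaths approach, here is a generic way to strip nulls from a whole mapped document in one pass, sketched in Python. It is a different technique from the per-category delpaths described above, not a reconstruction of it, and the function name and sample record are hypothetical.

```python
def drop_nulls(node):
    """Return a copy of `node` with every None removed from dicts and lists, at any depth."""
    if isinstance(node, dict):
        return {k: drop_nulls(v) for k, v in node.items() if v is not None}
    if isinstance(node, list):
        return [drop_nulls(v) for v in node if v is not None]
    return node


# Hypothetical mapped record containing nulls at several levels.
record = {
    "name": "x",
    "distribution": None,
    "creator": [{"name": "a", "email": None}, None],
}

cleaned = drop_nulls(record)
print(cleaned)
```

Note that this removes only the nulls themselves; a dict or list left empty after pruning is kept as is.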
Thank you!
Nice! Going through these today!
@cboettig
Greetings! I am going back to China today and may not have access to my email and GitHub (depending on whether the Berkeley VPN still works). All the latest changes have been uploaded. Please feel free to take a look! I also completed the URAP end-of-semester evaluation yesterday.
Thank you so much for a wonderful semester. This is my first research experience, and it has been exciting and rewarding. I have learned a lot, and I hope that I can continue working for you next semester if possible!
Have a wonderful summer!
@AlexLi0104 Thanks, great work, I think we got a great start and my team will enjoy testing this out this summer. We'll have some interesting new directions for the Fall semester building on this so it would be great to continue working with you.
Safe travels and all the best,
Carl
Test out the eml2schema.jq mapping you've written on each of the EML files in our example set. You'll want to continue to modify the mapping to handle each of these cases and find the corresponding term and layout needed for schema.org.