boettiger-lab / eml2schema

EML to Schema #4

Open cboettig opened 6 years ago

cboettig commented 6 years ago

Test out the eml2schema.jq mapping you've written on each of the EML files in our example set. You'll want to continue to modify the mapping to handle each of these cases and find the corresponding term and layout needed for schema.org.

AlexLi0104 commented 6 years ago

@cboettig

Hi! I started to use the eml2schema.jq map on other EML files, and I ran into some compatibility issues. When I apply it directly to the document citation-sbclter-bibliography.51, it doesn't work, and nothing is shown when I knit it. After some trials, I think the problem originates from the format of the creator element. In the hf205 file, everything in the creator element is nested in [ ], while in the citation-sbclter-bibliography.51 file there is no [ ], so the two cases need to be handled differently.

As you can see in lines 21 and 46 of the eml2schema.jq map, if I put [] after .creator, the script works for hf205 but not for citation-sbclter-bibliography.51; if I don't put [], the opposite happens. I am still trying to figure out how to solve this problem, and I wonder if you have any suggestions.
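One direction that might work is to normalize creator to a list before mapping it, so that a single creator and several creators go through the same code path. Below is a minimal sketch of that idea in Python rather than jq. The sample records and the helper names `as_list` and `creator_surnames` are made up for illustration and are much simpler than the real EML files. In jq, the same idea would presumably be a type check such as `if type == "array" then .[] else . end` applied to `.creator`.

```python
def as_list(value):
    """Return value as a list: None -> [], list -> unchanged, anything else -> [value]."""
    if value is None:
        return []
    if isinstance(value, list):
        return value
    return [value]


# Hypothetical, heavily simplified stand-ins for the two shapes described above.
single_creator = {"creator": {"individualName": {"surName": "Smith"}}}
many_creators = {
    "creator": [
        {"individualName": {"surName": "Smith"}},
        {"individualName": {"surName": "Jones"}},
    ]
}


def creator_surnames(dataset):
    """Collect surnames whether creator is a single object or a list of objects."""
    return [c["individualName"]["surName"] for c in as_list(dataset.get("creator"))]


print(creator_surnames(single_creator))
print(creator_surnames(many_creators))
```

With this normalization, the mapping no longer needs one version with [] and one without.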

Besides that, I have written some more mappings for the citation EML files, and those are included in the eml_to_schema2.jq file that I uploaded (I didn't include other elements here). It is also knitted in the .Rmd file, using the file citation-sbclter-bibliography.50 as an example. Please take a look at your convenience and let me know what should be improved.

Thank you very much! And Happy Chinese New Year (which was yesterday)!

AlexLi0104 commented 6 years ago

@cboettig

I am also trying to simplify the creator code a little bit (not using a bunch of conditional statements), but there are still some bugs that I need to fix. I will upload that as well when I finish.

Thank you!

cboettig commented 6 years ago

Sounds good, thanks for the update!

AlexLi0104 commented 6 years ago

@cboettig

Greetings! I added a few more elements to the eml_to_schema.jq file, based on the EML files that I translated before. However, there are many things in those files for which I can't find a corresponding property at http://schema.org/Dataset. For instance, many of those files have a distribution element, but it doesn't seem to match anything on the schema.org website. Many files also have attribute or attributelist elements that I don't know what to do with. Please let me know if you have any suggestions!

Thank you very much!

cboettig commented 6 years ago

@AlexLi0104 Thanks, good questions.

You should map the EML notion of distribution to the schema.org distribution property (http://schema.org/distribution), which takes an object of class DataDownload. Check out https://developers.google.com/search/docs/data-types/dataset for a detailed example. Both are used to describe "where to get the data".
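As a rough sketch of that mapping (in Python rather than jq, and not the actual eml2schema.jq code): the example below assumes the download URL sits at `distribution.online.url` as a plain string, which may not hold for every EML file, and the function name `to_data_download` and the sample URL are made up for illustration. `DataDownload` and `contentUrl` are the schema.org terms.

```python
def to_data_download(eml_distribution):
    """Map one EML distribution object to a schema.org DataDownload, or None if no URL."""
    # Assumption: the URL is a plain string at distribution -> online -> url.
    online = eml_distribution.get("online") or {}
    url = online.get("url")
    if url is None:
        return None
    return {"@type": "DataDownload", "contentUrl": url}


# Hypothetical EML fragment, already converted to JSON.
eml_distribution = {"online": {"url": "https://example.org/data.csv"}}

schema_dataset = {
    "@context": "http://schema.org",
    "@type": "Dataset",
    "distribution": to_data_download(eml_distribution),
}
print(schema_dataset)
```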

An attribute is basically a description of what each column of data in a CSV file or spreadsheet contains (e.g. column 1 has "species names", column 2 has "weight of organism in grams"), expressed in the EML markup. You'll want to map these to the schema.org property variableMeasured, as in this example: https://github.com/boettiger-lab/eml2schema/blob/master/examples/earthcube.json#L196-L214 . (This one may be a bit tricky and require more thinking about what each field actually means.)
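A minimal sketch of what that could look like, again in Python rather than jq: each EML attribute becomes one schema.org `PropertyValue` with a `name` and a `description`. The field choice (`attributeName`, `attributeDefinition`) is an assumption about which EML fields matter most, the sample attributes are invented, and `to_variable_measured` is a hypothetical helper name. The loop covers every attribute, not just the first one.

```python
def to_variable_measured(attributes):
    """Map a list of EML attribute objects to a list of schema.org PropertyValue objects."""
    variables = []
    for attr in attributes:  # iterate over all attributes, not only the first
        variables.append(
            {
                "@type": "PropertyValue",
                "name": attr.get("attributeName"),
                "description": attr.get("attributeDefinition"),
            }
        )
    return variables


# Invented example: two columns of a data table.
eml_attributes = [
    {"attributeName": "species", "attributeDefinition": "species names"},
    {"attributeName": "weight", "attributeDefinition": "weight of organism in grams"},
]

for variable in to_variable_measured(eml_attributes):
    print(variable)
```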

AlexLi0104 commented 6 years ago

@cboettig

Greetings! I have added DataDownload and variableMeasured to the script. I noticed that distribution is often located in different parts of the file (most are directly in dataset, but some are nested deeper). I have tried to write a single expression that takes care of all occurrences of distribution, but it didn't work. So for now I have put some typical locations into an array, and if distribution doesn't occur at any of them, it just gives a null.
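For reference, catching every occurrence regardless of nesting amounts to a recursive walk over the document. The sketch below shows that walk in Python on an invented document; `find_key` is a hypothetical helper, and the nesting shown (`dataTable.physical.distribution`) is only one example of where a nested distribution might live. jq has a recursive descent operator (`..`) that could presumably play the same role.

```python
def find_key(node, key):
    """Yield every value stored under `key` anywhere inside nested dicts and lists."""
    if isinstance(node, dict):
        for k, v in node.items():
            if k == key:
                yield v
            # Keep descending, so matches nested inside other values are found too.
            yield from find_key(v, key)
    elif isinstance(node, list):
        for item in node:
            yield from find_key(item, key)


# Invented document with one top-level and one nested distribution.
doc = {
    "dataset": {
        "title": "Example",
        "distribution": {"online": {"url": "https://example.org/a"}},
        "dataTable": {
            "physical": {"distribution": {"online": {"url": "https://example.org/b"}}}
        },
    }
}

urls = [d["online"]["url"] for d in find_key(doc, "distribution")]
print(urls)
```

If nothing matches, the result is simply an empty list, so no null needs to be cleaned up afterwards.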

For variableMeasured, I am not sure whether I mapped the fields correctly, and also for some reason the output only contains the first set of values. Please feel free to take a look (right now I use the eml-dataset file in examples/eml as test file), and I will keep working on this.

I am truly sorry that I have not done much in recent weeks (there's a lot going on in all my classes). I will try to do more and wrap up before finals, so please let me know if there are other things to do!

Thank you very much!

AlexLi0104 commented 6 years ago

@cboettig

Greetings! I made some additional changes to DataDownload and variableMeasured during the weekend, and they should be a bit more comprehensive than last week. I also added a delpaths call for some categories so that the nulls don't show up when knitted. Please feel free to take a look!
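The null cleanup can also be done generically instead of per category. The sketch below shows the idea in Python, not the delpaths call used in the script: it walks the output and drops every null key and null list item. `drop_nulls` is a hypothetical helper and the sample record is invented.

```python
def drop_nulls(node):
    """Return a copy of node with None-valued keys and None list items removed, recursively."""
    if isinstance(node, dict):
        cleaned = {k: drop_nulls(v) for k, v in node.items()}
        return {k: v for k, v in cleaned.items() if v is not None}
    if isinstance(node, list):
        return [drop_nulls(v) for v in node if v is not None]
    return node


# Invented mapping output with nulls where the EML file had no matching element.
mapped = {
    "@type": "Dataset",
    "name": "Example",
    "distribution": None,
    "creator": [{"name": "A", "email": None}, None],
}
print(drop_nulls(mapped))
```

Note that this sketch keeps dicts and lists that become empty after cleaning; whether those should also be dropped is a separate choice.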

Thank you!

cboettig commented 6 years ago

Nice! Going through these today!

AlexLi0104 commented 6 years ago

@cboettig

Greetings! I am going back to China today and may not have access to my email and GitHub (depending on whether the Berkeley VPN still works). All the latest changes have been uploaded. Please feel free to take a look! I also completed the URAP end-of-semester evaluation yesterday.

Thank you so much for a wonderful semester. This is my first research experience, and it has been exciting and rewarding. I have learned a lot, and I hope that I can continue working for you next semester if possible!

Have a wonderful summer!

cboettig commented 6 years ago

@AlexLi0104 Thanks, great work. I think we've got a great start, and my team will enjoy testing this out this summer. We'll have some interesting new directions for the Fall semester building on this, so it would be great to continue working with you.

Safe travels and all the best,

Carl