Klortho / eutils-org

Project to produce RDF output for some NCBI E-utilities

Reconcile JATS2RDF with Biotea scheme #15

Open Klortho opened 10 years ago

Klortho commented 10 years ago

When I initially implemented RDF output, I used the ontologies and data structures defined in the paper "Biotea: RDFizing PubMed Central in support for the paper as an interface to the Web of Data" as a guide. See this XSLT, which was a start at implementing the Biotea RDF.

Then, in #6, I ported the JATS2RDF XSLT from "From Markup to Linked Data: Mapping NISO JATS v1.0 to RDF using the SPAR (Semantic Publishing and Referencing) Ontologies".

Both, I think, have pros and cons. I would now like to reconcile them, take the best from each, and finalize what the output should be for (at least the metadata portions of) JATS articles in PMC, as well as for PubMed records, which should be very similar.

Klortho commented 10 years ago

[Image: jats2rdf-graph1]

ljgarcia commented 10 years ago

Some things that are in Biotea but that I am missing here:

And of course all the elements related to the document structure (sections and paragraphs), content, and annotations. I think the content can be omitted, but the structure and the annotations can be useful for different kinds of analysis (for instance, we are currently working on similarity).
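To make the structure point concrete, here is a minimal sketch that emits section and paragraph triples as N-Triples using only the Python standard library. It assumes the DoCO classes `doco:Section` and `doco:Paragraph` and the Dublin Core property `dcterms:hasPart`; Biotea's actual vocabulary may differ, and the base URI and the `#section-<id>-para-<n>` fragment scheme are hypothetical, chosen only for illustration.

```python
# Hypothetical base URI for illustration only; not Biotea's or PMC's real URI scheme.
BASE = "http://example.org/pmc/PMC0000000"

DCTERMS = "http://purl.org/dc/terms/"
DOCO = "http://purl.org/spar/doco/"  # assumed vocabulary for structural classes
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def triple(s: str, p: str, o: str) -> str:
    """Format one N-Triples line whose subject, predicate and object are all IRIs."""
    return f"<{s}> <{p}> <{o}> ."


def structure_triples(sections: dict[str, int]) -> list[str]:
    """Emit article -> section -> paragraph structure triples.

    `sections` maps a (hypothetical) section id to the number of paragraphs it holds.
    """
    article = BASE
    out: list[str] = []
    for sec_id, n_paras in sections.items():
        sec = f"{article}#section-{sec_id}"
        out.append(triple(sec, RDF_TYPE, DOCO + "Section"))
        out.append(triple(article, DCTERMS + "hasPart", sec))
        for i in range(1, n_paras + 1):
            para = f"{sec}-para-{i}"
            out.append(triple(para, RDF_TYPE, DOCO + "Paragraph"))
            out.append(triple(sec, DCTERMS + "hasPart", para))
    return out


if __name__ == "__main__":
    for line in structure_triples({"intro": 2, "methods": 1}):
        print(line)
```

Keeping paragraphs as separate resources, even without their text, is what would let annotations point at a specific paragraph later.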

ljgarcia commented 10 years ago

[Image]

ljgarcia commented 10 years ago

In Biotea, we chose the second approach, full content, with some limitations: tables and formulas are difficult to handle. [Image]

ljgarcia commented 10 years ago

Then there are the annotations, which I would say are one of the strongest points of Biotea. Here we are using the Annotation Ontology (https://code.google.com/p/annotation-ontology/), but I think it is time to move to Open Annotation (http://www.openannotation.org/spec/core/). Annotations in Biotea are linked to the actual paragraph where they are found. I think we should at least keep track of the section, since this facilitates analysis based on the structure of the article. [Image]
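A rough sketch of what an Open Annotation record could look like for this case, written as plain-Python N-Triples output. It assumes the OA core namespace `http://www.w3.org/ns/oa#` with `oa:Annotation`, `oa:hasBody` and `oa:hasTarget`, treats the ontology term as the body and the paragraph as the target, and adds a hypothetical `dcterms:isPartOf` link from paragraph to section so annotations can be grouped by section. All example URIs are hypothetical, and this is not Biotea's actual output.

```python
from collections import Counter

OA = "http://www.w3.org/ns/oa#"  # Open Annotation core namespace (assumed)
DCTERMS = "http://purl.org/dc/terms/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def annotation_triples(ann: str, term: str, paragraph: str, section: str) -> list[str]:
    """Annotate one paragraph with one ontology term, Open Annotation style.

    Body = the ontology term; target = the paragraph where the term was found.
    The paragraph is also tied to its section to keep track of article structure.
    """
    rows = [
        (ann, RDF_TYPE, OA + "Annotation"),
        (ann, OA + "hasBody", term),
        (ann, OA + "hasTarget", paragraph),
        (paragraph, DCTERMS + "isPartOf", section),
    ]
    return [f"<{s}> <{p}> <{o}> ." for s, p, o in rows]


def terms_per_section(hits: list[tuple[str, str]]) -> Counter:
    """Count annotations per section from (section, term) pairs: a structure-based analysis."""
    return Counter(section for section, _term in hits)


if __name__ == "__main__":
    # Hypothetical annotation, paragraph and section URIs; the body is an example GO term.
    for line in annotation_triples(
        "http://example.org/ann/1",
        "http://purl.obolibrary.org/obo/GO_0008150",
        "http://example.org/pmc/PMC0000000#section-methods-para-1",
        "http://example.org/pmc/PMC0000000#section-methods",
    ):
        print(line)
```

Targeting the paragraph rather than the whole article keeps the finer granularity Biotea already has, while the section link is the minimum needed for section-level analysis.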

Klortho commented 10 years ago

These are great! Do you guys have your conversion code out in the open? I can only find the datasets from your home page.

I have also heard good things about the open annotation ontology. Will it mean major changes for you to migrate your output to that format?

ljgarcia commented 10 years ago

Be careful, those datasets are out of date. We are generating new ones, using the new BioPortal web services for the NCBO Annotator. The code is in Java and is not currently open, but we plan to make it open.

As for Open Annotation, it is not that difficult to move our output to this model... as usual, it is just a matter of finding the time.

Klortho commented 10 years ago

[Image: jat2rdf2]