biscicol / triplifier

The triplifier converts spreadsheets, databases, and Darwin Core Archives into RDF/N3 files suitable for use on the Semantic Web.

Create D2RQ mapping for the GBIF dataset aware registry #10

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
GBIF are making their registry "dataset" aware.  This means all datasets will 
get a unique and persistent identifier which can then be used at the record 
level using (e.g.) dwc:datasetID.  All GBIF indexed occurrences will be tied to 
these IDs, so we can start improving linkages between GBIF indexed records and 
the BiSciCol view of a record which will likely have more detail.  A 
dwc:catalogNumber and a dwc:datasetID might suffice to provide these links.
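
As a rough illustration of the linking idea above, here is a minimal sketch in Python. It assumes each record is a plain dict carrying "datasetID" and "catalogNumber" fields; the function names and the record layout are hypothetical, not part of GBIF or BiSciCol.

```python
from collections import defaultdict


def link_key(record):
    """Build the composite key (datasetID, catalogNumber), trimming whitespace."""
    return (record["datasetID"].strip(), record["catalogNumber"].strip())


def link_records(gbif_records, biscicol_records):
    """Pair each GBIF record with every BiSciCol record sharing its key.

    Both arguments are lists of dicts (a hypothetical layout chosen for
    this sketch). Returns a list of (gbif_record, biscicol_record) tuples.
    """
    index = defaultdict(list)
    for rec in biscicol_records:
        index[link_key(rec)].append(rec)
    return [
        (gbif_rec, match)
        for gbif_rec in gbif_records
        for match in index.get(link_key(gbif_rec), [])
    ]
```

Whether the two fields are really enough depends on catalog numbers being unique within a dataset, which this sketch does not check.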

D2RQ appears to be a suitable technology to expose the GBIF registry at a 
SPARQL endpoint.  GBIF would greatly welcome assistance in creating a D2RQ 
mapping so that the GBIF registry is exposed in a format that is readily 
digestible.  GBIF would run D2RQ, so that N triples would then be exportable.
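
To give a feel for what such an export could contain, here is a small sketch that writes N-Triples lines for one record by hand. It is not D2RQ output and not the GBIF mapping; the subject URI and the sample values are made up, and only the two Darwin Core term URIs are real.

```python
# Darwin Core terms namespace.
DWC = "http://rs.tdwg.org/dwc/terms/"


def escape_literal(value):
    """Escape backslashes, quotes, and line breaks for an N-Triples literal."""
    return (
        value.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )


def record_to_ntriples(subject_uri, dataset_id, catalog_number):
    """Return N-Triples lines tying a record to its dataset and catalog number.

    subject_uri is a hypothetical record identifier chosen by the caller.
    """
    pairs = (("datasetID", dataset_id), ("catalogNumber", catalog_number))
    return [
        '<%s> <%s%s> "%s" .' % (subject_uri, DWC, term, escape_literal(value))
        for term, value in pairs
    ]


if __name__ == "__main__":
    # Made-up example values, for illustration only.
    for line in record_to_ntriples("http://example.org/occurrence/1", "ds-001", "A100"):
        print(line)
```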

A dump of a dev GBIF database is available: 
http://dl.dropbox.com/u/608155/biscicol/registry_dev.dump - please note that 
this is a development database, suitable only for this mapping work.  The dataset 
aware registry should be ready to go live in Q2 2012. 

A basic D2RQ mapping is available as a starting point: 
http://dl.dropbox.com/u/608155/biscicol/mapping.ttl

Please contact timrobertson100@gmail.com if you are keen to assist, so he can 
provide guidance on the DB structure.

Thanks!

Original issue reported on code.google.com by timrobertson100 on 10 Apr 2012 at 8:58

GoogleCodeExporter commented 9 years ago
This is best implemented not through the UI but directly in Java code, which 
can read all of the native D2RQ mapping elements and translate them to N3.  In 
the future, we may want to support this functionality at the UI level, but for 
now it is easiest to implement on the back end.

This work can proceed right now, and we'll likely want to store the GBIF 
mapping.ttl file in SVN so it can be updated and versioned.  Best to have a 
BiSciCol developer working with GBIF in creating a comprehensive mapping file.

Another component here which I'll mention (and perhaps can later be moved to a 
separate task) is re-incorporating dataset IDs into the source dataset and 
joining to individual records.

Original comment by jdec...@gmail.com on 10 Apr 2012 at 9:15

GoogleCodeExporter commented 9 years ago
Do we need to involve Triplifier at all here? Seems that "GBIF would run D2RQ" 
themselves, "so that N triples would then be exportable" to BiSciCol directly.

Original comment by u...@ufl.edu on 12 Apr 2012 at 6:34

GoogleCodeExporter commented 9 years ago
It is true.  This TODO should be moved to somewhere else, but I didn't know 
where to place it.

Original comment by timrobertson100 on 13 Apr 2012 at 7:54

GoogleCodeExporter commented 9 years ago
Per discussion on April 17th (following up on April 10th), we decided that this 
issue, for BiSciCol, is really a matter of hosting N3 files.  The recommended 
solution is for GBIF to proceed with creating triples with their dataset 
identifiers and to send BiSciCol the resulting N3 file.  The follow-up issue on 
the BiSciCol list is:

Issue #19 (create directory of N3 files)

Original comment by jdec...@gmail.com on 18 Apr 2012 at 1:44