USGCRP / gcis-ontology

Ontology for the Global Change Information System

Article->AcademicArticle #181

Closed justgo129 closed 8 years ago

justgo129 commented 8 years ago

Per #180.

justgo129 commented 8 years ago

Excellent catch, @rewolfe. What would the proper value of the flags be? I'm unable to locate the proper variable in a Google search.

rewolfe commented 8 years ago

I included the link to the function description. The bits are described in the box. It would be interesting to see what happens if we just set the variable to 0.
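To make the difference between a flags value of 255 and 0 concrete, here is a minimal Python sketch that decodes the bitmask. The bit descriptions are paraphrased from memory of the Virtuoso documentation for the TTLP functions and are an assumption; they should be checked against the box in the linked function description. The names `TTLP_FLAG_BITS` and `decode_flags` are hypothetical.

```python
# Bit meanings are an assumption, paraphrased from the Virtuoso TTLP
# documentation; verify them against the linked function description.
TTLP_FLAG_BITS = {
    1: "quoted strings may contain newlines",
    2: "allow blank-node predicates",
    4: "allow variables (such triples are ignored)",
    8: "allow literal subjects (such triples are ignored)",
    16: "allow '/', '#', '%' and '+' in the local part of a QName",
    32: "allow invalid symbols between '<' and '>'",
    64: "relax Turtle syntax to accept popular violations",
    128: "try to recover from lexical errors as much as possible",
}


def decode_flags(flags):
    """Return the descriptions of every bit that is set in `flags`."""
    return [desc for bit, desc in TTLP_FLAG_BITS.items() if flags & bit]


if __name__ == "__main__":
    for value in (255, 0):
        enabled = decode_flags(value)
        print(f"flags={value}: {len(enabled)} relaxation(s) enabled")
        for desc in enabled:
            print(f"  - {desc}")
```

Under that reading, 255 turns on every relaxation, including error recovery, while 0 permits no syntax errors, so a malformed statement should surface as an error instead of being skipped quietly.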

justgo129 commented 8 years ago

I could try it on dev, and then just revert the commit on gcis-rdf. Travis CI doesn't run on gcis-rdf.

rewolfe commented 8 years ago

You should be able to do it on Dev without a commit.

justgo129 commented 8 years ago

Just did it on dev - I don't think it worked:

At: http://data.gcis-dev-front.joss.ucar.edu/sparql:

select * FROM <http://data.gcis-dev-front.joss.ucar.edu> where { ?s a gcis:AcademicArticle }

produces zero results. The triplestore takes 32 minutes to load, instead of 31.
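The same check can be scripted so it is easy to repeat after each reload. The sketch below only builds the query and the request URL; it does not contact the endpoint. The namespace bound to `gcis:` is an assumption (it is not stated in this thread), and the helper names are hypothetical.

```python
from urllib.parse import urlencode

# Assumed namespace for the gcis: prefix; confirm against the ontology.
GCIS_NS = "http://data.globalchange.gov/gcis.owl#"


def build_count_query(class_name, graph):
    """Build a SPARQL query that counts instances of a gcis class in a graph."""
    return (
        f"PREFIX gcis: <{GCIS_NS}>\n"
        f"SELECT (COUNT(?s) AS ?n) FROM <{graph}>\n"
        f"WHERE {{ ?s a gcis:{class_name} }}"
    )


def build_request_url(endpoint, query):
    """Encode the query as a GET request using the standard `query` parameter."""
    return endpoint + "?" + urlencode({"query": query})


if __name__ == "__main__":
    graph = "http://data.gcis-dev-front.joss.ucar.edu"
    query = build_count_query("AcademicArticle", graph)
    print(query)
    print(build_request_url(graph + "/sparql", query))
```

Counting instead of selecting every row keeps the response small, and running it for both `AcademicArticle` and `Book` would show whether the two classes fail together.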

justgo129 commented 8 years ago

From the log files (this looks promising from a debugging perspective):

SQL> :
[Fri Jan 24 11:57:23 2014] [info] /article/10.1002/eco.158
[Fri Jan 24 11:57:23 2014] [debug] DB.DBA.TTLP_MT(file_to_string_output('/tmp/hymOCAGlE4'),'','http://tmp.data.globalchange.gov', 255);
: Connected to OpenLink Virtuoso
Driver: 06.01.3127 OpenLink Virtuoso ODBC Driver
OpenLink Interactive SQL (Virtuoso), version 0.9849b.
Type HELP; for help and EXIT; to exit.
SQL>
Done. -- 1 msec.
SQL> :
[Fri Jan 24 11:57:23 2014] [info] /article/10.1002/env.2140
[Fri Jan 24 11:57:23 2014] [debug] DB.DBA.TTLP_MT(file_to_string_output('/tmp/u8QqY5aIpw'),'','http://tmp.data.globalchange.gov', 255);
: Connected to OpenLink Virtuoso
Driver: 06.01.3127 OpenLink Virtuoso ODBC Driver
OpenLink Interactive SQL (Virtuoso), version 0.9849b.
Type HELP; for help and EXIT; to exit.
SQL>

rewolfe commented 8 years ago

@justgo129 - What does the log show? You may want to try loading just the articles (the first 100?) and nothing else.

Robert Wolfe, NASA GSFC @ USGCRP, o: 202-419-3470, m: 301-257-6966

rewolfe commented 8 years ago

It looks like you are still using 255, not 0.

justgo129 commented 8 years ago

nope, I've confirmed I'm using 0

justgo129 commented 8 years ago

I solved it. We need to add a dbpprop.rdf file

zednis commented 8 years ago

Was the missing dbpprop.rdf file causing an error we were missing?

justgo129 commented 8 years ago

I'm investigating it.

[jgoldste@gcis-dev-back (master) ext]$ ls
biro.ttl  dcmitype.ttl  ext.ttl   meth.ttl  place.ttl  rdf.ttl
co.ttl    dcterms.ttl   foaf.ttl  org.ttl   prov.ttl   skos.ttl
dcat.ttl  dc.ttl        frbr.ttl  owl.ttl   rdfs.ttl   vivo.ttl

run

select * from <http://data.globalchange.gov> where { ?s a gcis:Book }

and notice that we get a similar error to the one with articles. What do books and articles have in common? They both use dbpprop:pubYear. That's consistent with the contents at the top of this GitHub comment.

I found the downloadable OWL file, but not a .ttl file for the dbpedia properties "ontology." http://mappings.dbpedia.org/server/ontology/dbpedia.owl

We need the .ttl, I would guess.

rewolfe commented 8 years ago

It will be interesting if dbpprop is the problem.

Related to my previous message. This is where I see "255" being used:

[Fri Jan 24 11:57:23 2014] [debug] DB.DBA.TTLP_MT(file_to_string_output('/tmp/hymOCAGlE4'),'','http://tmp.data.globalchange.gov', 255); : Connected to OpenLink Virtuoso

justgo129 commented 8 years ago

It's worth testing. @zednis do you know where I could locate the .ttl file? Replacing .owl with .ttl in the URL for the owl file produces an error. No real luck here either.

rewolfe commented 8 years ago

You should be able to load either RDF or TTL. See: https://github.com/USGCRP/gcis-rdf/blob/master/load_rdf_sources.pl

zednis commented 8 years ago

The dbpedia OWL file does not have the dbpprop:pubYear property defined in it.

This property, and it seems other dbpprop properties, are published in a linked data manner - the URI is resolvable to the RDF but there does not appear to be a static RDF file with all property definitions.

You could create a simple RDF file containing the property definition by running curl on http://dbpedia.org/data4/pubYear.n3 and saving the results locally.

justgo129 commented 8 years ago

That would explain the lack of a Wikipedia article for this property, as well as the "4" in the title of the DBpedia page. @zednis we also use "dbpprop:launchDate". Would we need to create a separate dbpprop file for that as well, but give the namespace prefix a different name (i.e., not dbpprop, since that would be used for pubYear)?

zednis commented 8 years ago

@justgo129 The prefix is not tied to the file, it is just syntactic sugar for writing URIs in various RDF encodings.
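As a small illustration of that point, the Python sketch below expands prefixed names into full URIs. It assumes `dbpprop:` is bound to `http://dbpedia.org/property/`; the `expand` helper and the alternative `dbp` label are hypothetical.

```python
# Assumed binding for the dbpprop: prefix.
PREFIXES = {"dbpprop": "http://dbpedia.org/property/"}


def expand(prefixed_name, prefixes=PREFIXES):
    """Replace the prefix label with the namespace URI it stands for."""
    prefix, _, local = prefixed_name.partition(":")
    return prefixes[prefix] + local


if __name__ == "__main__":
    # Both properties share one namespace, so one prefix covers both.
    print(expand("dbpprop:pubYear"))
    print(expand("dbpprop:launchDate"))
    # A different label bound to the same namespace yields the same URI.
    other = {"dbp": "http://dbpedia.org/property/"}
    print(expand("dbp:pubYear", other) == expand("dbpprop:pubYear"))
```

Because only the expanded URI is stored in the triplestore, pubYear and launchDate can sit in the same file under the same prefix.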

rewolfe commented 8 years ago

A script could walk the dbpprop website and create a ttl file for any of the fields.

justgo129 commented 8 years ago

@rewolfe just as a point of information, how long would that take? I'd like to be able to fix this issue ASAP in order to ensure a complete triplestore.

rewolfe commented 8 years ago

I would just go ahead and create the ones we use for now (I assume it is a short list). They can all be in the same ttl file.

zednis commented 8 years ago

Agreed with @rewolfe, the property definitions can all go in the same file. I am also fairly certain that every property in the dbpprop namespace could be defined with a single statement giving it rdf:type rdf:Property (optionally also a label). None of the examples I have seen so far have any range or domain.
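A minimal sketch of what such a file could look like, generated in Python. It assumes the dbpprop namespace is `http://dbpedia.org/property/` and uses the two properties named in this thread; the function name and the choice to reuse the local name as the label are illustrative assumptions.

```python
# Assumed namespace for the dbpprop: prefix.
DBPPROP_NS = "http://dbpedia.org/property/"


def property_declarations(names):
    """Return Turtle text declaring each name as an rdf:Property with a label."""
    lines = [
        "@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .",
        "@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .",
        f"@prefix dbpprop: <{DBPPROP_NS}> .",
        "",
    ]
    for name in names:
        # One statement per property: its type plus an optional label.
        lines.append(f'dbpprop:{name} a rdf:Property ; rdfs:label "{name}" .')
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(property_declarations(["pubYear", "launchDate"]))
```

The output could be saved next to the other files in the ext directory, for example as dbpprop.ttl, and extended as more dbpprop properties come into use.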

I am curious, though, why we need this property file. Do we know for sure that this is the cause of the problem, or is it just a guess? I do not understand why it would cause the problem we are experiencing.

justgo129 commented 8 years ago

@zednis @rewolfe there's no way to know for sure without doing another virtuoso reload. I agree this is not the most efficient approach, so I'm game for other diagnostic approaches, but I can't think of any.

zednis commented 8 years ago

some ideas:

justgo129 commented 8 years ago

Hmm, do you mean something like printing a "1" after the first few lines of turtle, "2" after the next few, etc., and then running the virtuoso input? We do capture error statements in the log, some of which say "[debug]". Other than the returned text that @rewolfe mentions above, I didn't see more specific information.

rewolfe commented 8 years ago

Justin, I think actually turning on the log messages during the load is the best approach (255 -->> 0). We can try that this afternoon when we are back at the office (or on Monday).

zednis commented 8 years ago

I would prefer that we write the RDF generated by the templates to file(s) and then load the RDF files into virtuoso. This would allow us to run the process and save the generated RDF outside of virtuoso for manual inspection and archive. It would also allow us to do post-processing (such as inference) before the load.
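A rough sketch of the file-writing step, in Python for illustration only (the existing loader, load_rdf_sources.pl, is Perl). The function name, the output directory, and the one-file-per-resource layout are assumptions, and the Turtle passed in stands for whatever a template produced.

```python
from pathlib import Path
from urllib.parse import quote


def dump_rdf(resource_uri, turtle_text, out_dir):
    """Write the Turtle generated for one resource to its own file."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    # Percent-encode the URI so slashes cannot create nested directories.
    path = out / (quote(resource_uri, safe="") + ".ttl")
    path.write_text(turtle_text, encoding="utf-8")
    return path


if __name__ == "__main__":
    import tempfile

    with tempfile.TemporaryDirectory() as tmp:
        written = dump_rdf("/article/10.1002/eco.158", "# generated Turtle goes here\n", tmp)
        print(written.name)
```

With the files on disk, a failing resource can be inspected or validated by itself, and the load into virtuoso becomes a separate step that can be rerun without regenerating anything.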

rewolfe commented 8 years ago

+1

zednis commented 8 years ago

Additionally, it would help to add a simple log statement such as "template X generated {URI}" so we can verify that a template was run for a given resource.
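Such a statement could look roughly like the following Python sketch, which uses the standard logging module. The logger name, the helper, and the bracketed format (chosen to resemble the existing log lines) are assumptions.

```python
import logging

log = logging.getLogger("rdf_export")  # hypothetical logger name


def log_generated(template_name, resource_uri):
    """Record that a template produced RDF for a resource; return the message."""
    message = f"template {template_name} generated {resource_uri}"
    log.info(message)
    return message


if __name__ == "__main__":
    logging.basicConfig(
        level=logging.INFO,
        format="[%(asctime)s] [%(levelname)s] %(message)s",
    )
    log_generated("article", "/article/10.1002/eco.158")
```

Searching the log for "template article generated" would then show which article resources a template actually ran for, independent of whether the triples later appear in the triplestore.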

justgo129 commented 8 years ago

Sounds good. Tabled #181 until Monday.