buda-base / lds-pdi

http://purl.bdrc.io BDRC Linked Data Server
Apache License 2.0

ontology/shapes webhook updates unreliable #186

Closed: xristy closed this issue 4 years ago

xristy commented 4 years ago

Pushing commits to owl-schema and editor-templates does not reliably update fuseki.

Yesterday I pushed a commit to owl-schema, a8885c, removing the unused UNKNOWNS, and it was not reflected on fuseki.

I tried:

curl -X POST "http://purl.bdrc.io/callbacks/github/owl-schema"

but that now fails:

{"timestamp":"2020-06-05T13:56:23.645+0000","status":415,"error":"Unsupported Media Type","message":"Content type '' not supported","path":"/callbacks/github/owl-schema"}

That call has been part of the buda2 db-load process to ensure fuseki is up to date with the ontologies.
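The 415 response says the callback rejected the request because it carried an empty Content-Type. A minimal sketch, using the JDK's java.net.http client, of a manual trigger that declares one. It assumes the endpoint accepts a JSON body; the {} payload is only a placeholder, not GitHub's real push payload. The request is sent only when --send is passed, so it can be inspected offline.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

final class WebhookTrigger {
    // Builds a POST that declares a JSON content type; an empty Content-Type is
    // what the 415 "Content type '' not supported" response complains about.
    static HttpRequest buildCallbackRequest(String callbackUrl, String jsonBody) {
        return HttpRequest.newBuilder(URI.create(callbackUrl))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(jsonBody))
                .build();
    }

    public static void main(String[] args) throws Exception {
        // Placeholder payload (assumption): a real GitHub push payload is much larger.
        HttpRequest req = buildCallbackRequest(
                "http://purl.bdrc.io/callbacks/github/owl-schema", "{}");
        System.out.println(req.method() + " " + req.uri() + " "
                + req.headers().firstValue("Content-Type").orElse(""));
        if (args.length > 0 && args[0].equals("--send")) {
            HttpResponse<String> resp = HttpClient.newHttpClient()
                    .send(req, HttpResponse.BodyHandlers.ofString());
            System.out.println(resp.statusCode() + " " + resp.body());
        }
    }
}
```

Whether the server then accepts the placeholder body is not established by the thread; the sketch only removes the empty-Content-Type cause of the 415.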

Next I tried:

curl -X POST "http://purl.bdrc.io/clearcache"

and that seemed to work for that push. Then I pushed a second commit, 737fb5, to clean up UNKNOWNS from the translations, but that didn't take hold. I tried clearcache again, which didn't work, and then tried a bogus commit, which did not work either.

The translation triples like:

bdr:UNKNOWN_Place  rdfs:label  "sa gnas gsal kha med pa/"@bo-x-ewts .

are still in fuseki at this point.

I need both owl-schema and editor-templates to reliably update to help w/ developing and debugging.

xristy commented 4 years ago

I should mention that the translation triples are in bdg:trans_core_bo. So there's a question of how that graph gets updated on fuseki when a change is pushed to:

owl-schema/translations/core_bo.ttl

MarcAgate commented 4 years ago

You have to specify the graph in ontPolicy.rdf (ontGraph property); it will then normally put this ontology data in that graph.

I deleted the ontologySchema graph and ran the loading process. However, I still have the triple you mentioned above, so it must be in some file loaded by the OntDocManager.

xristy commented 4 years ago

The triple, along with similar ones, is in owl-schema/translations/core_bo.ttl.

The owl-schema/ont-policy.rdf has:


    <OntologySpec>
        <!-- local version of the Admin translations vocabulary -->
        <publicURI rdf:resource="http://purl.bdrc.io/ontology/translations/CoreBo/"/>
        <altURL    rdf:resource="owl-schema/translations/core_bo.ttl"/>
        <altURL    rdf:resource="https://raw.githubusercontent.com/buda-base/owl-schema/master/translations/core_bo.ttl"/>
    </OntologySpec>

so I would assume it would be loaded into bdg:ontologySchema since that is:

<adm:defaultOntGraph   rdf:resource="http://purl.bdrc.io/graph/ontologySchema"/>

I do not know where bdg:trans_core_bo comes from or whether it's still needed. @eroux ?

The translation triples are also loaded into bdg:ontologySchema, as makes sense from the ont-policy.rdf.
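The rule discussed here (an OntologySpec is loaded into its adm:ontGraph if one is declared, otherwise into the adm:defaultOntGraph) can be sketched as a plain lookup. The OntologySpec class and resolveGraph helper below are hypothetical simplifications, not lds-pdi's or Jena's classes; only the URIs come from the thread.

```java
import java.util.List;
import java.util.Optional;

final class OntPolicyGraphs {
    static final String DEFAULT_ONT_GRAPH = "http://purl.bdrc.io/graph/ontologySchema";

    // Hypothetical, simplified view of one <OntologySpec> entry in ont-policy.rdf.
    static final class OntologySpec {
        final String publicUri;
        final List<String> altUrls;   // the altURL values listed for the entry
        final String ontGraph;        // adm:ontGraph, or null when none is declared

        OntologySpec(String publicUri, List<String> altUrls, String ontGraph) {
            this.publicUri = publicUri;
            this.altUrls = altUrls;
            this.ontGraph = ontGraph;
        }
    }

    // An explicit adm:ontGraph wins; otherwise the policy's adm:defaultOntGraph applies.
    static String resolveGraph(OntologySpec spec, String defaultOntGraph) {
        return Optional.ofNullable(spec.ontGraph).orElse(defaultOntGraph);
    }

    public static void main(String[] args) {
        OntologySpec coreBo = new OntologySpec(
                "http://purl.bdrc.io/ontology/translations/CoreBo/",
                List.of("owl-schema/translations/core_bo.ttl",
                        "https://raw.githubusercontent.com/buda-base/owl-schema/master/translations/core_bo.ttl"),
                null);
        // No adm:ontGraph on this entry, so it falls back to bdg:ontologySchema.
        System.out.println(resolveGraph(coreBo, DEFAULT_ONT_GRAPH));
    }
}
```

Under this rule nothing in the policy entry for core_bo.ttl would produce bdg:trans_core_bo, which fits the conclusion that the graph is not needed.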

xristy commented 4 years ago

So now we know that bdg:trans_core_bo is not needed, and @MarcAgate has dropped it from fuseki corerw.

It is sufficient to update the ontology-translation repo and the owl-schema/translations files.

xristy commented 4 years ago

@MarcAgate I just pushed commit 27e583 to editor-templates and bdg:PersonLocalShapes was updated but bdg:PersonShapes didn't get completely updated.

It looks like the prior version of person.local.shapes.ttl got merged with the pushed version of person.shapes.ttl.

The nature of the changes is such that I can't tell what happened w/ person.ui.shapes.ttl, since it is an accumulation of person.shapes.ttl and person.local.shapes.ttl, which makes it hard to tell whether triples were loaded multiple times from several files.

xristy commented 4 years ago

I also tried:

curl -H "Content-Type: application/json" -X POST "http://purl.bdrc.io/callbacks/github/editor-templates"

but that just gives:

{"timestamp":"2020-06-05T21:40:03.839+0000","status":405,"error":"Method Not Allowed","message":"Request method 'POST' not supported","path":"/callbacks/github/editor-templates"}

MarcAgate commented 4 years ago

I have put the full log of the update below. The first part (OntPolicy for uri .... etc.) gives you the relationship between graphs and files according to ontPolicy.rdf.

The end gives you the update of each graph and its size (don't worry about the "InfModel Size" display; it is wrong since there is no inference here). Hopefully you'll be able to trace things from that. I don't know what the correct (or expected) imports/merges are.

Log-from-MA-for-lds-pdi-issue-186.txt

xristy commented 4 years ago

I'm not seeing any obvious problems in the log that you sent. The log information doesn't show the importing or file dates and so on that would be needed to see what is getting merged in ldspdi.

I realize that my note from yesterday evening didn't provide any details that might be needed in tracking stuff down so I'll recount them below:

Yesterday evening's commit 27e583f was in part about moving UI related triples from person.local.shapes.ttl and person.shapes.ttl to person.ui.shapes.ttl. These were triples with predicates sh:name, sh:description, and dash:editor. From the person.local.shapes.ttl section at the top of the commit are the following:

Checking http://purl.bdrc.io/shapes/core/PersonLocalShapes confirms that these triples are not present from ldspdi and running an appropriate query on fuseki (construct over bdg:PersonLocalShapes) shows they are also not present in corerw in this graph. This is as expected based on the commit.

Note/ the response to http://purl.bdrc.io/shapes/core/PersonLocalShapes/ does include occurrences of the predicates sh:name, sh:description, and dash:editor; however, those are coming from root.shapes.ttl and event.shapes.ttl which I have not yet factored - that was my next task./Note

The problem is that when visiting http://purl.bdrc.io/shapes/core/PersonShapes, the six triples identified above are present in the result, and they are also present on fuseki with the appropriate query (construct over bdg:PersonShapes). So ldspdi is serving the same thing that appears on fuseki, but as far as I can tell ldspdi has loaded bdg:PersonShapes with triples that are not in GH.

Note/ The import chain is:

PersonShapes
    PersonLocalShapes
        EventShapes
            RootShapes
                BaseShapes

/Note
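The chain above means bdg:PersonShapes should hold the triples of every document reachable through owl:imports, which is why a stale copy of any document lower in the chain can surface there. A minimal sketch of that transitive closure, using short names in place of the full shape URIs and a simplified direct-import map; it is an illustration, not how OntDocumentManager is implemented internally.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class ImportClosure {
    // Returns the document itself plus everything reachable through owl:imports,
    // in breadth-first order. Each URI is visited once, so import cycles terminate.
    static Set<String> closure(String root, Map<String, List<String>> imports) {
        Set<String> seen = new LinkedHashSet<>();
        Deque<String> queue = new ArrayDeque<>();
        queue.add(root);
        while (!queue.isEmpty()) {
            String uri = queue.poll();
            if (seen.add(uri)) {
                queue.addAll(imports.getOrDefault(uri, List.of()));
            }
        }
        return seen;
    }

    public static void main(String[] args) {
        // The chain from the note above, as a simplified direct-import map.
        Map<String, List<String>> imports = Map.of(
                "PersonShapes", List.of("PersonLocalShapes"),
                "PersonLocalShapes", List.of("EventShapes"),
                "EventShapes", List.of("RootShapes"),
                "RootShapes", List.of("BaseShapes"));
        System.out.println(closure("PersonShapes", imports));
    }
}
```

Running it prints [PersonShapes, PersonLocalShapes, EventShapes, RootShapes, BaseShapes].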

It is relevant to note that the various triples that were in person.shapes.ttl prior to this latest commit and which I moved to person.ui.shapes.ttl are indeed not present in bdg:PersonShapes on fuseki or in the response to visiting http://purl.bdrc.io/shapes/core/PersonShapes which is as expected.

To make matters even odder, there is content present in bdg:PersonShapes (on fuseki and from ldspdi) that was removed from person.local.shapes.ttl via commit b44404 on 3 June, 4 commits prior to last evening's commit. Specifically:

bds:PersonEventShape-personEventType
   a             sh:PropertyShape ;
   dash:editor   dash:InstancesSelectEditor ;
   sh:class      bdo:PersonEventType ;
   sh:maxCount   1 ;
   sh:message    "exactly one PersonEventType required"@en ;
   sh:minCount   1 ;
   sh:name       "role associated with the event"@en ;
   sh:path       bdo:eventType .

and:

bds:PersonEventShape  a  sh:NodeShape ;
   rdfs:label    "Person Event Shape"@en ;
   bds:nodeShapeType  bds:FacetShape ;
   sh:property   bds:PersonEventShape-personEventCorporation , bds:PersonEventShape-personEventRole , bds:PersonEventShape-personEventType ;
   sh:targetClass  bdo:PersonEvent .

The bds:PersonEventShape-personEventCorporation was moved to PersonShapes in a still earlier commit 8ca252 from 2 June and so is expected to be in PersonShapes, but the reference to bds:PersonEventShape-personEventType should not be there.

I have added OntTestLoading3.java, using just OntDocumentManager, to the shapes-testing repo. It loads PersonLocalShapes and PersonShapes per editor-templates/ont-policy.rdf, processing imports, and then writes out both the models from the individual files (person.local.shapes.ttl and person.shapes.ttl) and the aggregate models produced by processing imports.

The results are as expected based on the master branch of editor-templates. There are no stale triples appearing in the PersonShapes full model/graph.

If you want to run it, you'll need to change line 18 of OntTestLoading3.java to reflect where you want the output files written.

I do not know what more I can do at this point. The evidence seems to point to something in ldspdi. The GH content is as intended. The OntDocumentManager w/ editor-templates/ont-policy.rdf appears to produce the expected results. Also it seems that ldspdi is updating newcorerw in the same manner as corerw.

xristy commented 4 years ago

@MarcAgate on 9 June @ 19:55Z, I pushed commit bdcb02 - no commits since then.

This commit deleted

sh:description "Zero or more notes may be associated with an entity"@en ;

from the definition of bds:EntityShape-note in root.local.shapes.ttl. That triple is still present in bdg:PersonShapes and bdg:PersonLocalShapes when retrieving via the graph URI from lds-pdi or via construct on fuseki.

The commit also added 25 occurrences of sh:message in root.local.shapes.ttl. The message triples occur mostly in the bds:NoteShape-... and bds:ContentLocationShape-... property shape definitions. None of the added sh:message triples appear in bdg:PersonShapes, bdg:PersonLocalShapes, or bdg:PersonUIShapes. All of them appear in bdg:shapesSchema, which is the adm:defaultOntGraph.

Please refer to the README.md for updated info on the import patterns.

It is also worth noting that the triples from my previous comment, which had been deleted but were still appearing in bdg:PersonShapes, are no longer present, as was to be expected from commit 27e583f.

There is also an anomaly in bdg:shapesSchema. In root.local.shapes.ttl is

bds:EntityShape-skos_prefLabel
  a sh:PropertyShape ;
  sh:message "each Entity resource must have at least one skos:prefLabel and each must be a unique language"@en ;
  sh:path skos:prefLabel ;
  sh:datatype rdf:langString ;
  sh:languageIn (
      "en" "zh" "bo" "bo-x-ewts" "km-x-femc" "km" "fr" "km-x-bdrc" 
    ) ;
  sh:minCount 1 ;
  sh:uniqueLang true ;
.

and in all the adm:ontGraph named graphs there is a single occurrence of

  sh:languageIn (
      "en" "zh" "bo" "bo-x-ewts" "km-x-femc" "km" "fr" "km-x-bdrc" 
    ) ;

but in bdg:shapesSchema there are two occurrences as can be seen from:

PREFIX sh: <http://www.w3.org/ns/shacl#>
select ?s ?p ?o ?g
where {
  bind (sh:languageIn as ?p)
  graph ?g { ?s ?p ?o . }
} limit 3000

and from:

PREFIX bdg: <http://purl.bdrc.io/graph/>
construct { ?s ?p ?o . }
where {
  bind (bdg:shapesSchema as ?g)
  graph ?g { ?s ?p ?o . }
} limit 3000

and then looking at the defn:

bds:EntityShape-skos_prefLabel
        a               sh:PropertyShape ;
        sh:datatype     rdf:langString ;
        sh:description  "require unique language from among the listed choices"@en ;
        sh:languageIn   ( "en" "zh" "bo" "bo-x-ewts" "km-x-femc" "km" "fr" "km-x-bdrc" ) ;
        sh:languageIn   ( "en" "zh" "bo" "bo-x-ewts" "km-x-femc" "km" "fr" "km-x-bdrc" ) ;
        sh:message      "each Entity resource must have at least one skos:prefLabel and each must be a unique language"@en ;
        sh:minCount     1 ;
        sh:name         "pref label"@en ;
        sh:order        "1"^^xsd:decimal ;
        sh:path         skos:prefLabel ;
        sh:uniqueLang   true .

At first I wasn't sure whether this was a problem with Jena, so I added OntTestLoading4 to see if I could reproduce the double occurrence via OntDocumentManager. However, it seems that the double occurrence, apparently owing to distinct blank nodes, may originate in lds-pdi.
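The duplicate is consistent with the same Turtle content being merged into bdg:shapesSchema twice: triples made only of IRIs and literals collapse under RDF set semantics, but each parse mints fresh blank nodes for the sh:languageIn list, so the two copies stay distinct. A toy model with triples as string lists (not Jena, and only the first list cell is modelled) that shows the effect; which load path would produce the second copy in lds-pdi is not established here.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.concurrent.atomic.AtomicInteger;

final class BlankNodeMerge {
    private static final AtomicInteger COUNTER = new AtomicInteger();

    // Simulates one parse of the bds:EntityShape-skos_prefLabel excerpt. Blank node
    // labels are scoped to a single document load, so every parse gets a new list head.
    static Set<List<String>> parseOnce() {
        String listHead = "_:b" + COUNTER.incrementAndGet();
        Set<List<String>> triples = new HashSet<>();
        triples.add(List.of("bds:EntityShape-skos_prefLabel", "sh:path", "skos:prefLabel"));
        triples.add(List.of("bds:EntityShape-skos_prefLabel", "sh:languageIn", listHead));
        triples.add(List.of(listHead, "rdf:first", "\"en\""));
        return triples;
    }

    static long countPredicate(Set<List<String>> graph, String predicate) {
        return graph.stream().filter(t -> t.get(1).equals(predicate)).count();
    }

    public static void main(String[] args) {
        Set<List<String>> graph = new HashSet<>();   // a graph is a set of triples
        graph.addAll(parseOnce());                   // content loaded once
        graph.addAll(parseOnce());                   // same content loaded again
        System.out.println("sh:path: " + countPredicate(graph, "sh:path"));             // 1
        System.out.println("sh:languageIn: " + countPredicate(graph, "sh:languageIn")); // 2
    }
}
```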

A pattern may be emerging: changes to shapes file A appear in bdg:A (if defined). If B imports A then the changes in A do not appear in bdg:B (if defined) or subsequent imports of B (if any).
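One hypothetical mechanism that would produce exactly this pattern is a loader that reads each top-level document fresh but resolves owl:imports through a cache filled on an earlier run. The StaleImportDemo class is an illustrative model with made-up documents A and B, not lds-pdi code, and it follows only direct imports for brevity.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class StaleImportDemo {
    private final Map<String, String> remote;          // current content per document (hypothetical)
    private final Map<String, List<String>> imports;   // direct owl:imports per document
    private final Map<String, String> importCache = new HashMap<>();

    StaleImportDemo(Map<String, String> remote, Map<String, List<String>> imports) {
        this.remote = remote;
        this.imports = imports;
    }

    // Builds the content of the graph for root: the document itself is read fresh,
    // but each direct import is served from the cache once it has been seen.
    List<String> load(String root) {
        List<String> graph = new ArrayList<>();
        graph.add(remote.get(root));
        for (String imported : imports.getOrDefault(root, List.of())) {
            graph.add(importCache.computeIfAbsent(imported, remote::get));
        }
        return graph;
    }

    public static void main(String[] args) {
        Map<String, String> remote = new HashMap<>(Map.of("A", "A v1", "B", "B v1"));
        StaleImportDemo loader = new StaleImportDemo(remote, Map.of("B", List.of("A")));
        loader.load("B");              // first run caches A v1 as an import of B
        remote.put("A", "A v2");       // a push changes A
        System.out.println("bdg:A -> " + loader.load("A"));  // [A v2]
        System.out.println("bdg:B -> " + loader.load("B"));  // [B v1, A v1]
    }
}
```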

Hopefully, this will help pinpoint where the problem in lds-pdi is.

MarcAgate commented 4 years ago

I have prepared basic questions with yes/no answers (please explain very briefly if the answer is "No"). These questions apply to the current state of the editor-templates repo (commit 8d5dd4a).

http://purl.bdrc.io/graph/WorkShapes
http://purl.bdrc.io/graph/PersonUIShapes
http://purl.bdrc.io/graph/PersonLocalShapes
http://purl.bdrc.io/graph/PersonShapes
http://purl.bdrc.io/graph/ItemShapes
http://purl.bdrc.io/graph/InstanceShapes
http://purl.bdrc.io/graph/CorporationShapes
http://purl.bdrc.io/graph/shapesSchema

1) Are the graphs in the above list, designated by their URIs, the only graphs we should find on fuseki?
2) Is http://purl.bdrc.io/graph/shapesSchema supposed to be a merge of all the other graphs in the list?

http://purl.bdrc.io/shapes/core/IdentifierShapes/
http://purl.bdrc.io/shapes/core/EventUIShapes/
http://purl.bdrc.io/shapes/core/EventLocalShapes/
http://purl.bdrc.io/shapes/core/EventShapes/
http://purl.bdrc.io/shapes/core/RootUIShapes/
http://purl.bdrc.io/shapes/core/RootLocalShapes/
http://purl.bdrc.io/shapes/core/RootShapes/
http://purl.bdrc.io/shapes/adm/AdminShapes/
http://purl.bdrc.io/shapes/core/BaseShapes/

3) The ontologies corresponding to the URIs above do not have individual graphs in fuseki. Is this correct?
4) This data is dispatched into the graphs of the first list (and pushed to fuseki) by the sole magic of the import feature of the OntDocumentManager. Is this correct?

All the models as loaded and generated by the OntDocumentManager from OntPolicy.rdf (i.e., from the list of documents read from OntPolicy.rdf) have been saved as ttl in https://github.com/buda-base/lds-pdi/tree/master/src/test/resources/ttl/shapes

5) Are these correct ?

I think I'll be able to move further along my debug path once I have answers to these 5 questions.

xristy commented 4 years ago

http://purl.bdrc.io/graph/WorkShapes
http://purl.bdrc.io/graph/PersonUIShapes
http://purl.bdrc.io/graph/PersonLocalShapes
http://purl.bdrc.io/graph/PersonShapes
http://purl.bdrc.io/graph/ItemShapes
http://purl.bdrc.io/graph/InstanceShapes
http://purl.bdrc.io/graph/CorporationShapes
http://purl.bdrc.io/graph/shapesSchema
  1. Are the graphs in the above list, designated by their URIs, the only graphs we should find on fuseki?

Yes

  2. Is http://purl.bdrc.io/graph/shapesSchema supposed to be a merge of all the other graphs in the list?

Yes

http://purl.bdrc.io/shapes/core/IdentifierShapes/
http://purl.bdrc.io/shapes/core/EventUIShapes/
http://purl.bdrc.io/shapes/core/EventLocalShapes/
http://purl.bdrc.io/shapes/core/EventShapes/
http://purl.bdrc.io/shapes/core/RootUIShapes/
http://purl.bdrc.io/shapes/core/RootLocalShapes/
http://purl.bdrc.io/shapes/core/RootShapes/
http://purl.bdrc.io/shapes/adm/AdminShapes/
http://purl.bdrc.io/shapes/core/BaseShapes/
  3. The ontologies corresponding to the URIs above do not have individual graphs in fuseki. Is this correct?

Yes

  4. This data is dispatched into the graphs of the first list (and pushed to fuseki) by the sole magic of the import feature of the OntDocumentManager. Is this correct?

Yes, except that OntDocumentManager does not push to fuseki.

All the models as loaded and generated by the OntDocumentManager from OntPolicy.rdf (i.e., from the list of documents read from OntPolicy.rdf) have been saved as ttl in https://github.com/buda-base/lds-pdi/tree/master/src/test/resources/ttl/shapes

  5. Are these correct?

I believe so. I closely checked AdminShapes.ttl, BaseShapes.ttl, and PersonShapes.ttl, and skimmed the others.

I think I'll be able to move further along my debug path once I have answers to these 5 questions.

Sounds good.

xristy commented 4 years ago

This looks like a combination of OntDocumentManager caching settings and browser caching misbehaviors (do not trust empty caches and the like). The only safe testing is via curl and s-query.

xristy commented 4 years ago

But wait! There's more.

I pushed commit af0c1b6: finished adding sh:message to event.local.shapes.ttl; no need to import dash except in root.ui.shapes.ttl, i.e., there were unneeded owl:imports <http://datashapes.org/dash> statements.

Marc verified that the GH webhook fired and ldspdi loaded the files fresh from GH/editor-templates; and the files are saved in buda1:/usr/local/ldspdi/.

grep "<http://datashapes.org/dash>" *.ttl shows that the imports still remain in the following files even though they are not present in GH:

BaseShapes.ttl:        owl:imports      <http://datashapes.org/dash> ;
EventShapes.ttl:        owl:imports      <http://purl.bdrc.io/shapes/core/RootShapes/> , <http://purl.bdrc.io/shapes/core/EventLocalShapes/> , <http://datashapes.org/dash> ;
EventUIShapes.ttl:        owl:imports      <http://purl.bdrc.io/shapes/core/RootUIShapes/> , <http://purl.bdrc.io/shapes/core/EventShapes/> , <http://datashapes.org/dash> ;
PersonLocalShapes.ttl:        owl:imports      <http://purl.bdrc.io/shapes/core/EventLocalShapes/> , <http://datashapes.org/dash> ;
PersonShapes.ttl:        owl:imports      <http://purl.bdrc.io/shapes/core/PersonLocalShapes/> , <http://purl.bdrc.io/shapes/core/EventShapes/> , <http://datashapes.org/dash> ;
RootLocalShapes.ttl:        owl:imports      <http://purl.bdrc.io/shapes/core/BaseShapes/> , <http://datashapes.org/dash> ;
RootShapes.ttl:        owl:imports      <http://purl.bdrc.io/shapes/core/RootLocalShapes/> , <http://datashapes.org/dash> ;

and the other files from which the imports were removed do not retain the dash import:

AdminShapes.ttl
CorporationShapes.ttl
EventLocalShapes.ttl
IdentifierShapes.ttl
InstanceShapes.ttl
ItemShapes.ttl
WorkShapes.ttl

Further, running:

s-query --query TEST_SPARQL_001.txt --server http://buda1.bdrc.io:13180/fuseki/corerw/query  > TEST_OUTPUT/ALL_SHAPES08.ttl 

with the query file TEST_SPARQL_001.txt yields ALL_SHAPES08.ttl which shows that the bdg:shapesSchema contains the triples that were removed. Using the commandline s-query avoids any question of browser caching when using the fuseki web i/f.
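The same kind of browser-free check can be scripted against the SPARQL 1.1 Protocol endpoint that s-query uses. A minimal sketch with the JDK HTTP client: the endpoint is the one from the s-query command, and the CONSTRUCT mirrors the earlier query over bdg:shapesSchema (the actual contents of TEST_SPARQL_001.txt are not shown in the thread, so this query is a stand-in). Only main contacts the server.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;

final class FusekiQuery {
    static final String ENDPOINT = "http://buda1.bdrc.io:13180/fuseki/corerw/query";

    // SPARQL 1.1 Protocol "query via GET": the query text goes URL-encoded in the
    // query parameter; Accept asks for Turtle since a CONSTRUCT returns a graph.
    static HttpRequest buildQuery(String endpoint, String sparql) {
        String url = endpoint + "?query=" + URLEncoder.encode(sparql, StandardCharsets.UTF_8);
        return HttpRequest.newBuilder(URI.create(url))
                .header("Accept", "text/turtle")
                .GET()
                .build();
    }

    // Stand-in query: every triple of one named graph.
    static String constructGraph(String graphIri) {
        return "CONSTRUCT { ?s ?p ?o } WHERE { GRAPH <" + graphIri + "> { ?s ?p ?o } }";
    }

    public static void main(String[] args) throws Exception {
        HttpRequest req = buildQuery(ENDPOINT,
                constructGraph("http://purl.bdrc.io/graph/shapesSchema"));
        HttpResponse<String> resp = HttpClient.newHttpClient()
                .send(req, HttpResponse.BodyHandlers.ofString());
        System.out.println(resp.statusCode());
        System.out.println(resp.body());
    }
}
```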

Substituting bdg:PersonLocalShapes in the query file and running s-query produces PersonLocalShapes_ALL09.ttl with an occurrence of <http://datashapes.org/dash> in each of BaseShapes:, RootLocalShapes:, and PersonLocalShapes.

Running curl GET "http://purl.bdrc.io/shapes/core/PersonLocalShapes" produces the same result as s-query via ldspdi.

OTOH, running OntTestLoading4 in shapes-testing produces results equivalent to GH contents: PersonLocalShapes_ALL07.ttl. There are no occurrences of <http://datashapes.org/dash>.

The issue still remains and is not tied to cache behavior in web browsers.

eroux commented 4 years ago

Here's a proposal that would allow a bit more debugging: we could record the same kind of admin data as for other graphs, with something like:

bda:shapesSchema a adm:AdminData ;
                  adm:gitRevision  "xxx" ; # the git revision of the editor-templates repo
                  adm:graphId   bdg:shapesSchema ;
                  adm:gitRepo   bda:GR0010 ; # or whatever, a new git repo individual for the editor-templates
.

what do you think?
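With the revision stored in the graph, a single query against fuseki would show which commit a graph was built from. A small sketch of rendering that admin data from Java after a reload, assuming the revision is available as a hex git SHA (how lds-pdi would obtain it is left open). bda:GR0010 stays the placeholder from the proposal, and the example SHA is commit af0c1b6 from this thread.

```java
import java.util.regex.Pattern;

final class ShapesAdminData {
    // Accept only abbreviated or full hex SHAs, so nothing else is spliced into Turtle.
    private static final Pattern GIT_SHA = Pattern.compile("[0-9a-f]{7,40}");

    // Renders the proposed admin data for a given editor-templates revision.
    // bda:GR0010 is the placeholder repo individual from the proposal, not a real ID.
    static String render(String gitRevision) {
        if (!GIT_SHA.matcher(gitRevision).matches()) {
            throw new IllegalArgumentException("not a git SHA: " + gitRevision);
        }
        return String.join("\n",
                "bda:shapesSchema a adm:AdminData ;",
                "    adm:gitRevision \"" + gitRevision + "\" ;",
                "    adm:graphId bdg:shapesSchema ;",
                "    adm:gitRepo bda:GR0010 .");
    }

    public static void main(String[] args) {
        System.out.println(render("af0c1b6"));
    }
}
```

Rejecting branch names such as master is deliberate: recording a branch would not tell you which content was actually loaded.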

eroux commented 4 years ago

I also think we should always have a local git repo and that the webhooks just do a pull + reimport of the local repo. That way we'll avoid other caches such as the github download URLs.
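Under this proposal, altURL entries in ont-policy.rdf that point at raw.githubusercontent.com would be served from a local clone refreshed by git pull instead. A sketch of the URL-to-path mapping, assuming a hypothetical checkout root for the repository and branch names without "/"; the pull itself and the rewrite of the policy file are not shown.

```java
import java.net.URI;
import java.nio.file.Path;

final class LocalRepoPaths {
    // Maps https://raw.githubusercontent.com/<owner>/<repo>/<branch>/<path> to
    // <checkoutRoot>/<path>, assuming checkoutRoot is a local clone of <owner>/<repo>.
    static Path toLocalPath(String rawUrl, Path checkoutRoot) {
        URI uri = URI.create(rawUrl);
        if (!"raw.githubusercontent.com".equals(uri.getHost())) {
            throw new IllegalArgumentException("not a raw.githubusercontent.com URL: " + rawUrl);
        }
        // Segments: "", owner, repo, branch, then the file path inside the repo.
        // A branch name containing "/" would break this split.
        String[] parts = uri.getPath().split("/", 5);
        if (parts.length < 5 || parts[4].isEmpty()) {
            throw new IllegalArgumentException("no file path in: " + rawUrl);
        }
        return checkoutRoot.resolve(parts[4]);
    }

    public static void main(String[] args) {
        // Hypothetical checkout location for editor-templates.
        Path root = Path.of("/usr/local/editor-templates");
        System.out.println(toLocalPath(
                "https://raw.githubusercontent.com/buda-base/editor-templates/master/templates/core/work.shapes.ttl",
                root));
    }
}
```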

MarcAgate commented 4 years ago

@xristy

More (hopefully useful) remarks here:

Files under /usr/local/ldspdi are serializations of the models returned by the DocManager. They are written while looping over the list of documents built by the DocManager from OntPolicy.rdf.

If we have correct files (with imports removed) and incorrect files (with imports not being removed), within the same loop, then it means that the issue lies at the level of DocManager.

However, unless I am mistaken, the code loading the model and producing the files in ldspdi is exactly the same as the one you have in OntTestLoading4.

Furthermore, without changing anything in the code, I just restarted ldspdi and all the produced files are now as expected.

So it is neither the DocManager config nor the code that runs before pushing to fuseki and produces the ttl serialization files we used for debugging. How can you be sure it's not still a cache issue?

eroux commented 4 years ago

Let's work with the hypothesis that github doesn't update the download URLs fast enough and when fetching them just after a push, we don't always get the latest version of files. A workaround is to implement my two ideas. Unless there is another hypothesis of course.

MarcAgate commented 4 years ago

Sounds good to me, as the hypothesis you describe might actually be the case, and using a local git repo will obviously solve the issue.

xristy commented 4 years ago

@MarcAgate I'm not sure it isn't a cache issue. I am sure that it isn't a web browser cache issue, since my tests were w/ curl and s-query.

@eroux your hypothesis about a timing issue makes sense, though I'm not sure. If the webhook fires before GH has finished storing the pushed updates, I don't see how a pull from GH into a local repo on buda1 would work better than reading via ldspdi/OntDocumentManager, unless git pull interacts with an in-progress push on GH in a way that fetching from a GH URL doesn't.

xristy commented 4 years ago

I opened a ticket with GH:

From: GitHub support@githubsupport.com Subject: [GitHub Support] Confirmation - Request Received (#730928) Date: June 15, 2020 at 10:07:32 AM CDT To: Chris Tomlinson ct@moonvine.org Reply-To: GitHub support@githubsupport.com

Chris,

Thank you for contacting GitHub Support. We wanted to let you know that we've received your message. In order to respond to tickets with the greatest urgency as quickly as possible during the COVID-19 crisis, we've established a priority order. If you have questions regarding our recent announcement that makes most GitHub features accessible to our community free of charge, we have captured answers to common questions here.

Ticket ID: 730928


My Message

We have a webhook push event configured for https://github.com/buda-base/editor-templates. When our server receives:

https://purl.bdrc.io/callbacks/github/shapes  (push)

then our server retrieves files of interest via raw.githubusercontent, like:

https://raw.githubusercontent.com/buda-base/editor-templates/master/templates/core/work.shapes.ttl

Sometimes we appear to end up with "old" content rather than the new pushed content.

Is there possibly a lag between when the push event is signaled and when the content fetched via raw.githubusercontent is up-to-date w/ push to the repo?

I have not been able to find any information regarding when contents retrieved via raw.githubusercontent are guaranteed to be in sync with the GH repo content as seen via git commands like git pull.

Thanks

xristy commented 4 years ago

@MarcAgate have you seen "Best way to fetch content via API without hitting cache"?

MarcAgate commented 4 years ago

Nice! That talks about a GitHub cache, and therefore might explain what we are experiencing. However, I think using the API is not applicable in our case, since the actual download is made by the OntDocumentManager using URLs coming from OntPolicy.rdf.

xristy commented 4 years ago

It turns out that, exploring the api.github.com w/ curl, I see that the request to GET a file redirects to raw.githubusercontent.com.

MarcAgate commented 4 years ago

For our information, I copy that here:

Hi Chris,

Thanks for reaching out and sorry for the delay in getting back to you on this. 
Yeah, the contents returned from raw.githubusercontent.com might be stale since it's cached by our CDN. 
You shouldn't really be using that for programmatic access. 
If you are programmatically fetching file contents, you should be using the API: 

http://developer.github.com/v3/

The API has well defined rate-limiting and caching behavior you can rely on. The raw.githubusercontent.com endpoint doesn't, so you might get limited or see cached content without warning. Hope this helps.
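Following the support suggestion, a file can be requested through the REST "get repository content" endpoint with the raw media type, which returns the file bytes instead of the default JSON envelope. A sketch that only builds the request; the owner, repo, and path come from the ticket, ref is a branch name here (the endpoint also accepts a commit SHA), and the file path is assumed to need no URL encoding. Given the observation in this thread that a file GET via api.github.com redirected to raw.githubusercontent.com, whether this alone avoids stale content is untested.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;

final class GitHubContents {
    // GET /repos/{owner}/{repo}/contents/{path}?ref={ref} with the raw media type.
    // The path is assumed to contain only URL-safe characters.
    static HttpRequest rawFileRequest(String owner, String repo, String path, String ref) {
        String url = "https://api.github.com/repos/" + owner + "/" + repo
                + "/contents/" + path
                + "?ref=" + URLEncoder.encode(ref, StandardCharsets.UTF_8);
        return HttpRequest.newBuilder(URI.create(url))
                .header("Accept", "application/vnd.github.v3.raw")
                .GET()
                .build();
    }

    public static void main(String[] args) {
        HttpRequest req = rawFileRequest("buda-base", "editor-templates",
                "templates/core/work.shapes.ttl", "master");
        System.out.println(req.uri());
    }
}
```

To actually send it, an HttpClient built with followRedirects(HttpClient.Redirect.NORMAL) would follow any redirect the API returns.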

MarcAgate commented 4 years ago

I believe this issue is now resolved by yesterday's changes, made after yet another synchronization problem. There are actually two caches used in the process: the OntDocumentManager cache and the cache of that OntDocumentManager's FileManager. Both are now reset each time the webhook is triggered, as follows:

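odm.setCacheModels(false);          // stop caching models loaded by the OntDocumentManager
odm.getFileManager().resetCache();  // drop models already cached by its FileManager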

There is no issue with the lds-queries webhook, since it is not ontology-related and does not use the OntDocManager machinery.
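Why resetting only one of the two caches is not enough can be shown with a toy model. The names are illustrative: two HashMaps stand in for the OntDocumentManager and FileManager caches, and the lookup order (document cache, then file cache, then source) is an assumption of the model rather than Jena's actual implementation.

```java
import java.util.HashMap;
import java.util.Map;

final class TwoLevelCache {
    private final Map<String, String> source;                       // what GitHub would return now
    private final Map<String, String> fileCache = new HashMap<>();  // stands in for the FileManager cache
    private final Map<String, String> docCache = new HashMap<>();   // stands in for the OntDocumentManager cache

    TwoLevelCache(Map<String, String> source) {
        this.source = source;
    }

    // Document-level cache first, then file-level cache, then the source;
    // each level keeps whatever it was handed.
    String get(String uri) {
        return docCache.computeIfAbsent(uri,
                u -> fileCache.computeIfAbsent(u, source::get));
    }

    void clearDocCache() { docCache.clear(); }
    void clearFileCache() { fileCache.clear(); }

    public static void main(String[] args) {
        Map<String, String> source = new HashMap<>(Map.of("PersonShapes", "v1"));
        TwoLevelCache cache = new TwoLevelCache(source);
        cache.get("PersonShapes");            // warms both caches with v1
        source.put("PersonShapes", "v2");     // a push lands
        cache.clearDocCache();
        System.out.println("doc cache cleared only: " + cache.get("PersonShapes")); // v1
        cache.clearDocCache();
        cache.clearFileCache();
        System.out.println("both caches cleared: " + cache.get("PersonShapes"));    // v2
    }
}
```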