dbpedia / gstore

Git repo / triple store hybrid graph storage
Apache License 2.0

Invoke Jena RDF Parser with target Graph URI as the base parameter #39

Closed by holycrab13 3 weeks ago

holycrab13 commented 1 month ago

The Jena model parser accepts a base parameter that is used to resolve relative URIs in JSON-LD documents.

In Java this looks like this:

 RDFDataMgr.read(model, inputStream, base, Lang.JSONLD);

It would be great to use the graph URI as the base parameter, so that relative URIs are resolved to absolute URIs. E.g. if the graph URI is https://gstore.de/whatnot/somedoc.jsonld, then

"@id" : "#mod"

would resolve to

"@id"  : "https://gstore.de/whatnot/somedoc.jsonld#mod"

in Virtuoso. Since the document is usually made retrievable via its graph URI, common JSON-LD parsers that fetch it from there resolve the relative URIs to the same absolute URIs.

If a document does not use any relative URIs, the parameter has no effect and won't break anything that already exists.
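
To illustrate the resolution described above, here is a minimal sketch that uses only `java.net.URI` from the JDK, not Jena. The class name `BaseResolutionSketch` and the helper `resolveAgainstGraph` are hypothetical and exist only for this illustration. It covers fragment-only references such as `#mod` and already-absolute URIs, and makes no claim about how Jena resolves references internally.

```java
import java.net.URI;

public class BaseResolutionSketch {

    // Hypothetical helper: resolves a reference against the graph URI used as base.
    public static String resolveAgainstGraph(String graphUri, String reference) {
        return URI.create(graphUri).resolve(reference).toString();
    }

    public static void main(String[] args) {
        String graphUri = "https://gstore.de/whatnot/somedoc.jsonld";

        // Fragment-only reference: the base document is kept and the fragment is appended.
        // prints https://gstore.de/whatnot/somedoc.jsonld#mod
        System.out.println(resolveAgainstGraph(graphUri, "#mod"));

        // Absolute URI: returned unchanged, so documents without relative URIs are unaffected.
        // prints https://databus.dbpedia.org/lrec2024
        System.out.println(resolveAgainstGraph(graphUri, "https://databus.dbpedia.org/lrec2024"));
    }
}
```

The second call shows why passing a base should be harmless for documents that only contain absolute URIs.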

manonthegithub commented 1 month ago

Could you provide an example document that I could use to test this?

manonthegithub commented 1 month ago

Should the files saved on the drive (in git) contain the resolved URIs or the relative ones?

holycrab13 commented 1 month ago

Example

{
  "@context" : "https://raw.githubusercontent.com/dbpedia/databus-moss/dev/devenv/context2.jsonld",
  "@id" : "",
  "wasGeneratedBy" :  {
      "@id" : "#layer",
      "@type" : "DatabusMetadataLayer",
      "version" : "1.0.0",
      "name": "simple",
      "created" : "2024-03-01 14:37:32",
      "used" : "https://databus.dbpedia.org/lrec2024/linguistics/wordnet/2023#wordnet_lang=en.ttl.gz"
  },
  "subject" : [ "oeo:OEO_00020033" ]
}

We cannot resolve them, because piping the document through the parser would mess it up again. If it can be done without changing the document structure and without losing non-RDF content, then it could be a setting. I think the easiest way is to not resolve anything and save the document as is.

manonthegithub commented 1 month ago

It did not work on the first attempt. Specifying the base did not do anything, and Jena reads the document as empty... I will need to take a deeper look into it.

holycrab13 commented 1 month ago

We are using

RDFDataMgr.read(model, dataStream, baseUrl, lang);

in moss. Maybe that method works.

manonthegithub commented 3 weeks ago

@holycrab13 I expect this method not to work either, since it should be using the same machinery as I do, but I haven't checked it yet.

manonthegithub commented 3 weeks ago

@holycrab13 Okay, I see what the problem is: it is caused by caching contexts and then using the cached ones. I need some time to work out the best way to fix it.

holycrab13 commented 3 weeks ago

Since we needed it now for dev, I made a branch where it's working. It's on the jan branch, and the main edit is in SparqlClient in

  def readModel(data: Array[Byte], lang: Lang, baseUrl: String, context: Option[util.Context]): Try[(Model, List[Warning])] = Try {

    log.debug(s"Parsing with base url: ${baseUrl}")

    val model = ModelFactory.createDefaultModel()
    val dataStream = new ByteArrayInputStream(data)
    val dest = StreamRDFLib.graph(model.getGraph)

    // Pass baseUrl so that relative URIs in the input resolve against it.
    RDFDataMgr.read(model, dataStream, baseUrl, lang)
    // Note: in this snippet the handler is created after the read and is not passed to the parser.
    val eh = newErrorHandlerWithWarnings

    (model, eh.warningsList)
  }

where I also pass the baseUrl. Then, in ApiImpl, I pass the graphId as the base URL in the saveFile function:

  override def saveFile(repo: String,
                        path: String,
                        body: String,
                        prefix: Option[String],
                        author_name: Option[String],
                        author_email: Option[String])
                       (request: HttpServletRequest): Try[OperationSuccess] = {

    val pa = gitPath(path)
    val graphId = generateGraphId(prefix.getOrElse(getPrefix(request)), repo, pa)
    val ct = Option(request.getContentType)
      .map(_.toLowerCase)
      .getOrElse("")
    val lang = mapContentType(ct, defaultLang)
    val ctxU = contextUrl(body.getBytes, lang)
    val ctx = ctxU.map(cu => jenaJsonLdContextWithFallbackForLocalhost(cu, request.getRemoteHost).get)
    validateEmail(author_email).flatMap(email =>
      readModel(body.getBytes, lang, graphId, ctx)
        .flatMap(model => {
          saveToVirtuoso(model._1, graphId)({
            saveFiles(repo, Map(pa -> body.getBytes), author_name, email)
                .map(hash => OperationSuccess(graphId, hash))
          }).transform(Success(_), e =>
            if (model._2.isEmpty) {
              Failure(e)
            } else {
              val ee = new RuntimeException(
                s"Error saving data, potentially caused by: ${model._2.map(_.message).fold("")((l, r) => l + '\n' + r)}",
                e)
              ee.setStackTrace(Array.empty)
              Failure(ee)
            })
        })
    )
  }

manonthegithub commented 3 weeks ago

@holycrab13 Okay, so I finally managed to integrate the relative URI resolution. The questions now are: should we save the resolved URIs in git or not? And should read also return the resolved URIs or not?

holycrab13 commented 3 weeks ago

No and no. We should test the second one with the JSON-LD Playground first. AFAIK, this should be resolved locally.

manonthegithub commented 3 weeks ago

I managed to achieve what you want with the URIs! Hurray! It will be delivered soon.

manonthegithub commented 3 weeks ago

It should work now in the dev branch.