buda-base / lds-pdi

http://purl.bdrc.io BDRC Linked Data Server
Apache License 2.0

format of the result of the /query API #44

Closed eroux closed 6 years ago

eroux commented 6 years ago

The result of the /query URLs is still not in line with the SPARQL JSON result format... Can you please make it similar? I have the feeling I've been asking this over and over, and it's certainly annoying on both sides. This time can you please make sure that everything is 100% compliant? I guess some unit tests with the Jena parser would make things easier. The problem the Jena parser currently has is that LDS returns:

"head" : [ "work", "nbVolumes", "access", "license" ],

instead of

  "head": {
    "vars": [  "work", "nbVolumes", "access", "license"  ]
  } 
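
To make the difference concrete, here is a minimal Python sketch (not LDS-PDI code) that wraps the bare list form of "head" into the {"vars": [...]} object shown above. The function name `to_compliant_head` is hypothetical and only serves the illustration.

```python
def to_compliant_head(head):
    """Wrap a bare list of variable names into the {"vars": [...]} object.

    Hypothetical helper for illustration; it is not part of LDS-PDI or Jena.
    """
    if isinstance(head, dict) and isinstance(head.get("vars"), list):
        return head  # already in the standard shape
    if isinstance(head, list):
        return {"vars": list(head)}
    raise ValueError("unexpected head value: %r" % (head,))


# The shape LDS currently returns, as quoted in this issue.
legacy = {"head": ["work", "nbVolumes", "access", "license"]}

# Same document with only the "head" member rewritten.
fixed = {**legacy, "head": to_compliant_head(legacy["head"])}
```

A unit test could then simply assert that `fixed["head"]["vars"]` lists the four variable names in order.
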

MarcAgate commented 6 years ago

I fixed the issue and changed the Java objects. Note that being 100% compliant will result in a small "loss" of information. For instance, when ldspdi returns:

Work_Name: {
xml:lang: "bo-x-ewts",
datatype: "http://www.w3.org/1999/02/22-rdf-syntax-ns#langString",
type: "literal",
value: "chos dbyings mdzod/"
}

Jena returns:

"Work_Name": { "type": "literal" , "xml:lang": "bo-x-ewts" , "value": "chos dbyings mdzod/" }

Also, which Jena parser are you referring to? Can you point me to the code you are using? What about the beginning of the JSON output (metadata)? Are you just parsing the "results" JSON node?

eroux commented 6 years ago

The result that comes from LDS-PDI is initially retrieved from Fuseki in this SPARQL JSON format, so sending the SPARQL JSON format won't bring any loss at all. If you look at the spec, what Jena returns means that it is a literal with a lang tag, and in RDF 1.1 the only datatype a literal with a lang tag can have is langString.
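
As a rough sketch of why nothing is lost, the Python snippet below drops the explicit langString datatype from a term that already carries xml:lang, which yields the form Jena emits. The helper name `normalize_term` is an assumption made for this example, not an existing function.

```python
RDF_LANGSTRING = "http://www.w3.org/1999/02/22-rdf-syntax-ns#langString"


def normalize_term(term):
    """Return a copy of the term without a redundant rdf:langString datatype.

    Hypothetical helper: a language-tagged literal is always a langString
    in RDF 1.1, so the explicit datatype adds no information.
    """
    out = dict(term)
    if "xml:lang" in out and out.get("datatype") == RDF_LANGSTRING:
        del out["datatype"]
    return out


# The term as ldspdi returned it, taken from the comment above.
lds_term = {
    "xml:lang": "bo-x-ewts",
    "datatype": RDF_LANGSTRING,
    "type": "literal",
    "value": "chos dbyings mdzod/",
}

compliant = normalize_term(lds_term)
```
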

The functions I'm using are ResultSetFactory.fromJSON(in); or ResultSetMgr.read(in, ResultSetLang.SPARQLResultSetJSON);

eroux commented 6 years ago

The Jena code that does what we want is in ResultSetWriterJSON. It's pretty low-level, but maybe we could use it somehow: that would be less code to write, and we would have 100% compliant code. I'm not sure we really can because we're adding things to the result, but it would be worth a try, I think.

MarcAgate commented 6 years ago

Yes, I agree it's not a big loss since xml:lang indicates it is a langString datatype. In fact, the datatype is required in some other cases, and it's a little bit weird when it comes to differentiating literals:

Literal S | {"type": "literal", "value": "S"}
Literal S with language tag L | {"type": "literal", "value": "S", "xml:lang": "L"}
Literal S with datatype IRI D | {"type": "literal", "value": "S", "datatype": "D"}

It looks like in the first case we have a literal which doesn't have a datatype corresponding to an IRI... Well, I'll try to find examples and I'll manage.
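
One way to read the three rows above, sketched in Python: in RDF 1.1 a literal with neither a language tag nor a datatype is a simple literal, whose datatype is xsd:string. The function name `effective_datatype` is hypothetical, and the snippet only covers terms of type "literal".

```python
RDF_LANGSTRING = "http://www.w3.org/1999/02/22-rdf-syntax-ns#langString"
XSD_STRING = "http://www.w3.org/2001/XMLSchema#string"


def effective_datatype(term):
    """Return the datatype IRI implied by a SPARQL JSON literal term.

    Hypothetical helper covering the three literal forms listed above.
    """
    if term.get("type") != "literal":
        raise ValueError("not a literal term: %r" % (term,))
    if "xml:lang" in term:
        return RDF_LANGSTRING  # language-tagged literal
    # Explicit datatype if present, otherwise a simple literal (xsd:string).
    return term.get("datatype", XSD_STRING)


plain = {"type": "literal", "value": "S"}
tagged = {"type": "literal", "value": "S", "xml:lang": "L"}
typed = {"type": "literal", "value": "S", "datatype": "D"}
```
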

MarcAgate commented 6 years ago

As I suggested above, we still can parse only the "results" node.

eroux commented 6 years ago

Thanks! In my use case I'm only interested in the results, yes. I think having all the cases in some unit tests will be good; there aren't many...
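
For illustration, reading only the "results" node could look like the Python sketch below. The sample payload is invented for the example: the "pageNumber" key and the example.org IRI are placeholders, not actual LDS-PDI output, and `rows_from_results` is a hypothetical name.

```python
import json


def rows_from_results(payload):
    """Parse a JSON response and return one {variable: value} dict per row.

    Only the "results" -> "bindings" node is read; any other top-level
    members (such as paging metadata) are ignored.
    """
    doc = json.loads(payload)
    bindings = doc["results"]["bindings"]
    return [{var: term["value"] for var, term in row.items()} for row in bindings]


# Invented sample: "pageNumber" and the IRI are placeholders.
sample = json.dumps({
    "pageNumber": 1,
    "head": {"vars": ["work", "Work_Name"]},
    "results": {"bindings": [{
        "work": {"type": "uri", "value": "http://example.org/W1"},
        "Work_Name": {"type": "literal", "xml:lang": "bo-x-ewts",
                      "value": "chos dbyings mdzod/"},
    }]},
})

rows = rows_from_results(sample)
```
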

MarcAgate commented 6 years ago

Well, we don't need to bother with ResultSetWriterJSON. Actually, we have to use ResultSetFormatter.outputAsJSON(ResultSet resultSet) (https://jena.apache.org/documentation/javadoc/arq/org/apache/jena/query/ResultSetFormatter.html#outputAsJSON-org.apache.jena.query.ResultSet). I tried it at the very beginning, when I first suggested using it. It does the job perfectly, so we could go that route, assuming I find a way to produce the complete JSON we want by merging the initial part (page number, etc.) with the JSON serialization of the result set itself, as produced by Jena. I think the parsing issue we have (putting aside the wrong JSON structure) is that our JSON is built by Jackson from plain Java objects, while Jena produces a "jsonified" string that represents a map containing nested maps. Also, where we have bindings, Jena has "bindings" (within double quotes).
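
The merge described above could be sketched as follows, in Python rather than Java to keep the example self-contained. It assumes the paging metadata is available as a plain map and the result set as the JSON string produced by the serializer; `merge_page_and_results` and the "pageNumber" and "numResults" keys are hypothetical.

```python
import json


def merge_page_and_results(page_meta, sparql_json):
    """Combine paging metadata with an already serialized SPARQL JSON result.

    The serialized result is parsed back into a map so that "head" and
    "results" end up as top-level members next to the metadata.
    """
    results = json.loads(sparql_json)
    overlap = set(page_meta) & set(results)
    if overlap:
        raise ValueError("metadata would overwrite: %s" % sorted(overlap))
    merged = dict(page_meta)
    merged.update(results)
    return merged


# Stand-in for the string a SPARQL JSON serializer would produce.
sparql_json = '{"head": {"vars": ["work"]}, "results": {"bindings": []}}'

# Hypothetical metadata keys, used only for this illustration.
merged = merge_page_and_results({"pageNumber": 1, "numResults": 0}, sparql_json)
output = json.dumps(merged)
```

Since every key in the serialized output is a double-quoted JSON string, a standard SPARQL JSON parser can still find "head" and "results" in `output`; whether it tolerates the extra top-level members depends on the parser.
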