dbpedia / databus

A digital factory platform for managing files online with stable IDs, high-quality metadata, powerful API and tools for building on data: find, access, make interoperable, re-use
Apache License 2.0

Improve docs for databus model and APIs #172

Open manonthegithub opened 4 months ago

manonthegithub commented 4 months ago

For JSON-LD it could be JSON Schema docs instead; then people can generate their own client very easily. Most languages have JSON Schema libs which allow generating code and models based on the schema: https://json-schema.org/learn/getting-started-step-by-step#create
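A rough sketch of what a published schema would enable. The schema below is purely hypothetical (the property names are assumptions, not the actual Databus JSON-LD shape), and the checker only handles the `type`, `required` and `properties` keywords; a real JSON Schema library covers far more.

```python
from typing import Any, Dict, List

# HYPOTHETICAL schema for a group document. The property names are
# illustrative assumptions, not the published Databus model.
GROUP_SCHEMA: Dict[str, Any] = {
    "type": "object",
    "required": ["@id", "@type", "title"],
    "properties": {
        "@id": {"type": "string"},
        "@type": {"type": "string"},
        "title": {"type": "string"},
        "description": {"type": "string"},
    },
}

# Only the JSON types needed for this sketch.
_JSON_TYPES = {"string": str, "object": dict, "array": list, "boolean": bool}


def validate(instance: Any, schema: Dict[str, Any]) -> List[str]:
    """Return a list of error messages; an empty list means the instance passed.

    Supports only `type`, `required` and `properties`.
    """
    expected = _JSON_TYPES.get(schema.get("type"))
    if expected is not None and not isinstance(instance, expected):
        return [f"expected {schema['type']}, got {type(instance).__name__}"]

    errors: List[str] = []
    if isinstance(instance, dict):
        for name in schema.get("required", []):
            if name not in instance:
                errors.append(f"missing required property: {name}")
        for name, sub_schema in schema.get("properties", {}).items():
            if name in instance:
                errors.extend(
                    f"{name}: {msg}" for msg in validate(instance[name], sub_schema)
                )
    return errors
```

With a schema like this in the docs, the required/optional split from the model description becomes machine-checkable before a document is sent to the API.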

JJ-Author commented 4 months ago

please rewrite and update the docs, especially this chapter https://dbpedia.gitbook.io/databus/model and this one https://dbpedia.gitbook.io/databus/usage/api. they are out of date, not well structured, not well interlinked and sometimes really hard to read. even for claus, simon and me they are not really helpful, and we need to reverse-engineer the model from data on the databus. some suggestions for the rewrite:

  1. at least one ER or UML class diagram that shows the big picture: how all entities are connected (try 2 versions: one with all information and one showing only the necessary ids, the "linking" properties and the rdf:type)
  2. use a consistent running example throughout the entire documentation, including the autogenerated files, usage guides and especially the swagger documentation
  3. use "tabs" for the different spec/code-block aspects (owl, shacl, json-ld context, example) for https://dbpedia.gitbook.io/databus/model/metadata, maybe add an example per property
  4. make headings for required properties and optional ones (and explain their autocompletion, if any), insert a good textual description of the property (can be fetched from rdfs:comment of course)
  5. for every property have the full id without prefix somehow copyable/readable, make sure to emphasize the difference in namespaces, e.g. dcat:hasVersion vs. databus:hasVersion
  6. try to resolve the distribution/file/part terminology mess (the distribution property links to a Part)
  7. try to interlink the individual pages and also the documentation resources better (e.g. from the artifact property description link to artifact), better linking from swagger to docu and from docu to swagger, link from rdf via content negotiation to gitbook
  8. describe the different API types: linked data api (r); sparql api (r); databus restful api (r+w)
  9. try to switch to the developer's perspective: think of a goal you want to implement and then go through how you would like to find and look up things. see e.g. the story below, which highlights the documentation and wording problems. don't work the other way round and just document what you have.
  10. keep in mind and get inspired by other popular services, e.g. the spotify api
    spotify details

just an example here. what they do is pretty cool https://developer.spotify.com/documentation/web-api/reference/get-an-episode: they list api calls in an object oriented view in collapsible categories on the left

for one api call they

what they do badly: there is no quick overview of the root-level members of the json object, i have to expand everything first. the examples should also be collapsed by default because they take up a lot of space

An example developer's odyssey with the current documentation

i think it helps to switch perspective when writing the technical docu. think of a developer who wants to quickly look up the model (as a reference during implementation), not a tutorial! we have 2.5/3 kinds of apis: linked data api (r), sparql api (r), databus restful api (r+w)

all they share is the rdf data model so that should be documented and interlinked in a very good way.
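To make the three access paths concrete, here is a minimal sketch that only builds the HTTP requests and sends nothing. The base URL is the dev instance mentioned in this thread. Everything else is an assumption for illustration and should be checked against the swagger docs: the `/sparql` path, the `PUT` method, the `X-API-KEY` header, the media types and the resource paths.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request

BASE = "https://dev.databus.dbpedia.org"  # dev instance referenced in this thread


def linked_data_request(resource_path: str) -> Request:
    """Linked data api (read): fetch a resource's RDF via content negotiation.

    ASSUMPTION: the resource answers to an `Accept: application/ld+json` header.
    """
    url = f"{BASE}/{resource_path.lstrip('/')}"
    return Request(url, headers={"Accept": "application/ld+json"}, method="GET")


def sparql_request(query: str) -> Request:
    """Sparql api (read): query sent as a GET parameter.

    ASSUMPTION: the endpoint lives at `/sparql`.
    """
    url = f"{BASE}/sparql?" + urlencode({"query": query})
    return Request(
        url, headers={"Accept": "application/sparql-results+json"}, method="GET"
    )


def rest_write_request(resource_path: str, document: dict, api_key: str) -> Request:
    """Databus restful api (read + write): upload a JSON-LD document.

    ASSUMPTION: PUT on the resource path with an `X-API-KEY` header.
    """
    url = f"{BASE}/{resource_path.lstrip('/')}"
    body = json.dumps(document).encode("utf-8")
    headers = {"Content-Type": "application/ld+json", "X-API-KEY": api_key}
    return Request(url, data=body, headers=headers, method="PUT")
```

All three builders address the same resources, which is why the shared rdf model is the piece that needs to be documented once and linked from each api description.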

assume i know a little sparql, rdf and http/rest and want to use some form of api to get the files of the latest version of 2 groups. when i go to API (https://dbpedia.gitbook.io/databus/usage/api) i see one image that suggests i use the swagger docu, with no explanation of the other api options. also note that the swagger is not backlinked to the model docu and has some examples which are definitely not from the docu and are also partially incorrect (see e.g. this example https://dev.databus.dbpedia.org/api/#/Artifact/get-artifact). also note that autocompletion is described under webinterface, although it is clearly relevant for the api when writing and reading, and also when writing sparql queries.

ok, coming back to my task: files for 2 groups. i look at the api calls on swagger and get confused. the group does not give me any information about the artifacts it contains, the artifact does not say which versions it contains, and then there is dataid (version). what the heck, where do i get the files? ok damn. first, that takes many calls to get the data. but second, how the hell do i get the artifacts of a group and the versions of an artifact? i google for a potential sparql endpoint or find the sparql button on the dev databus. yes, there we go (a dbpedia DAU could confuse the databus sparql with the dbpedia sparql when using google).

ok, i decide to do sparql queries instead. what do i do next? i need to write query patterns and need to know what is connected with what. (that's not easy since, as i already figured, i cannot go in a natural way from an artifact to its files (which are parts) or to its versions.) i need to find the entry point, the version, which is called dataid in the api?? just try to navigate through group->artifact->version->(distribution)->file (you could find that out pretty easily from a well-drawn ER diagram). but if you look at the version docu, you must find out/know that version connects with distribution, artifact and group, and that hasVersion has nothing to do with databus:version. and then you need to learn that the distribution is actually the file resource description but is of type Part and has a file attribute which equals the id of the resource (what?). at this point i stop, because you will have realized that much improvement is needed to make this situation understandable to users. also annoying and a source of errors: nowhere in the docu is there the full id of the properties that you could easily copy into the sparql query e.g.
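For illustration, the navigation described above could be sketched as a small query builder. This is a reconstruction from the description in this comment, not a verified query. The `databus:` namespace IRI and the property names `databus:group`, `databus:artifact` and `databus:file`, the use of `dct:hasVersion` for the version string, and the `MAX` trick for "latest" are all assumptions that need checking against the actual model.

```python
import re
from typing import Iterable

# ASSUMPTION: the databus namespace IRI below is a guess; dcat and dct are the
# standard W3C / Dublin Core namespaces.
PREFIXES = """\
PREFIX databus: <https://dataid.dbpedia.org/databus#>
PREFIX dcat:    <http://www.w3.org/ns/dcat#>
PREFIX dct:     <http://purl.org/dc/terms/>
"""

# Characters that must not appear inside a SPARQL IRI reference.
_UNSAFE = re.compile(r'[<>"{}|\\^`\s]')


def latest_files_query(group_iris: Iterable[str]) -> str:
    """Build a query for the files of the latest version of each artifact
    in the given groups (version -> group/artifact/distribution -> file)."""
    iris = list(group_iris)
    if not iris:
        raise ValueError("at least one group IRI is required")
    for iri in iris:
        if _UNSAFE.search(iri):
            raise ValueError(f"not a safe IRI: {iri!r}")
    values = " ".join(f"<{iri}>" for iri in iris)
    return PREFIXES + f"""
SELECT ?group ?artifact ?version ?file WHERE {{
  VALUES ?group {{ {values} }}
  # ASSUMPTION: the version resource links to group, artifact and parts.
  ?dataset databus:group    ?group ;
           databus:artifact ?artifact ;
           dct:hasVersion   ?version ;
           dcat:distribution ?part .
  # ASSUMPTION: the part carries the file id in databus:file.
  ?part databus:file ?file .
  # Keep only the highest version string per artifact.
  {{
    SELECT ?artifact (MAX(?v) AS ?version) WHERE {{
      ?d databus:artifact ?artifact ;
         dct:hasVersion   ?v .
    }} GROUP BY ?artifact
  }}
}}
"""
```

Calling `latest_files_query([...])` with two group IRIs returns one query string that can be pasted into the sparql form; the full property ids are visible in the `PREFIX` block, which is exactly the copyable information the docs currently lack.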

manonthegithub commented 2 months ago

The diagram is really essential for writing SPARQL queries. Something like this would be perfect: image