Where is the ontology?

dachafra commented 2 years ago

The README mentions that you've developed an ontology to structure all the information, but I couldn't find it!

If the ontology (and the data) is published following web standards, it would be easier for other users to understand and re-use the data.

Thanks!

pepelefoul commented 2 years ago

@dachafra you can explore the ontology on #/data/relations.json5

dachafra commented 2 years ago

Thanks for the pointer, @renemoreno! But I was expecting something a bit more formal that follows W3C recommendations, so that anyone could understand it and follow the documentation.

The ontology should be defined using OWL and published with a human-friendly frontend (for examples, see the SSN ontology, the DCAT vocabulary, or those from a Spanish project).

JaimeObregon commented 1 year ago

Sorry for being late in documenting this!

The whole data model, as consumed by the application, is located under the /httpdocs/resources folder. Resources such as screenshots, documents, avatars, and logos are published there and made available to the application. The ontology is also located there, compiled and minified, in the form of a JavaScript module named ladonacion.js.

These files are compiled, validated, and optimized offline by the batch script found at /bin/validate.js. This offline process reads the contents of the /data directory and writes the /httpdocs/resources directory, ready for web production.

Therefore you should run node validate.js each time you make a significant change to the sources in /data. Before the first run you will need to install the dependencies required for this offline task, which are different from those used by the application:

$ cd bin && npm install

On the first execution, the script will, among other things, fetch every website mentioned in the source files and take screenshots and thumbnails of it. That may take some time, but the results will be cached to make subsequent runs faster. The duration of this cache is controlled by config.cache.ttl in the config file, and defaults to a year. The script will try to close all cookie consent banners and ads before taking each screenshot.
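To illustrate the caching idea, here is a minimal sketch of a time-to-live check. It is not the code in /bin/validate.js: the helper names, the cachedAtMs field, and the assumption that the TTL is expressed in milliseconds are all hypothetical.

```javascript
// Hypothetical helpers, not taken from /bin/validate.js.
// Assumption: the TTL is expressed in milliseconds and defaults to one year.
const ONE_YEAR_MS = 365 * 24 * 60 * 60 * 1000;

// Returns true while a cached result is still within its time-to-live.
function isFresh(cachedAtMs, ttlMs = ONE_YEAR_MS, nowMs = Date.now()) {
  return nowMs - cachedAtMs < ttlMs;
}

// Decides whether a screenshot has to be taken again.
// `entry` is a hypothetical cache record of the form { cachedAtMs: number }.
function needsRefresh(entry, ttlMs = ONE_YEAR_MS, nowMs = Date.now()) {
  // No entry at all means the website was never captured.
  if (!entry) return true;
  return !isFresh(entry.cachedAtMs, ttlMs, nowMs);
}
```

With a check like this, only websites whose cached capture is missing or older than the TTL would be fetched again on later runs.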

The data model under /data consists of several JSON5 files. They must comply with the schemas in /data/schemas. The script validates each JSON5 file against its schema and runs several other checks to prevent inconsistencies. The risk of inconsistencies was one of the challenges I faced in this project, as they become more likely as the model grows. You can write your own schemas to match the subjects of your own investigations.
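As a rough illustration of the two kinds of checks described above, the sketch below validates the shape of each record and then looks for relations that point to unknown entities. Everything in it is an assumption made for the example: the field names (id, subject, predicate, object), the sample data, and the tiny hand-written "schema" do not come from /data or /data/schemas, and the real script may rely on a proper schema validator instead.

```javascript
// Hypothetical data, standing in for already-parsed JSON5 files under /data.
const entities = [
  { id: 'person-a', type: 'person', name: 'Person A' },
  { id: 'company-b', type: 'company', name: 'Company B' },
];
const relations = [
  { subject: 'person-a', predicate: 'owns', object: 'company-b' },
  { subject: 'person-a', predicate: 'owns', object: 'company-c' }, // dangling reference
];

// A deliberately tiny "schema": required field names mapped to typeof results.
const relationSchema = { subject: 'string', predicate: 'string', object: 'string' };

// Shape check: every required field must exist and have the expected type.
function checkShape(record, schema) {
  const errors = [];
  for (const [field, type] of Object.entries(schema)) {
    if (typeof record[field] !== type) {
      errors.push(`field "${field}" should be a ${type}`);
    }
  }
  return errors;
}

// Consistency check: both ends of a relation must refer to a known entity id.
function checkReferences(relationList, entityList) {
  const known = new Set(entityList.map((entity) => entity.id));
  const errors = [];
  relationList.forEach((relation, index) => {
    for (const end of ['subject', 'object']) {
      if (!known.has(relation[end])) {
        errors.push(`relation ${index}: unknown ${end} "${relation[end]}"`);
      }
    }
  });
  return errors;
}

const shapeErrors = relations.flatMap((r) => checkShape(r, relationSchema));
const referenceErrors = checkReferences(relations, entities);
console.log({ shapeErrors, referenceErrors });
```

The second check is the kind that a schema alone cannot express, which is why inconsistencies become more likely as the number of files and cross-references grows.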

Regarding the convenience of using web standards for the ontology and publishing it in a human-explorable frontend: I would love to! But that was the other challenge I found. Everything in this project, like everything I do, is heavily rooted in open standards such as those from the W3C. But I couldn't find an approachable way to bring these standards to this project without heavy engineering overhead.

I wanted to keep the whole thing simple, and after some investigation I realized that sticking to a known standard greatly increased the complexity of my workflow. I was ready to face it, had I found convenient tools to work with the ontology in the way I envisioned for this particular project. But after some days of research, I found myself at a dead end and decided to build my own solution from scratch.

Also, the open vocabularies I found for corruption weren't a good match for the nature of this particular story and what I had in mind for this project.

I am not saying the existing standards and tooling are not mature and useful! It is only that I couldn't find how to put them to work without a considerable investment of time and effort. They apparently don't serve my particular use case well. I am by no means an expert in the field of ontologies; this project has been my excuse to explore and learn.

That said, this is now open source, and contributions evolving the current implementation will be more than welcome! :tada:

Thank you for your insights, @dachafra!

dachafra commented 1 year ago

Thanks, @JaimeObregon, for the long and detailed answer. I would love to contribute my expertise in ontologies and knowledge graphs, as I think it would be really valuable, but at the moment that is impossible given my workload. My two cents for future projects with similar aims or requirements:

An industrial, agile methodology for ontology development: https://lot.linkeddata.es/
Constructing an ontology is easier with Chowlk and its human-friendly approach: https://chowlk.linkeddata.es/
Publishing and managing the vocabulary can be done in two clicks with OnToology + GitHub: http://ontoology.linkeddata.es/
Creating RDF from raw data can be done using RML mappings (https://rml.io/) and its best engine (https://morph-kgc.readthedocs.io/); the result can then be queried with the in-memory engine Oxigraph (https://oxigraph.org/pyoxigraph/).

JaimeObregon / ladonacion

Where is the ontology? #3