alephdata / followthemoney

Data model and processing tools for investigative entity data
https://followthemoney.tech
MIT License
211 stars, 50 forks

Proposal: namespaces/external entity definitions. #718

Closed dchaplinsky closed 1 year ago

dchaplinsky commented 2 years ago

Foreword

The basic FTM ontology is nice (despite carrying some legacy fields), but it does not always fit the task. Great success

If you want to extend the basic ontology, you have the following options:

The first two methods don't really fit some tasks, because you probably want to keep backward compatibility so that you can reuse the data that Aleph, OpenSanctions and other sources provide for free.

Problem

When you extend the ontology, you need a way to distinguish the original entity types from the ones you've added. For example, to export data in the original format, you have to find, for each added type, the closest parent that comes from the original ontology. You also need a clear cue that shows what was added.

Proposed solution

This can be solved with a new base field, for example a namespace, where all the original entity types have namespace=base or something similar. The developer would then be able not only to extend the original ontology but also to re-use ontologies published by others (again, with a regional or task-specific flavor, perhaps committed to a contrib part of this repo).
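To make the idea concrete, here is a minimal sketch of how such a namespace field could be used to find the closest original parent of an added type. It does not use the followthemoney API: `SchemaDef`, `nearest_base_schema`, the `ua` namespace and the `PublicOfficial` and `Judge` types are all hypothetical names invented for illustration, and `base` is just the placeholder value suggested above.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

# Assumption: original entity types are marked with this namespace value.
BASE_NAMESPACE = "base"


@dataclass
class SchemaDef:
    """Hypothetical, simplified stand-in for a schema definition."""
    name: str
    namespace: str = BASE_NAMESPACE
    extends: List[str] = field(default_factory=list)


def nearest_base_schema(name: str, model: Dict[str, SchemaDef]) -> Optional[str]:
    """Walk up the `extends` chain breadth-first and return the first
    schema that belongs to the base namespace, or None if there is none."""
    queue = [name]
    seen = set()
    while queue:
        current = queue.pop(0)
        if current in seen or current not in model:
            continue
        seen.add(current)
        schema = model[current]
        if schema.namespace == BASE_NAMESPACE:
            return current
        queue.extend(schema.extends)
    return None


model = {
    "Thing": SchemaDef("Thing"),
    "LegalEntity": SchemaDef("LegalEntity", extends=["Thing"]),
    "Person": SchemaDef("Person", extends=["LegalEntity"]),
    # Hypothetical extension types living in a regional namespace.
    "PublicOfficial": SchemaDef("PublicOfficial", namespace="ua", extends=["Person"]),
    "Judge": SchemaDef("Judge", namespace="ua", extends=["PublicOfficial"]),
}

# A Judge entity could be exported as a Person in the original format.
export_schema = nearest_base_schema("Judge", model)
```

The same namespace value doubles as the "clear cue" for what was added: any schema whose namespace differs from `base` is an extension.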

@pudo also proposed extending this to allow specifying the public URL of the entity type. Aleph would then be able to deal with data mapped to an extended ontology by loading, caching and maintaining the list of external definitions, as long as they are derived from the unaltered original ontology.
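A rough sketch of what loading and caching external definitions by URL might look like is given below. Everything here is an assumption made for illustration: `ExternalSchemaCache`, the injected `fetch` callable, the example URL and the shape of the returned definitions are hypothetical, and the "unaltered original ontology" condition is approximated by simply rejecting any external file that redefines a base type.

```python
from typing import Callable, Dict

# Assumption: names of (some) schemata from the unaltered original ontology.
BASE_SCHEMATA = {"Thing", "LegalEntity", "Person"}


class ExternalSchemaCache:
    """Hypothetical cache of external schema definitions, keyed by URL."""

    def __init__(self, fetch: Callable[[str], Dict[str, dict]]):
        # `fetch` downloads and parses the definitions published at a URL.
        self._fetch = fetch
        self._cache: Dict[str, Dict[str, dict]] = {}

    def load(self, url: str) -> Dict[str, dict]:
        """Return the definitions for `url`, fetching them at most once."""
        if url not in self._cache:
            definitions = self._fetch(url)
            # External files may add types but must not alter base ones.
            clashes = BASE_SCHEMATA.intersection(definitions)
            if clashes:
                raise ValueError(
                    f"{url} redefines base schemata: {sorted(clashes)}"
                )
            self._cache[url] = definitions
        return self._cache[url]


# Usage with a stub fetcher instead of a real HTTP request.
fetch_calls = []


def fake_fetch(url: str) -> Dict[str, dict]:
    fetch_calls.append(url)
    return {"Judge": {"extends": ["Person"]}}


cache = ExternalSchemaCache(fake_fetch)
first = cache.load("https://example.org/ftm/ua.yaml")
second = cache.load("https://example.org/ftm/ua.yaml")
```

Injecting the fetcher keeps the sketch independent of any HTTP library; a real implementation would also need cache invalidation and a stricter compatibility check.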

ozhyrenkov commented 2 years ago

I have a few thoughts on this:

Also, I've found that we have already introduced a namespace in rdf.py.

It's of course a slightly different thing, but I'm not sure we should add this kind of ambiguity to the naming.