edmcouncil / fibo

The Financial Industry Business Ontology (FIBO) defines the sets of things that are of interest in financial business applications and the ways that those things can relate to one another. In this way, FIBO can give meaning to any data (e.g., spreadsheets, relational databases, XML documents) that describe the business of finance.
https://spec.edmcouncil.org/fibo/
MIT License
331 stars, 69 forks

Bad prefix definition #1161

Closed dallemang closed 4 years ago

dallemang commented 4 years ago

In fibo/DER/DerivativesContracts/DerivativesMasterAgreements.rdf, on line 10 (and again on line 33), there is a faulty prefix definition:

<!ENTITY fibo-fnd-arr-arr "https://spec.edmcouncil.org/fibo/ontology/FND/Arrangements/Documents/">

Interestingly, the prefix is used in a way that makes the URIs come out correct later on. But it leaves us with a prefix whose meaning in this file conflicts with what its name indicates, and that causes problems downstream (a user detected one such downstream effect in fibo-v).

The fix is pretty easy, but you can't just change this one line: you have to change every line in that file that references the prefix. So it is better to do this (at least to get started) with a tool.
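
As a rough illustration of what such a tool could do, here is a minimal Python sketch that renames a prefix everywhere it occurs in an RDF/XML file: in the entity declaration, in the xmlns declaration, in entity references, and in qualified names. The function name `rename_prefix` and the replacement prefix `fibo-fnd-arr-doc` are assumptions made for the example, not something decided in this issue, and the regex-based approach is a sketch rather than the tool actually used.

```python
import re

def rename_prefix(rdf_xml: str, old: str, new: str) -> str:
    """Rename an XML entity / namespace prefix everywhere in RDF/XML text."""
    o = re.escape(old)
    # Entity declaration: <!ENTITY old "...">
    text = re.sub(rf'(<!ENTITY\s+){o}(\s)', rf'\g<1>{new}\g<2>', rdf_xml)
    # Namespace declaration: xmlns:old="..."
    text = re.sub(rf'(xmlns:){o}(\s*=)', rf'\g<1>{new}\g<2>', text)
    # Entity references: &old;
    text = re.sub(rf'&{o};', f'&{new};', text)
    # Qualified names: <old:Thing>, </old:Thing>, old:attr="..."
    text = re.sub(rf'(?<![\w.-]){o}:(?=[A-Za-z_])', f'{new}:', text)
    return text

# Tiny stand-in for the affected file (not its real contents).
sample = '''<!DOCTYPE rdf:RDF [
  <!ENTITY fibo-fnd-arr-arr "https://spec.edmcouncil.org/fibo/ontology/FND/Arrangements/Documents/">
]>
<rdf:RDF xmlns:fibo-fnd-arr-arr="https://spec.edmcouncil.org/fibo/ontology/FND/Arrangements/Documents/">
  <rdfs:subClassOf rdf:resource="&fibo-fnd-arr-arr;Document"/>
</rdf:RDF>'''

# "fibo-fnd-arr-doc" is an assumed replacement name.
fixed = rename_prefix(sample, "fibo-fnd-arr-arr", "fibo-fnd-arr-doc")
print(fixed)
```

Because only the prefix name changes and the namespace URI is left alone, the expanded URIs stay exactly as they were.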

On another note, I'd like to ask @mereolog to think about how we could expand the hygiene testing to catch things like this. Let's start by discussing that here in this issue; we'll call a meeting if we have to.
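
One possible shape for such a hygiene check, sketched in Python under some assumptions: it scans the ENTITY declarations of every file and reports any prefix bound to more than one namespace, plus any namespace reachable under more than one prefix. The function name `find_prefix_conflicts`, the file names, and the FND/Arrangements/Arrangements namespace in the demo data are illustrative assumptions, and this is not part of the existing hygiene test suite.

```python
import re
from collections import defaultdict

# Matches declarations such as: <!ENTITY some-prefix "https://example.org/ns/">
ENTITY_RE = re.compile(r'<!ENTITY\s+([^\s"]+)\s+"([^"]*)"\s*>')

def find_prefix_conflicts(files):
    """files maps a file name to its RDF/XML text.

    Returns two dicts, keeping only the problematic entries:
    prefix -> namespaces (more than one), namespace -> prefixes (more than one).
    """
    ns_by_prefix = defaultdict(set)
    prefix_by_ns = defaultdict(set)
    for text in files.values():
        for prefix, ns in ENTITY_RE.findall(text):
            ns_by_prefix[prefix].add(ns)
            prefix_by_ns[ns].add(prefix)
    ambiguous_prefixes = {p: n for p, n in ns_by_prefix.items() if len(n) > 1}
    ambiguous_namespaces = {n: p for n, p in prefix_by_ns.items() if len(p) > 1}
    return ambiguous_prefixes, ambiguous_namespaces

# Demo data: file names and the first namespace are assumptions for illustration.
BASE = "https://spec.edmcouncil.org/fibo/ontology/FND/Arrangements/"
demo_files = {
    "Arrangements.rdf": f'<!ENTITY fibo-fnd-arr-arr "{BASE}Arrangements/">',
    "DerivativesMasterAgreements.rdf": f'<!ENTITY fibo-fnd-arr-arr "{BASE}Documents/">',
}

bad_prefixes, bad_namespaces = find_prefix_conflicts(demo_files)
for prefix, namespaces in bad_prefixes.items():
    print(prefix, "is bound to", sorted(namespaces))
```

A check like this only sees ENTITY declarations; xmlns declarations would need the same treatment to cover both of the lines mentioned above.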

dallemang commented 4 years ago

@kptyson @ElisaKendall

ghost commented 4 years ago

The attached workbook is the output of an experiment I conducted using the vocabulary and the ontology. My presumption was that, since each vocabulary concept has a corresponding ontology entity to which it is linked via rdfs:isDefinedBy, for each superclass of the entity there should be a corresponding vocabulary concept, and a path from the original concept to each of the concepts that correspond to the superclasses. The worksheet named Shortest Path contains, yes, you guessed it, the shortest path between the concepts. The worksheet named No Path Found lists the concept pairs between which I could not identify a path. The calculation is done using the networkx implementation of Dijkstra's algorithm.
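
For readers who want to see the shape of that calculation, here is a dependency-free Python sketch of Dijkstra's algorithm with unit edge weights. It is a stand-in for the networkx call used in the experiment, not the actual script; the concept names, the edge list, and the choice to treat links as directed (narrower, broader) pairs are all assumptions.

```python
import heapq

def shortest_path(edges, source, target):
    """Dijkstra's algorithm over directed (narrower, broader) pairs, unit weights.

    Returns the list of nodes from source to target, or None if no path exists.
    """
    graph = {}
    for narrower, broader in edges:
        graph.setdefault(narrower, []).append(broader)

    dist = {source: 0}
    prev = {}
    queue = [(0, source)]
    while queue:
        d, node = heapq.heappop(queue)
        if node == target:
            # Walk the predecessor links back to the source.
            path = [node]
            while node in prev:
                node = prev[node]
                path.append(node)
            return path[::-1]
        if d > dist.get(node, float("inf")):
            continue  # stale queue entry
        for nxt in graph.get(node, []):
            nd = d + 1
            if nd < dist.get(nxt, float("inf")):
                dist[nxt] = nd
                prev[nxt] = node
                heapq.heappush(queue, (nd, nxt))
    return None

# Hypothetical concept links, for illustration only.
broader_links = [
    ("ConceptA", "ConceptB"),
    ("ConceptB", "ConceptC"),
    ("ConceptA", "ConceptC"),
    ("ConceptD", "ConceptE"),
]

found = shortest_path(broader_links, "ConceptA", "ConceptC")
missing = shortest_path(broader_links, "ConceptA", "ConceptE")
print(found, missing)
```

Pairs for which the function returns None are the ones that would land on a No Path Found style worksheet.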

related-skos-concepts.xlsx

ghost commented 4 years ago

Should I raise a separate issue for the No Paths Found problem?

dallemang commented 4 years ago

The first issue, that some entities have broader terms in fibo-v that don't have matching subclasses on the fibo side (through rdfs:isDefinedBy), was caused (to a large extent) by the fact that some of the rdfs:isDefinedBy links pointed to nonexistent resources on the fibo side, due to a namespace declaration problem (that's what this issue is about).

The ones in your spreadsheet seem to be caused by a different fault: fibo-v does not include conversions of resources that are not defined by FIBO. If you open fibo itself (e.g., the dev-quickstart file), you'll see subClassOf paths from, e.g., UniqueSwapIdentifier to Reference. But these go through non-fibo resources (in this case, lcc-lr:Identifier). In the meetings where we defined fibo-v, we explicitly decided not to include non-fibo resources, but that was before fibo had such a strong dependency on certain externals such as lcc; we might want to revisit that decision.

Just to check, I ran a SPARQL query to identify the missing links; there are three of them, and all of them are in lcc:

lcc-lr:Identifier
lcc-lr:IdentificationScheme
lcc-lr:CodeSet
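
The same kind of check can be sketched in plain Python (a stand-in, not the SPARQL query that was actually run): given subClassOf pairs, list the non-FIBO classes that sit between a FIBO descendant and a FIBO ancestor. The `fibo:` prefix, the direct edges in the demo data, and the function names are assumptions for illustration; the real path from UniqueSwapIdentifier to Reference may have more steps.

```python
def reachable(graph, start):
    """All nodes reachable from start by following graph edges (start excluded)."""
    seen, stack = set(), [start]
    while stack:
        for nxt in graph.get(stack.pop(), ()):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen

def external_bridges(sub_class_of, is_fibo):
    """Non-FIBO classes with at least one FIBO ancestor and one FIBO descendant."""
    up, down = {}, {}
    for child, parent in sub_class_of:
        up.setdefault(child, set()).add(parent)
        down.setdefault(parent, set()).add(child)
    nodes = set(up) | set(down)
    return sorted(
        n for n in nodes
        if not is_fibo(n)
        and any(is_fibo(a) for a in reachable(up, n))
        and any(is_fibo(d) for d in reachable(down, n))
    )

# Illustrative edges only; "fibo:" is a placeholder prefix.
demo_edges = [
    ("fibo:UniqueSwapIdentifier", "lcc-lr:Identifier"),
    ("lcc-lr:Identifier", "fibo:Reference"),
    ("fibo:Reference", "owl:Thing"),
]

bridges = external_bridges(demo_edges, lambda name: name.startswith("fibo:"))
print(bridges)
```

Classes reported this way are exactly the ones whose omission from fibo-v would break a broader path between two FIBO concepts.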

So, since this turned out to be two issues (one was the bad namespace declaration, the other is the decision to leave out lcc), we should open a new issue and discuss whether we should include lcc models (probably not lcc entities) in the fibo-v translation.