alephdata / followthemoney

Data model and processing tools for investigative entity data
https://followthemoney.tech
MIT License

fix: add name to Contract #1558

Closed bamthomas closed 3 weeks ago

bamthomas commented 3 weeks ago

Hello,

We are using the aleph FtM models to generate Java classes that we are going to integrate into datashare. When the code is generated, there are compilation issues.

I can't see any way of finding the name for a License object. How can I find the description of this property?

I also noticed that CallForTenders.title has no title. Is that on purpose?

Finally, some fields are required but not listed in properties. Is this intended behaviour?

Thank you for your answers.

tillprochaska commented 3 weeks ago

Hey @bamthomas!

> I can't see any way of finding the name for a License object. How can I find the description of this property?

License extends Contract, which inherits properties from Thing (via Asset). Thing defines the name property.


Here’s the inheritance logic in the Python library: https://github.com/alephdata/followthemoney/blob/main/followthemoney/schema.py#L214-L231
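
For readers generating Java code, the lookup described above can be sketched as a walk over the parent schemas. This is only an illustration, not the library's implementation: the `Schema` class, the `SCHEMATA` table and `allProperties` are hypothetical names, and the property lists are abbreviated placeholders (only `name` on Thing and the parent relationships come from this thread).

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class SchemaInheritanceSketch {
    /** Hypothetical, simplified stand-in for one schema definition. */
    static final class Schema {
        final List<String> parents;
        final List<String> ownProperties;

        Schema(List<String> parents, List<String> ownProperties) {
            this.parents = parents;
            this.ownProperties = ownProperties;
        }
    }

    // Abbreviated, illustrative table. Parent links follow the thread:
    // License -> Contract -> Asset -> Thing (and Value).
    static final Map<String, Schema> SCHEMATA = Map.of(
        "Thing", new Schema(List.of(), List.of("name")),
        "Value", new Schema(List.of(), List.of("amount")),
        "Asset", new Schema(List.of("Thing", "Value"), List.of()),
        "Contract", new Schema(List.of("Asset"), List.of("title")),
        "License", new Schema(List.of("Contract"), List.of("area")));

    /** Returns own plus inherited property names; the set drops duplicates. */
    static Set<String> allProperties(String schemaName) {
        Set<String> result = new LinkedHashSet<>();
        collect(schemaName, result);
        return result;
    }

    private static void collect(String schemaName, Set<String> result) {
        Schema schema = SCHEMATA.get(schemaName);
        if (schema == null) {
            throw new IllegalArgumentException("Unknown schema: " + schemaName);
        }
        // Visit parents first so that root properties are listed first.
        for (String parent : schema.parents) {
            collect(parent, result);
        }
        result.addAll(schema.ownProperties);
    }

    public static void main(String[] args) {
        // License defines no name itself; it is found on Thing via Contract and Asset.
        System.out.println(allProperties("License"));
    }
}
```

Because the result is a set, a property reached through two different parents is only listed once.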

> I also noticed that CallForTenders.title has no title. Is that on purpose?

Not sure if I’m misunderstanding something, but the title property is defined here: https://github.com/alephdata/followthemoney/blob/main/followthemoney/schema/CallForTenders.yaml#L24

> Finally, some fields are required but not listed in properties. Is this intended behaviour?

Do you have an example for that? My best guess would be that the fields are defined by one of the parent schemas.

bamthomas commented 3 weeks ago

OK, thank you for your quick answer. So far I had assumed that a schema never inherits from several schemas at the same time (because Java does not support multiple inheritance of classes). I will have to find another way, maybe with interfaces.

bamthomas commented 3 weeks ago

A quick question: for now we are using the required fields to find a unique parent for each Java class.

It works well because most of the multiple inheritance ends in a diamond pattern: if we keep a set of inherited attributes, there are no conflicts, and the diamonds end (at Document or Thing) with "real" entities (classes in the Java sense).

It seems that there is a kind of underlying single inheritance path for each entity. Would it make sense to make it explicit?

tillprochaska commented 3 weeks ago

Sorry, I’m not 100% sure about this, maybe @pudo has an answer.

pudo commented 3 weeks ago

Hey @bamthomas this is really exciting work to see, wow. Are you sure you want to 1:1 project the FtM schema onto a Java class hierarchy? In Python we're essentially keeping the values as a map and then wrapping them in some accessor getter/setter logic, which avoids creating massive sparse objects (a given Person will have a median of 3-4 props set, and 50 defined) and also allows for legacy or as-yet-unknown props to surf through on the side (e.g. if your version of FtM is older than what the data producer used).
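
The map-backed approach described above might look roughly like this in Java. This is a sketch under assumptions: `EntityProxySketch` and its methods `add`, `get`, `first` and `propertyNames` are hypothetical names chosen for illustration, not the API of the Python library or of any existing Java port.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.Set;

/** Hypothetical map-backed entity: only properties that are actually set take up space. */
class EntityProxySketch {
    private final String schema;
    private final Map<String, List<String>> properties = new LinkedHashMap<>();

    EntityProxySketch(String schema) {
        this.schema = schema;
    }

    String schema() {
        return schema;
    }

    /** Adds a value. Property names unknown to the local model are kept, not rejected. */
    void add(String property, String value) {
        if (value == null || value.trim().isEmpty()) {
            return; // ignore empty values so the map stays sparse
        }
        properties.computeIfAbsent(property, key -> new ArrayList<>()).add(value);
    }

    /** Returns all values of a property, or an empty list if it was never set. */
    List<String> get(String property) {
        return Collections.unmodifiableList(properties.getOrDefault(property, List.of()));
    }

    /** Convenience accessor for properties that usually hold a single value. */
    Optional<String> first(String property) {
        return get(property).stream().findFirst();
    }

    /** Names of the properties that have at least one value. */
    Set<String> propertyNames() {
        return Collections.unmodifiableSet(properties.keySet());
    }

    public static void main(String[] args) {
        EntityProxySketch person = new EntityProxySketch("Person");
        person.add("name", "Jane Doe");
        // A property from a newer model version simply passes through.
        person.add("someFutureProperty", "value from a newer producer");
        System.out.println(person.propertyNames());
    }
}
```

Typed getters for a generated class could then delegate to `get` or `first` instead of holding one field per defined property.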

Regarding the "underneath single path": you mean that there's usually one root, like Thing or Interval? We do have I think some weirdos that are both (emails? contracts?). It could be possible to add markers for some schema that are essentially Mixins, like Asset and Value - things that don't mean anything, without being part of a bigger whole.

bamthomas commented 3 weeks ago

@pudo nice to see you here, and yes, we are also excited about the FtM models and about helping to develop a semantic standard, so that other actors can easily retrieve the data and use it in other tools, for example.

The 1:1 Java class hierarchy was my first approach because:

}
class Article
class Assessment
class Asset
class Associate
class Audio
class BankAccount
class Call { <> }
class CallForTenders
class Company
class Contract
class ContractAward
class CourtCase
class CourtCaseParty
class CryptoWallet
class Debt
class Directorship
class Document
class Documentation
class EconomicActivity { <> }
class Email
class Employment
class Event
class Family
class Folder
class HyperText
class Identification
class Image
class Interest { <> }
class Interval { <> }
class LegalEntity
class License
class Membership
class Mention
class Message
class Note
class Occupancy
class Organization
class Ownership
class Package
class Page { <> }
class Pages
class Passport
class Payment
class Person
class PlainText
class Position
class Post
class Project
class ProjectParticipant { <> }
class PublicBody
class RealEstate
class Representation
class Sanction
class Security
class Similar { <> }
class Succession
class Table
class TaxRoll
class Thing
class Trip
class UnknownLink
class UserAccount
class Value { <> }
class Vehicle
class Vessel
class Video
class Workbook

Address --> Thing
Airplane --> Thing
Airplane ..> Vehicle
Article --> Document
Assessment --> Thing
Asset --> Thing
Asset ..> Value
Associate ..> Interval
Audio --> Document
BankAccount ..> Asset
BankAccount --> Thing
CallForTenders ..> Interval
CallForTenders --> Thing
Company ..> Asset
Company --> Organization
Contract ..> Asset
Contract --> Thing
ContractAward ..> Interest
ContractAward ..> Value
CourtCase --> Thing
CourtCaseParty ..> Interest
CryptoWallet --> Thing
CryptoWallet ..> Value
Debt ..> Interval
Debt ..> Value
Directorship ..> Interest
Document ..> Analyzable
Document --> Thing
Documentation ..> Interest
Email --> Document
Email ..> Folder
Email ..> HyperText
Email ..> PlainText
Employment ..> Interest
Event ..> Analyzable
Event ..> Interval
Event --> Thing
Family ..> Interval
Folder --> Document
HyperText --> Document
Identification ..> Interval
Image --> Document
LegalEntity --> Thing
License --> Contract
Membership ..> Interest
Message --> Document
Message ..> Folder
Message ..> HyperText
Message ..> Interval
Message ..> PlainText
Note ..> Analyzable
Note --> Thing
Occupancy ..> Interval
Organization --> LegalEntity
Ownership ..> Interest
Package --> Document
Package ..> Folder
Pages --> Document
Passport --> Identification
Payment ..> Interval
Payment ..> Value
Person --> LegalEntity
PlainText --> Document
Position --> Thing
Post ..> Interest
Project ..> Interval
Project --> Thing
Project ..> Value
PublicBody --> Organization
RealEstate ..> Asset
RealEstate --> Thing
Representation ..> Interest
Sanction ..> Interval
Security ..> Asset
Security --> Thing
Succession ..> Interest
Table --> Document
TaxRoll ..> Interval
Trip --> Event
UnknownLink ..> Interest
UserAccount --> Thing
Vehicle ..> Asset
Vehicle --> Thing
Vessel --> Thing
Vessel ..> Vehicle
Video --> Document
Workbook --> Document
Workbook ..> Folder



I'd be happy to have your insights.
(And looking at the diagram, it seems there are some weird orphans like `Pages` or others that should be linked to `Document`. There are bugs :grin: )

Ahem, "fixing" the `Document` inheritance broke `Vehicle`, `Vessel`, `Package`, `Security`, `Contract`, `Company`, `Message`, `Airplane`, `Workbook`, `BankAccount`, `Email`, `RealEstate`.

Having some mixin markers would definitely help.

After digging into the different cases:

- `RealEstate`, `BankAccount`, `Workbook`, `Package`, `Airplane`, `Contract`, `Security`, `Vehicle`, `Vessel` are classic inheritance chains (we just have to pick the right level; this is kind of a bug in the Java lib)
- `Company` extends `Organization` and `Asset`. This one is easy because `Asset` has a mandatory name coming from `Thing`, and `Organization` is also "a kind of" `Thing`, so `Asset` is a mixin/trait/interface
- `Message` and `Email` extend `Folder`, `PlainText` and `HyperText`, which all extend `Document` -> a diamond. These ones are harder
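
For reference, the diamond case can be expressed in Java when the intermediate schemas are modelled as interfaces. The sketch below is only an assumption about how that could look: the accessor names `fileName`, `bodyText` and `bodyHtml` are illustrative placeholders, and the real schemas define many more properties.

```java
// Illustrative accessors only; not generated from the real schema files.
interface Document {
    String fileName();
}

interface Folder extends Document {
}

interface PlainText extends Document {
    String bodyText();
}

interface HyperText extends Document {
    String bodyHtml();
}

/** Email reaches Document through three paths, but implements fileName() only once. */
class Email implements Folder, PlainText, HyperText {
    private final String fileName;
    private final String bodyText;
    private final String bodyHtml;

    Email(String fileName, String bodyText, String bodyHtml) {
        this.fileName = fileName;
        this.bodyText = bodyText;
        this.bodyHtml = bodyHtml;
    }

    @Override
    public String fileName() {
        return fileName;
    }

    @Override
    public String bodyText() {
        return bodyText;
    }

    @Override
    public String bodyHtml() {
        return bodyHtml;
    }
}

class DiamondSketch {
    public static void main(String[] args) {
        Email email = new Email("hello.eml", "Hello", "<p>Hello</p>");
        Document asDocument = email; // an Email is usable wherever a Document is expected
        System.out.println(asDocument.fileName());
    }
}
```

Abstract interface methods inherited through several paths do not conflict, which is why the diamond compiles. The trade-off is that `Document` is no longer a class that can hold shared state.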

 **EDIT** fixed diagram after code fix.
 **EDIT2** models broken
 **EDIT3**  digging into compilation errors #
bamthomas commented 2 weeks ago

So we ended up defining a set of mixins that helps us generate code:

static final Set<String> mixins = new LinkedHashSet<>(List.of("Asset", "Folder", "PlainText", "HyperText"));

see Model.java.
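
As a rough illustration of how such a mixin set can drive code generation, the sketch below splits a schema's declared parents into one Java superclass and a list of interfaces. It is not the code from Model.java: `MixinSplitSketch`, `Parents`, `splitParents` and `declaration` are hypothetical names, and only the mixin set itself is taken from the line above.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Optional;
import java.util.Set;

class MixinSplitSketch {
    static final Set<String> MIXINS =
        new LinkedHashSet<>(List.of("Asset", "Folder", "PlainText", "HyperText"));

    /** Hypothetical result type: at most one class to extend, any number of interfaces. */
    static final class Parents {
        final Optional<String> superclass;
        final List<String> interfaces;

        Parents(Optional<String> superclass, List<String> interfaces) {
            this.superclass = superclass;
            this.interfaces = interfaces;
        }
    }

    /** Parents listed in MIXINS become interfaces; the single remaining parent becomes the superclass. */
    static Parents splitParents(List<String> declaredParents) {
        List<String> concrete = new ArrayList<>();
        List<String> interfaces = new ArrayList<>();
        for (String parent : declaredParents) {
            if (MIXINS.contains(parent)) {
                interfaces.add(parent);
            } else {
                concrete.add(parent);
            }
        }
        if (concrete.size() > 1) {
            throw new IllegalStateException("More than one non-mixin parent: " + concrete);
        }
        return new Parents(concrete.stream().findFirst(), interfaces);
    }

    /** Renders the first line of a generated class declaration. */
    static String declaration(String name, List<String> declaredParents) {
        Parents parents = splitParents(declaredParents);
        StringBuilder line = new StringBuilder("public class ").append(name);
        parents.superclass.ifPresent(parent -> line.append(" extends ").append(parent));
        if (!parents.interfaces.isEmpty()) {
            line.append(" implements ").append(String.join(", ", parents.interfaces));
        }
        return line.toString();
    }

    public static void main(String[] args) {
        System.out.println(declaration("Company", List.of("Organization", "Asset")));
    }
}
```

For a schema whose declared parents are all in the mixin set, this sketch yields no superclass; a real generator would need an extra rule for that case, for example falling back to the shared ancestor.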

There are two "structural" questions on Document (which will be the first entity to be implemented in datashare):

We'd be happy to talk about this, and will keep you posted on the next steps (this is a beginning with only required fields).