alephdata / followthemoney

Data model and processing tools for investigative entity data
https://followthemoney.tech
MIT License

Max typed value length metadata #1516

Closed: pudo closed this 1 month ago

pudo commented 1 month ago

This doesn't actually enforce anything (i.e. it doesn't make add reject a value); it just defines some benchmark values.

pudo commented 1 month ago

We have a bunch of customers who are trying to squeeze our data into SQL tables and so we need to do this enforcement on our end :|

https://github.com/opensanctions/opensanctions/issues/1045

tillprochaska commented 1 month ago

@pudo Do you want to enforce the max size of individual values in the FtM Python library in the future, with this as a first step in that direction? Or will the logic that checks this live in OS? (Asking so I know whether I should check typical value sizes in our Aleph instance.)

pudo commented 1 month ago

We're playing with adding it to OS as a data quality check: it turns out that when you have a really long name, for example, it's usually a parser error. It's helping us find a ton of bugs. So while I have no personal ambition to push it on you, I feel like it could be a good thing to use in places (e.g. for user-contributed entities, and maybe even in the UI for web-based mappings?)
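
For illustration, a data quality check of this kind could look roughly like the sketch below. It is self-contained and does not use the FtM API: the `MAX_LENGTHS` table, the `DEFAULT_MAX_LENGTH` fallback, the numeric limits, and the `find_oversized` helper are all hypothetical names and values, not the ones added in this PR.

```python
# Hypothetical per-type limits (in characters). The real benchmark values
# live in the library's type metadata and may differ from these.
MAX_LENGTHS = {"name": 384, "country": 16, "identifier": 64}
DEFAULT_MAX_LENGTH = 250  # assumed fallback for types without an entry


def find_oversized(values_by_type, limits=MAX_LENGTHS, default=DEFAULT_MAX_LENGTH):
    """Return (type_name, value, length, limit) for each value over its limit.

    Nothing is rejected or truncated: the result is only a list of warnings,
    matching the advisory nature of the limits.
    """
    warnings = []
    for type_name, values in values_by_type.items():
        limit = limits.get(type_name, default)
        for value in values:
            # len() counts Unicode code points, not bytes.
            if len(value) > limit:
                warnings.append((type_name, value, len(value), limit))
    return warnings


# Example: a 500-character "name" is more likely a parser error than a name.
entity_values = {"name": ["ACME Ltd", "x" * 500], "country": ["de"]}
for type_name, value, length, limit in find_oversized(entity_values):
    print(f"{type_name}: value of length {length} exceeds limit {limit}")
```

Because the check only reports, it can run as a lint step over a dataset without changing what gets stored.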

arp242 commented 1 month ago

The way I read it from the doc comment, it's more advisory than anything else: you may support longer values. If anything, the "max_size" is more the "minimum size your database should support" than "maximum size this can ever be".
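
Read that way, a consumer loading the data into SQL could use the limits to size columns. A minimal sketch, assuming a hypothetical mapping of type names to lengths (the `column_ddl` helper, the table name, and the numbers are made up for illustration and are not part of FtM):

```python
def column_ddl(table, limits):
    """Build a CREATE TABLE statement with one VARCHAR column per type.

    Each limit is treated as the minimum width the column must support,
    so a consumer is free to choose wider columns.
    """
    columns = ",\n".join(
        f"    {name} VARCHAR({size})" for name, size in sorted(limits.items())
    )
    return f"CREATE TABLE {table} (\n{columns}\n);"


# Hypothetical limits; real values would come from the type metadata.
print(column_ddl("entities", {"name": 384, "country": 16}))
```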

pudo commented 1 month ago

Yeah, absolutely! We're probably going to make it "max length you have to expect from an OpenSanctions dataset" in a few weeks, but that's just one citizen in the ecosystem :)

tillprochaska commented 1 month ago

Thanks for the clarification! :)