alephdata / ingest-file

Ingestors extract the contents of mixed unstructured documents into structured (followthemoney) data.
GNU Affero General Public License v3.0

Create BankAccount entities from valid IBANs #503

Open catileptic opened 1 year ago

catileptic commented 1 year ago

TODO:

As per #415 and #2066, this is an attempt to create BankAccount FTM entities out of valid IBANs.

In the analysis stage, an IBAN is identified by the existing regex. It is added to the list of Mentions.
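As a rough illustration of this step, the sketch below pulls IBAN-shaped strings out of text with a regular expression. The pattern and the `find_iban_mentions` helper are hypothetical and are not the regex that ingest-file already ships. It only matches IBANs written without spaces, and it makes no claim about validity.

```python
import re

# Hypothetical pattern (not the existing ingest-file regex): two country
# letters, two check digits, then 11 to 30 alphanumerics, written without spaces.
IBAN_CANDIDATE = re.compile(r"\b[A-Z]{2}[0-9]{2}[A-Z0-9]{11,30}\b")


def find_iban_mentions(text):
    """Return IBAN-shaped strings in order of first appearance, without repeats."""
    seen = {}
    for match in IBAN_CANDIDATE.finditer(text):
        # dict keys keep insertion order, so this de-duplicates stably
        seen.setdefault(match.group(0), None)
    return list(seen)
```

Anything returned here is only a candidate. Strings that merely look like IBANs still have to go through validation.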

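Then, the IBANs that have been collected as Mentions are validated using schwifty. openiban was also considered, but it performed worse on my test cases: it got 8 of 18 IBANs wrong, versus 3 of 18 for schwifty. The full test run is listed below; X marks an IBAN the library got wrong.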
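For readers unfamiliar with IBAN validation, here is a minimal standard-library sketch of the mod-97 checksum defined for IBANs. It is not what this PR uses (validation is delegated to schwifty, which also applies country-specific rules that this sketch omits), and the names `IBAN_SHAPE` and `passes_mod97` are made up for illustration.

```python
import re

# Generic shape only: country code, check digits, 11 to 30 alphanumerics.
# Country-specific lengths and bank codes are deliberately not checked here.
IBAN_SHAPE = re.compile(r"[A-Z]{2}[0-9]{2}[A-Z0-9]{11,30}")


def passes_mod97(iban):
    """Return True if the string has an IBAN shape and a correct mod-97 checksum."""
    compact = iban.replace(" ", "").upper()
    if not IBAN_SHAPE.fullmatch(compact):
        return False
    # Move the country code and check digits to the end ...
    rearranged = compact[4:] + compact[:4]
    # ... map letters to numbers (A=10 ... Z=35), leaving digits unchanged ...
    digits = "".join(str(int(ch, 36)) for ch in rearranged)
    # ... and require the resulting integer to leave remainder 1 modulo 97.
    return int(digits) % 97 == 1
```

Passing this check does not mean the account exists or that the bank code is real, which is one reason to rely on a dedicated validator rather than the checksum alone.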

BankAccount entities are created for each valid IBAN, and the IBAN string is added to the iban FTM attribute.
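To make the target shape concrete, the sketch below builds a plain dict in the usual followthemoney JSON layout (`id`, `schema`, `properties` with list values) for one validated IBAN. It is an assumption-laden illustration: the real code would go through followthemoney's own entity API, and both the `make_bank_account` helper and the way the ID is derived are hypothetical.

```python
import hashlib


def make_bank_account(iban, document_id):
    """Build a dict shaped like an FtM BankAccount entity for one validated IBAN."""
    # Assumption: the ID is derived from the document ID and the IBAN, so that
    # re-ingesting the same file yields the same entity. The PR may differ.
    key = f"{document_id}:{iban}".encode("utf-8")
    return {
        "id": hashlib.sha1(key).hexdigest(),
        "schema": "BankAccount",
        # FtM property values are always lists of strings
        "properties": {"iban": [iban]},
    }
```

For example, `make_bank_account("GB33BUKB20201555555555", "doc-1")` returns a dict whose `properties` are `{"iban": ["GB33BUKB20201555555555"]}`.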

When I ran Aleph locally and ingested a test document (attached here), the BankAccount FTM entities appeared in FTM-store, but they did not show up in the Aleph UI. This needs further investigation.

Workaround: re-index the investigation containing the IBANs document. Then, go to Entities (in the sidebar) > Add a new entity type > Bank Account. The newly created entities will appear in the table. They do not appear in the sidebar, though.

IBANs.pdf

(iban) ➜  ingest-file git:(feature/iban) ✗ python test_validator.py
V [schwifty] (0) GB33BUKB20201555555555
V [schwifty] (1) GB94BARC10201530093459
V [schwifty] (2) GB94BARC20201530093459
V [schwifty] (3) GB96BARC202015300934591
X [schwifty] (4) GB02BARC20201530093451
X [schwifty] (5) GB68CITI18500483515538
X [schwifty] (6) GB24BARC20201630093459
V [schwifty] (7) GB12BARC20201530093A59
V [schwifty] (8) GB78BARCO0201530093459
V [schwifty] (9) GB2LABBY09012857201707
V [schwifty] (10) GB01BARC20714583608387
V [schwifty] (11) GB00HLFX11016111455365
V [schwifty] (12) US64SVBKUS6S3300958879
V [schwifty] (13) NL63INGB5198491756
V [schwifty] (14) RO65PORL4435312861931963
V [schwifty] (15) SA7439228561548156293899
V [schwifty] (16) AE560335386651248739596
V [schwifty] (17) ES7401283747341413374686

schwifty got 3 / 18 IBANs wrong
V [openiban] (0) GB33BUKB20201555555555
V [openiban] (1) GB94BARC10201530093459
V [openiban] (2) GB94BARC20201530093459
V [openiban] (3) GB96BARC202015300934591
X [openiban] (4) GB02BARC20201530093451
X [openiban] (5) GB68CITI18500483515538
X [openiban] (6) GB24BARC20201630093459
X [openiban] (7) GB12BARC20201530093A59
X [openiban] (8) GB78BARCO0201530093459
V [openiban] (9) GB2LABBY09012857201707
X [openiban] (10) GB01BARC20714583608387
X [openiban] (11) GB00HLFX11016111455365
X [openiban] (12) US64SVBKUS6S3300958879
V [openiban] (13) NL63INGB5198491756
V [openiban] (14) RO65PORL4435312861931963
V [openiban] (15) SA7439228561548156293899
V [openiban] (16) AE560335386651248739596
V [openiban] (17) ES7401283747341413374686

openiban got 8 / 18 IBANs wrong