alephdata / ingest-file

Ingestors extract the contents of mixed unstructured documents into structured (followthemoney) data.
GNU Affero General Public License v3.0

Extract crypto wallet addresses #416

Open tillprochaska opened 1 year ago

tillprochaska commented 1 year ago

ingest-file could extract crypto wallet addresses for popular cryptocurrencies using regular expressions, similar to how it already extracts email addresses and IBANs.
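To make the idea concrete, here is a rough sketch of what regex-based extraction could look like. It is not taken from ingest-file: the patterns, the function names (`extract_wallets`, `is_valid_btc_legacy`) and the returned dict shape are all assumptions for illustration. It only covers legacy Base58 Bitcoin addresses and hex Ethereum addresses, and it checks the Base58Check checksum for Bitcoin candidates so that random alphanumeric strings are less likely to slip through.

```python
import hashlib
import re

# Candidate patterns (illustrative assumptions, not part of ingest-file).
BTC_LEGACY = re.compile(r"\b[13][1-9A-HJ-NP-Za-km-z]{25,34}\b")
ETH = re.compile(r"\b0x[a-fA-F0-9]{40}\b")

# Base58 alphabet used by Bitcoin (no 0, O, I or l).
B58 = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"


def b58decode(value: str) -> bytes:
    """Decode a Base58 string; each leading '1' becomes a zero byte."""
    num = 0
    for char in value:
        num = num * 58 + B58.index(char)
    body = num.to_bytes((num.bit_length() + 7) // 8, "big")
    pad = len(value) - len(value.lstrip("1"))
    return b"\x00" * pad + body


def is_valid_btc_legacy(address: str) -> bool:
    """Check the 4-byte Base58Check checksum of a legacy address."""
    raw = b58decode(address)
    if len(raw) != 25:
        return False
    payload, checksum = raw[:-4], raw[-4:]
    digest = hashlib.sha256(hashlib.sha256(payload).digest()).digest()
    return digest[:4] == checksum


def extract_wallets(text: str) -> dict:
    """Return candidate addresses found in text, grouped by currency."""
    btc = [m for m in BTC_LEGACY.findall(text) if is_valid_btc_legacy(m)]
    # Hypothetical simplification: Ethereum matches are not checksum-verified.
    eth = ETH.findall(text)
    return {"btc": btc, "eth": eth}
```

Calling `extract_wallets(text)` on extracted document text would return something like `{"btc": [...], "eth": [...]}`. Bech32 (`bc1...`) addresses and the EIP-55 mixed-case checksum for Ethereum are left out here; a real implementation would probably want both.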

While Elasticsearch and Aleph do support regex searches, which can be used to find mentions of such addresses, Elasticsearch’s regex capabilities are limited: for example, a regex must always match a full token. It can be difficult or impossible to come up with a valid ES regex that matches valid addresses and is precise at the same time.
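A small analogy for the full-token limitation, in plain Python. This does not reproduce Elasticsearch's analyzers or Lucene's regex syntax; it only mimics "the pattern must match a whole token" with whitespace-split tokens and `re.fullmatch`, and compares that to scanning the raw text at extraction time. The names `token_level_hits` and `free_text_hits` are made up for this example.

```python
import re

# Unanchored Ethereum-style pattern (an assumption for illustration).
ETH = re.compile(r"0x[a-fA-F0-9]{40}")


def token_level_hits(text: str) -> list:
    """Mimic anchored matching: the pattern must cover the whole token."""
    return [tok for tok in text.split() if ETH.fullmatch(tok)]


def free_text_hits(text: str) -> list:
    """Scan the raw text, as an extractor running at ingest time could."""
    return ETH.findall(text)
```

With input such as `"wallet:0xde0B295669a9FD93d5F28D9Ec85E40f4cb697BAe, thanks"`, the token-level variant finds nothing because the address is glued to other characters in its token, while the free-text scan still returns the address.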

Rosencrantz commented 1 year ago

@pudo I have vague memories of chatting about this with you. Is there a possibility of a clash here, too many false negatives, that sort of thing?