alephdata / memorious

Lightweight web scraping toolkit for documents and structured data.
https://docs.alephdata.org/developers/memorious
MIT License

Update docs for `aleph_emit_document` operation #232

Open tillprochaska opened 1 year ago

tillprochaska commented 1 year ago

The list of available metadata properties in the docs seems to be slightly outdated.

I’ve skimmed the code in Memorious, alephclient, and the Aleph API and this should be the latest list of all allowed props: https://github.com/alephdata/aleph/blob/develop/aleph/validation/schema/ingest.yml.

It might be worth mentioning only a few examples of allowed metadata properties in the docs and linking to the YAML file in the Aleph repo, so we do not have to keep the list in the Memorious docs up to date.

tillprochaska commented 1 year ago

I missed a detail: while the metadata items in the file above are what Aleph accepts, the items you can pass from the `aleph_emit` op in Memorious are actually limited by these two methods: https://github.com/alephdata/memorious/blob/main/memorious/operations/aleph.py#L14-L47

As far as I can see, items supported by the Aleph Ingest API but not in Memorious are:

It’s all a bit confusing, to be honest, because these metadata items also do not always map 1:1 to FtM properties. I guess that is a relic of the fact that documents were a separate concept (and not entities) in the first versions of Aleph.
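
To make the limitation concrete, here is a minimal sketch of the general pattern: an operation that only forwards the keys it knows about, so anything else in the crawler data never reaches the ingest API. This is not the actual Memorious code. `FORWARDED_KEYS`, `split_metadata`, and the `custom_field` key are hypothetical names chosen for illustration, and the real set of keys lives in the methods linked above.

```python
from typing import Any, Dict, Set, Tuple

# Hypothetical allow-list for illustration only. The real keys are defined in
# memorious/operations/aleph.py and may differ from these.
FORWARDED_KEYS: Set[str] = {"title", "author", "file_name", "source_url"}


def split_metadata(data: Dict[str, Any]) -> Tuple[Dict[str, Any], Set[str]]:
    """Split crawler data into forwarded metadata and ignored key names."""
    # Keep only the keys on the allow-list.
    forwarded = {key: value for key, value in data.items() if key in FORWARDED_KEYS}
    # Everything else is silently dropped by an operation built this way.
    ignored = set(data) - FORWARDED_KEYS
    return forwarded, ignored


meta, ignored = split_metadata(
    {"title": "Annual report", "custom_field": "never reaches the API"}
)
```

Running a check like this against a crawler's data before emitting would show which keys are being dropped, assuming the allow-list is kept in sync with the operation's source.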