galterlibrary / InvenioRDM-at-NU

Next generation repository for health science
MIT License

Automated metadata extraction #102

Open saragon02 opened 6 years ago

saragon02 commented 6 years ago

As a record depositor, I want metadata for my resource's page to be automatically extracted from the resource I'm uploading so I don't have to manually fill in all the metadata fields.

I also want a reminder to approve my record's data so I remember to check that the fields loaded correctly.

fenekku commented 6 years ago

For the second point, what I had in mind is to inject recommendations based on the uploaded data into the form fields. These recommendations would either be pre-filled and dismissible (opt-out) or shown as hints that the depositor can accept (opt-in).

I have also tagged this as wishlist because it might be very complex to achieve.
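
A minimal sketch of the opt-out / opt-in recommendation idea, to make the two modes concrete. Everything named here is hypothetical: `FIELD_MAP`, `Suggestion`, `suggest_fields`, and `apply_suggestions` are illustrative and not part of the repository, and the Dublin Core style keys (`dc:title`, `dc:creator`, `dc:description`) are only an assumption about what an extractor might return.

```python
from dataclasses import dataclass
from typing import Dict, List

# Hypothetical mapping from extractor metadata keys to deposit-form fields.
# The keys on the left are an assumption; adjust them to the extractor used.
FIELD_MAP = {
    "dc:title": "title",
    "dc:creator": "authors",
    "dc:description": "description",
}


@dataclass
class Suggestion:
    field: str   # deposit-form field name
    value: str   # recommended value taken from the uploaded file
    mode: str    # "prefill" (opt-out) or "hint" (opt-in)


def suggest_fields(extracted: Dict[str, str],
                   form: Dict[str, str],
                   mode: str = "hint") -> List[Suggestion]:
    """Build recommendations for form fields the depositor left empty."""
    if mode not in ("prefill", "hint"):
        raise ValueError("mode must be 'prefill' or 'hint'")
    suggestions = []
    for key, field in FIELD_MAP.items():
        value = extracted.get(key, "").strip()
        if not value:
            continue  # nothing was extracted for this field
        if form.get(field, "").strip():
            continue  # never override what the depositor already typed
        suggestions.append(Suggestion(field, value, mode))
    return suggestions


def apply_suggestions(form: Dict[str, str],
                      suggestions: List[Suggestion],
                      decisions: Dict[str, bool]) -> Dict[str, str]:
    """Return a copy of the form with the accepted recommendations applied.

    decisions maps a field name to True (accepted) or False (dismissed).
    With no decision, a "prefill" suggestion is applied and a "hint" is not.
    """
    result = dict(form)
    for s in suggestions:
        default = s.mode == "prefill"
        if decisions.get(s.field, default):
            result[s.field] = s.value
    return result
```

With `mode="prefill"` the extracted value lands in the form unless the depositor dismisses it. With `mode="hint"` nothing changes until the depositor accepts it. In both cases a field the depositor already filled in is left alone.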

phebal commented 6 years ago

PDF-only extractor: https://github.com/kermitt2/grobid

Metadata & text extractor for any document format: https://tika.apache.org/

fenekku commented 5 years ago

FAST subject heading recommender (assignFAST): http://experimental.worldcat.org/fast/assignfast/