esmero / archipelago-deployment

Archipelago Commons Docker Deployment Repository

Add needed infrastructure for NLP/ENTITY extraction in our docker-compose #62

Open DiegoPino opened 4 years ago

DiegoPino commented 4 years ago

What flavor of ice cream is AI?

For Natural Language Processing and AI analysis of text extracted from files, metadata description fields or similar textual bodies, I started building a few Search API Solr Post Processors that can deal with this, all under the umbrella of the Strawberryfield Runners module for 1.0.0. PS: Ping me for a showcase of the proof of concept I have running in AWS.

But to make this an option for people testing, using or just admiring Archipelago (yeah... not sure how many), I want to dockerize these needs through an additional docker-compose.yml that can be appended via the -f option when running docker-compose up -d.
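To make the opt-in idea concrete, here is a minimal sketch of how the extra file would stack on top of the base one. Docker Compose reads multiple -f files in order and merges them, so the later file adds to (or overrides) the earlier one. The file name docker-compose.nlp.yml and the helper function are assumptions for illustration only; nothing here is the final layout.

```python
from typing import List


def compose_up_command(compose_files: List[str]) -> List[str]:
    """Build the argv for `docker-compose up -d` with stacked -f files.

    Files are passed in order; Docker Compose merges later files
    on top of earlier ones.
    """
    if not compose_files:
        raise ValueError("at least one compose file is required")
    command = ["docker-compose"]
    for path in compose_files:
        command += ["-f", path]
    command += ["up", "-d"]
    return command


# Default deployment: only the base file.
base_only = compose_up_command(["docker-compose.yml"])

# Opt-in deployment: base file plus a hypothetical NLP override file.
with_nlp = compose_up_command(["docker-compose.yml", "docker-compose.nlp.yml"])

print(" ".join(with_nlp))
```

People who do not need the NLP services keep running the base file alone and nothing changes for them.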

What do I need?

This

The idea here is basically that not everyone needs fancy robots and AI getting names and links from their metadata, but some might. I'm working on formalizing (and generalizing) the live code I have. I like the fact that it integrates well into the SBF runners idea, but it also forced me to make some changes there, like how/when things run. I found myself quite happy seeing that a lot can actually go directly to Solr instead of being saved in metadata, while other things need to go into permanent storage.

Not the place (I know), but this also opens a chance to work again on my idea that AI/NPL generated data needs to be tagged/classified and exposed to the world as such. That means I'm adding provenance, how it was processed, the version of the software, etc. to an upper structure of the JSON. That way we know where the data came from and can keep a separation between human-generated and machine-generated classification. Machines are not always aware of context, so the level of trust we can put in this data will vary a lot, and it is good that we can pass that back to the end user.
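A rough sketch of what such an upper structure could look like, assuming a simple wrapper with a provenance block next to the extracted data. All key names, the processor name, and the sample entities are made up for illustration; the real structure is still being worked out.

```python
import json
from datetime import datetime, timezone


def wrap_with_provenance(data, processor, version, source, processed_at=None):
    """Wrap machine-extracted data in an upper structure that records
    how it was produced. Key names are hypothetical."""
    if processed_at is None:
        processed_at = datetime.now(timezone.utc)
    return {
        "provenance": {
            "generated_by": "machine",      # as opposed to "human"
            "processor": processor,         # which tool produced the data
            "processor_version": version,   # version of that software
            "source": source,               # which field or file was analyzed
            "processed_at": processed_at.isoformat(),
        },
        "data": data,
    }


# Illustrative values only; not output from a real extraction run.
record = wrap_with_provenance(
    data={"persons": ["Ada Lovelace"], "places": ["London"]},
    processor="example-nlp-extractor",
    version="0.1.0",
    source="description",
    processed_at=datetime(2020, 1, 1, tzinfo=timezone.utc),
)

print(json.dumps(record, indent=2))
```

Keeping the provenance beside the data, instead of mixing it into the values, means a consumer can filter or down-weight anything marked as machine generated without touching human-made metadata.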

@alliomeria because of the metadata discussion and @giancarlobi since I think you will like this. I should be able to add code to https://github.com/esmero/strawberry_runners/tree/ISSUE-4 soon with the full refactor and the new plugins. I do have high hopes here!

DiegoPino commented 4 years ago

Gosh, I'm just so dyslexic. NLP not NPL 🤦