CogStack / MedCATservice

Running MedCAT as a RESTful web service

spaCy model configuration #19

Closed (sandertan closed this issue 2 years ago)

sandertan commented 2 years ago

Hi, thanks for updating this repo to use the latest MedCAT. We've been able to use it with Dutch MedCAT and spaCy models in our fork.

I was wondering whether you are open to changing how spaCy models are handled. Currently:

https://github.com/CogStack/MedCATservice/blob/026d8acf0d8c13614ec67a1cef144431e60939ec/medcat_service/nlp_processor/medcat_processor.py#L193

It would be nice to use the spaCy model defined in the CDB and download it if it is not available on the system, and/or to make the spaCy models in the Dockerfile configurable, perhaps with a Docker ARG?
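The download-if-missing behaviour suggested above could look roughly like the sketch below. It is not the service's actual code. It assumes the model name has already been read from the CDB configuration (where exactly it is stored depends on the MedCAT release), and the `SPACY_MODELS` environment variable is a hypothetical name that a Docker `ARG`/`ENV` pair could populate at build time. The only spaCy feature relied on is its `python -m spacy download` command.

```python
import importlib.util
import os
import subprocess
import sys
from typing import Callable, List


def is_model_installed(model_name: str) -> bool:
    """Return True if the model is an importable package or an existing directory."""
    if os.path.isdir(model_name):
        return True
    try:
        # Models installed via `spacy download` are regular Python packages.
        return importlib.util.find_spec(model_name) is not None
    except (ImportError, ValueError):
        # Raised for names that cannot be resolved as a top-level package.
        return False


def download_model(model_name: str) -> None:
    """Download a model through spaCy's own CLI."""
    subprocess.run(
        [sys.executable, "-m", "spacy", "download", model_name], check=True
    )


def ensure_spacy_model(
    model_name: str, download: Callable[[str], None] = download_model
) -> bool:
    """Make the model available; return True if a download was triggered."""
    if is_model_installed(model_name):
        return False
    download(model_name)
    return True


def models_from_env(var_name: str = "SPACY_MODELS") -> List[str]:
    """Read a whitespace-separated model list from a (hypothetical) env variable."""
    return os.environ.get(var_name, "").split()
```

With something like this, a build step or the service start-up could loop over `models_from_env()` and call `ensure_spacy_model` for each entry, so switching to another language model would only mean changing one build argument.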

Also, since recent MedCAT releases store the parameters in the CDB, the envs/env_medcat* files can be removed, right?

w-is-h commented 2 years ago

I think that long-term the service should switch to model packs. They contain the spaCy models internally, so this will not be necessary. Let me know what you think.

vladd-bit commented 2 years ago

I agree with this and will make sure model packs are part of the next release. For now, I will also edit the service to use the spaCy model defined in the CDB.

sandertan commented 2 years ago

By model pack, do you mean a bundle of the CDB, Vocab, spaCy model and parameters? That sounds awesome.

w-is-h commented 2 years ago

Yes, plus meta annotation models (see here)

vladd-bit commented 2 years ago

Model pack support has now been added; check the env_app file for the variable. At the moment it is possible to add more meta_cats on top of the ones provided in the model pack; I am not sure whether this is OK (or whether it is really useful). The other requested change, using the spaCy model specified in the CDB, was also made some time ago, so it should work as intended now.
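As an illustration of the behaviour described above, the choice between a model pack and separate CDB/Vocab files could be resolved as in the following sketch. The variable names (`APP_MEDCAT_MODEL_PACK`, `APP_MODEL_CDB_PATH`, `APP_MODEL_VOCAB_PATH`, `APP_MODEL_META_PATH_LIST`) and the `:` separator are assumptions made for this example, so check the env_app file for the real ones. The sketch only decides which paths to use and does not load any MedCAT model itself.

```python
from typing import Mapping, NamedTuple, Optional, Tuple


class ModelSource(NamedTuple):
    kind: str                          # "model_pack" or "cdb_vocab"
    model_pack: Optional[str] = None
    cdb: Optional[str] = None
    vocab: Optional[str] = None
    extra_meta: Tuple[str, ...] = ()   # meta_cat paths added on top


def resolve_model_source(env: Mapping[str, str]) -> ModelSource:
    """Prefer a model pack; fall back to separate CDB and Vocab files."""
    # Extra meta annotation models may be listed in either mode (assumed format).
    extra = tuple(p for p in env.get("APP_MODEL_META_PATH_LIST", "").split(":") if p)

    pack = env.get("APP_MEDCAT_MODEL_PACK", "").strip()
    if pack:
        return ModelSource("model_pack", model_pack=pack, extra_meta=extra)

    cdb = env.get("APP_MODEL_CDB_PATH", "").strip()
    vocab = env.get("APP_MODEL_VOCAB_PATH", "").strip()
    if cdb and vocab:
        return ModelSource("cdb_vocab", cdb=cdb, vocab=vocab, extra_meta=extra)

    raise ValueError("Set either a model pack path or both CDB and Vocab paths")
```

Calling `resolve_model_source(os.environ)` at start-up would then make the precedence explicit: a configured model pack wins, and any extra meta_cat paths are kept alongside it.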

sandertan commented 2 years ago

Nice! I'll check it out in our next deployment.