Closed NixBiks closed 4 years ago
It might be easier to load the models from disk. My specific strategy on GCP is actually to use GCSFuse, and have a bucket with all of the models unpacked. This way I can load them from the mount directory, and they're loaded in from the bucket. This saves me from baking the models into the code artifact, which is a hassle if you don't know which models you'll need ahead of time.
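A minimal sketch of this pattern: `spacy.load()` accepts a directory path as well as a package name, so an unpacked model on the mount loads directly (the mount path in the comment is hypothetical, and `load_from_mount` is just an illustrative wrapper):

```python
import spacy

def load_from_mount(model_dir: str):
    # spacy.load() works on a plain directory path, so the model never
    # has to be baked into the code artifact or pip-installed.
    return spacy.load(model_dir)

# e.g., with the bucket mounted via GCSFuse:
# nlp = load_from_mount("/mnt/gcs/mybucket/models/en_core_web_sm")
```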
There's nothing wrong with building things into a virtualenv instead, but that might make your deployment a bit harder. Docker is also a good solution, although putting all the models inside the container might make the container bigger than you want it.
I just want to make sure that I understand correctly. Do you propose something like this?
I.e. you manually scale the size of your compute engine(s), and if you want to update a model you upload a new unpacked model to the bucket, point your compute engine at it, and restart the service?
Also, by unpacked you mean the output of `nlp.to_disk("/path")`, yes?
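For reference, that round trip looks like this (paths are illustrative, and `spacy.blank` stands in for a trained pipeline):

```python
import spacy

# "Unpacked" = the directory nlp.to_disk() writes out (config, tokenizer,
# vocab, per-component data). That whole directory is what goes in the bucket.
nlp = spacy.blank("en")
nlp.to_disk("/tmp/my_model")

# Later, any worker with the bucket mounted loads it straight back:
nlp2 = spacy.load("/tmp/my_model")
```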
You expose a predict REST call using FastAPI or something like that.
Personally I don't quite do it that way. I use a job scheduler (specifically, Nomad -- but you could use anything) and trigger a command-line program that reads a file of text from the bucket, and writes a file of parses to the bucket. There's nothing wrong with using an API instead though. The script is also given a path to the unpacked model.
I place the pex file, the execution script, the model and the input file all in the mounted bucket, so the command just references those paths. Like, I'm executing a whole bunch of commands like this:
```shell
# spacy.pex is like running 'python'; parse_file.py is the script being run
/mnt/gcs/mybucket/pex/spacy.pex \
  /mnt/gcs/mybucket/scripts/parse_file.py \
  /mnt/gcs/mybucket/models/en_core_web_sm \
  /mnt/gcs/mybucket/inputs/data-2015-01-01.gz \
  /mnt/gcs/mybucket/outputs/data-2015-01-01.spacy
```
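A hypothetical reconstruction of what a `parse_file.py`-style script could look like — the `.spacy` extension suggests a `DocBin`, but the one-text-per-line input format and the function names here are assumptions, not the actual script:

```python
import gzip
import sys

import spacy
from spacy.tokens import DocBin

def main(model_dir: str, input_gz: str, output_spacy: str) -> None:
    # Load the unpacked model from the mounted bucket path.
    nlp = spacy.load(model_dir)
    doc_bin = DocBin()
    # Assumes one text per line in the gzipped input file.
    with gzip.open(input_gz, "rt", encoding="utf8") as f:
        texts = (line.strip() for line in f if line.strip())
        for doc in nlp.pipe(texts):
            doc_bin.add(doc)
    # Write all parses as a single binary .spacy file back to the bucket.
    doc_bin.to_disk(output_spacy)

if __name__ == "__main__" and len(sys.argv) == 4:
    # Guarded so the sketch can also be imported without arguments.
    main(sys.argv[1], sys.argv[2], sys.argv[3])
```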
So the only thing the job scheduler has to do is find a machine to execute that on. There's no transfer of resources to the worker nodes, and the worker nodes don't communicate between themselves. If the job fails, the scheduler just has to reschedule that work again.
There's nothing wrong with using a REST API though. I just find that questions around connection timeouts and stuff are all incidental complexity. Nothing needs to be returned back to the caller, so I'd rather not frame it as a REST call.
This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.
This might not be the right place to ask this question, or it might be.
I'm wondering if anyone has had any experience deploying/managing spaCy models on Google Cloud Platform? E.g. it might be possible to use their AI Platform and list spaCy as a dependency when packaging the model, but it isn't clear to me whether that's actually feasible, so maybe someone already knows?
FYI: we're running spaCy on a stream of real-time news.