explosion / spaCy

💫 Industrial-strength Natural Language Processing (NLP) in Python
https://spacy.io
MIT License

Deploying to Google Cloud - best practice #4617

Closed. NixBiks closed this issue 4 years ago

NixBiks commented 4 years ago

This might not be the right place to ask this question, but maybe it is.

I'm wondering if anyone has had any experience deploying/managing their spaCy models on Google Cloud Platform. E.g. it might be possible to use their AI Platform and list spaCy as a dependency when packaging the model. However, it isn't clear to me whether that's actually feasible, so maybe someone already knows?

FYI: we are using spaCy on a stream of real-time news.

honnibal commented 4 years ago

It might be easier to load the models from disk. My specific strategy on GCP is actually to use GCSFuse, and have a bucket with all of the models unpacked. This way I can load them from the mount directory, and they're loaded in from the bucket. This saves me from baking the models into the code artifact, which is a hassle if you don't know which models you'll need ahead of time.
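As an illustration, loading a model from a GCSFuse mount could look roughly like this (the bucket name and mount path below are made up, and the bucket is assumed to have been mounted beforehand with gcsfuse):

import spacy

# Assumed mount point, e.g. created with: gcsfuse my-models-bucket /mnt/gcs
# The bucket holds unpacked model directories (the output of nlp.to_disk()).
MODEL_DIR = "/mnt/gcs/models/en_core_web_sm"

nlp = spacy.load(MODEL_DIR)  # spacy.load() accepts a path to a model directory
doc = nlp("Shares rallied after the earnings report.")
print([(ent.text, ent.label_) for ent in doc.ents])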

There's nothing wrong with building things into a virtualenv instead, but that might make your deployment a bit harder. Docker is also a good solution, although putting all the models inside the container might make the container bigger than you want it.

NixBiks commented 4 years ago

I just want to make sure that I understand correctly. Do you propose something like this?

  1. You have some compute engine, e.g. a VM instance.
  2. You mount a bucket with all your models onto the compute engine using GCSFuse.
  3. You expose a predict REST call using fastapi or something like that.

I.e. you manually scale the size of your compute engine(s), and if you want to update a model, you upload an unpacked model to the bucket, point your compute engine at it, and restart the service?

Also, by unpacked you mean the output of nlp.to_disk("/path"), yes?
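For step 3, a minimal sketch of such a predict endpoint might look like the following (the model path, request schema, and route name are all assumptions for illustration):

import spacy
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()
# Load the unpacked model from the mounted bucket once, at startup
nlp = spacy.load("/mnt/gcs/models/en_core_web_sm")

class PredictRequest(BaseModel):
    text: str

@app.post("/predict")
def predict(req: PredictRequest):
    doc = nlp(req.text)
    # Return entities as a simple JSON-serialisable structure
    return {"ents": [{"text": e.text, "label": e.label_} for e in doc.ents]}

Served with e.g. uvicorn, this would be the kind of service you restart after swapping the model directory in the bucket.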

honnibal commented 4 years ago

"You expose a predict REST call using fastapi or something like that."

Personally I don't quite do it that way. I use a job scheduler (specifically, Nomad -- but you could use anything) and trigger a command-line program that reads a file of text from the bucket, and writes a file of parses to the bucket. There's nothing wrong with using an API instead though. The script is also given a path to the unpacked model.

I place the pex file, the execution script, the model, and the input file all in the mounted bucket, so the command just references those paths. So I'm executing a whole bunch of commands like this:

# The pex file acts like running 'python'; parse_file.py is the script being run
/mnt/gcs/mybucket/pex/spacy.pex \
  /mnt/gcs/mybucket/scripts/parse_file.py \
  /mnt/gcs/mybucket/models/en_core_web_sm \
  /mnt/gcs/mybucket/inputs/data-2015-01-01.gz \
  /mnt/gcs/mybucket/outputs/data-2015-01-01.spacy

So the only thing the job scheduler has to do is find a machine to execute that on. There's no transfer of resources to the worker nodes, and the worker nodes don't communicate with each other. If the job fails, the scheduler just has to reschedule that work.

There's nothing wrong with using a REST API though. I just find that questions around connection timeouts and stuff are all incidental complexity. Nothing needs to be returned back to the caller, so I'd rather not frame it as a REST call.
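For context, a script along the lines of the parse_file.py referenced above might look roughly like this (the argument order matches the command shown, but the gzipped one-document-per-line input and the DocBin output format are assumptions, not the actual script):

import gzip
import sys

import spacy
from spacy.tokens import DocBin

def main(model_dir, input_path, output_path):
    # Load the unpacked model from the mounted bucket
    nlp = spacy.load(model_dir)
    doc_bin = DocBin(store_user_data=True)
    # Assume one document per line in the gzipped input file
    with gzip.open(input_path, "rt", encoding="utf8") as f:
        for doc in nlp.pipe(line.strip() for line in f):
            doc_bin.add(doc)
    # Write the serialised parses back to the mounted bucket
    with open(output_path, "wb") as out:
        out.write(doc_bin.to_bytes())

if __name__ == "__main__":
    main(*sys.argv[1:4])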

lock[bot] commented 4 years ago

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.