MLH-Fellowship / Auto-Tagger

To automatically tag people in various chat groups
Apache License 2.0
4 stars 3 forks

503 Error after deploying on Heroku #3

Closed pncnmnp closed 4 years ago

pncnmnp commented 4 years ago

The deployment on Heroku was a bumpy ride. Initially, I encountered the following error:

```
ERROR: Could not find a version that satisfies the requirement torchvision==0.7.0+cpu (from -r ./requirements.txt (line 2))
(from versions: 0.1.6, 0.1.7, 0.1.8, 0.1.9, 0.2.0, 0.2.1, 0.2.2, 0.2.2.post2, 0.2.2.post3, 0.3.0, 0.4.0, 0.4.1, 0.4.2, 0.5.0, 0.6.0, 0.6.1, 0.7.0)
ERROR: No matching distribution found for torchvision==0.7.0+cpu (from -r ./requirements.txt (line 2))
The command '/bin/sh -c if [ -f /bento/bentoml-init.sh ]; then bash -c /bento/bentoml-init.sh; fi' returned a non-zero code: 1
▸ Error: docker build exited with Error: 1
```

I noticed a similar error for torch==1.6.0+cpu. Since I was not using CUDA (#2), the CPU-only builds were all I needed, and the following Stack Overflow thread, "Install PyTorch from requirements.txt", pointed to the fix.
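The fix from that thread amounts to pointing pip at PyTorch's own wheel index, since the +cpu builds are hosted there rather than on PyPI. As a standalone command (assuming the standard torch_stable index URL), it would look roughly like:

```shell
# CPU-only wheels are not on PyPI; --find-links (-f) tells pip where to look.
pip install torch==1.6.0+cpu torchvision==0.7.0+cpu \
    -f https://download.pytorch.org/whl/torch_stable.html
```

The same `--find-links` line can be embedded directly in requirements.txt, which is what BentoML's generated file does below.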

At this stage, BentoML's requirements.txt for my latest model looked like this:

```
bentoml==0.9.0
--find-links https://download.pytorch.org/whl/torch_stable.html
torch==1.6.0+cpu
numpy==1.19.2
--find-links https://download.pytorch.org/whl/torch_stable.html
torchvision==0.7.0+cpu
scikit-learn==0.22
```

With that, the deployment itself went through without issues. However, after making a prediction request:

```shell
curl -i --header "Content-Type: application/json" \
     --request POST \
     --data '{"sentence": "John Lennon used to play for The Beatles"}' \
     https://bentoml-her0ku-mtywmteynzm1mgo.herokuapp.com/predict
```
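For reference, the same request can be built from Python with only the standard library (`build_predict_request` is a made-up helper mirroring the curl call; nothing is sent until `urlopen` is invoked):

```python
import json
import urllib.request

def build_predict_request(base_url, sentence):
    """Build a JSON POST request for the /predict endpoint."""
    payload = json.dumps({"sentence": sentence}).encode("utf-8")
    return urllib.request.Request(
        url=base_url + "/predict",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = build_predict_request(
    "https://bentoml-her0ku-mtywmteynzm1mgo.herokuapp.com",
    "John Lennon used to play for The Beatles",
)
# urllib.request.urlopen(req) would actually send it.
```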

I was getting a 503 error. From Heroku's log:

```
2020-09-26T15:37:11.745901+00:00 heroku[router]: at=error code=H10 desc="App crashed" method=POST path="/predict" host=bentoml-her0ku-mtywmteynzm1mgo.herokuapp.com request_id=a971ef08-e573-487a-9e23-2f0f5ba0e87a fwd="49.32.63.210" dyno= connect= service= status=503 bytes= protocol=https
```

Heroku's docs say the following about the H10 error: "A crashed web dyno or a boot timeout on the web dyno will present this error."
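Heroku's router lines are space-separated key=value pairs, so they are easy to sift through programmatically. A minimal stdlib sketch (`parse_router_line` is a hypothetical helper, not part of any Heroku tooling):

```python
import shlex

def parse_router_line(line):
    """Parse a Heroku router log line's key=value pairs into a dict."""
    fields = {}
    for token in shlex.split(line):  # shlex keeps quoted values intact
        if "=" in token:
            key, _, value = token.partition("=")
            fields[key] = value
    return fields

line = ('at=error code=H10 desc="App crashed" method=POST path="/predict" '
        'status=503 protocol=https')
info = parse_router_line(line)
```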

Inspecting the log file, I found the issue:

```
2020-09-26T15:49:05.839688+00:00 app[web.1]: spec.loader.exec_module(module)
2020-09-26T15:49:05.839689+00:00 app[web.1]: File "", line 678, in exec_module
2020-09-26T15:49:05.839689+00:00 app[web.1]: File "", line 219, in _call_with_frames_removed
2020-09-26T15:49:05.839690+00:00 app[web.1]: File "/bento/PyTorchModel/serving.py", line 14, in
2020-09-26T15:49:05.839690+00:00 app[web.1]: from config import config
2020-09-26T15:49:05.839690+00:00 app[web.1]: File "/bento/PyTorchModel/config.py", line 1, in
2020-09-26T15:49:05.839691+00:00 app[web.1]: import transformers
2020-09-26T15:49:05.839691+00:00 app[web.1]: ModuleNotFoundError: No module named 'transformers'
2020-09-26T15:49:05.839797+00:00 app[web.1]: [2020-09-26 15:49:05 +0000] [17] [INFO] Worker exiting (pid: 17)
2020-09-26T15:49:05.868537+00:00 app[web.1]: [2020-09-26 15:49:05 +0000] [18] [ERROR] Exception in worker process
2020-09-26T15:49:05.868544+00:00 app[web.1]: Traceback (most recent call last):
```

The transformers module was not installed. I believe this is due to the following line:

```python
@bentoml.env(pip_packages=['torch', 'numpy', 'torchvision', 'scikit-learn'])
```

BentoML is being explicitly asked to install only these four packages, so transformers is never pulled in.

`@bentoml.env(infer_pip_packages=True)` should fix the issue.

pncnmnp commented 4 years ago

Changing this line (in serving.py) to `@bentoml.env(infer_pip_packages=True)` did fix the dependency issue. However, the 503 error still persists.

Looking again into Heroku's log files, I found a new issue:

```
2020-09-27T07:30:27.520399+00:00 app[web.1]: File "/bento/PyTorchModel/serving.py", line 14, in
2020-09-27T07:30:27.520399+00:00 app[web.1]: from config import config
2020-09-27T07:30:27.520400+00:00 app[web.1]: File "/bento/PyTorchModel/config.py", line 3, in
2020-09-27T07:30:27.520400+00:00 app[web.1]: class config:
2020-09-27T07:30:27.520401+00:00 app[web.1]: File "/bento/PyTorchModel/config.py", line 13, in config
2020-09-27T07:30:27.520401+00:00 app[web.1]: do_lower_case=True
2020-09-27T07:30:27.520401+00:00 app[web.1]: File "/opt/conda/lib/python3.6/site-packages/transformers/tokenization_utils_base.py", line 1425, in from_pretrained
2020-09-27T07:30:27.520402+00:00 app[web.1]: return cls._from_pretrained(*inputs, **kwargs)
2020-09-27T07:30:27.520402+00:00 app[web.1]: File "/opt/conda/lib/python3.6/site-packages/transformers/tokenization_utils_base.py", line 1531, in _from_pretrained
2020-09-27T07:30:27.520403+00:00 app[web.1]: list(cls.vocab_files_names.values()),
2020-09-27T07:30:27.520404+00:00 app[web.1]: OSError: Model name '../model/' was not found in tokenizers model name list (bert-base-uncased, bert-large-uncased, bert-base-cased, bert-large-cased, bert-base-multilingual-uncased, bert-base-multilingual-cased, bert-base-chinese, bert-base-german-cased, bert-large-uncased-whole-word-masking, bert-large-cased-whole-word-masking, bert-large-uncased-whole-word-masking-finetuned-squad, bert-large-cased-whole-word-masking-finetuned-squad, bert-base-cased-finetuned-mrpc, bert-base-german-dbmdz-cased, bert-base-german-dbmdz-uncased, TurkuNLP/bert-base-finnish-cased-v1, TurkuNLP/bert-base-finnish-uncased-v1, wietsedv/bert-base-dutch-cased). We assumed '../model/' was a path, a model identifier, or url to a directory containing vocabulary files named ['vocab.txt'] but couldn't find such vocabulary files at this path or url.
```

The issue is in the config.py file: Heroku is unable to find the ../model directory and its contents. Checking the local copy of my BentoML bundle, which got deployed as a Docker image on Heroku, I can see the following file structure:

```
[13:06][20200926223007_B35B21] tree .
.
├── bentoml-init.sh
├── bentoml.yml
├── docker-entrypoint.sh
├── Dockerfile
├── environment.yml
├── MANIFEST.in
├── python_version
├── PyTorchModel
│   ├── artifacts
│   │   ├── __init__.py
│   │   └── ner.pt
│   ├── bentoml.yml
│   ├── config.py
│   ├── dataset.py
│   ├── __init__.py
│   ├── model.py
│   ├── serving.py
│   └── utils.py
├── README.md
├── requirements.txt
└── setup.py

2 directories, 19 files
```

It has not copied over the model directory and its three files (config.json, pytorch_model.bin, and vocab.txt), even though the directory was present in my local repository (which was saved by BentoML). I am unsure whether we have to change the file structure or provide a special provision in BentoML.
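A quick sanity check on the built bundle would catch this class of problem before deploying. A minimal sketch using only pathlib (`missing_model_files` is a made-up helper; the file list matches the three files above):

```python
from pathlib import Path

REQUIRED_MODEL_FILES = ["config.json", "pytorch_model.bin", "vocab.txt"]

def missing_model_files(model_dir):
    """Return the required tokenizer/model files absent from model_dir."""
    model_dir = Path(model_dir)
    return [name for name in REQUIRED_MODEL_FILES
            if not (model_dir / name).is_file()]
```

Running this against the bundle's model directory right after `bentoml save` would have flagged the missing files immediately.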

pncnmnp commented 4 years ago

The above error was fixed by moving the model directory to ./src/ and renaming it to bert-base-uncased, so the final path was /server/src/bert-base-uncased instead of /server/model/ (with the path updated in config.py accordingly). I am not sure why this worked; my guess is that the tokenizer was matching the name against a word-list of pretrained models, and giving it a name from that list relevant to our model made the lookup succeed.

However, fixing this caused another issue -

```
Traceback (most recent call last):
  File "/opt/conda/lib/python3.6/site-packages/gunicorn/arbiter.py", line 583, in spawn_worker
    worker.init_process()
  File "/opt/conda/lib/python3.6/site-packages/gunicorn/workers/base.py", line 119, in init_process
    self.load_wsgi()
  File "/opt/conda/lib/python3.6/site-packages/gunicorn/workers/base.py", line 144, in load_wsgi
    self.wsgi = self.app.wsgi()
  File "/opt/conda/lib/python3.6/site-packages/gunicorn/app/base.py", line 67, in wsgi
    self.callable = self.load()
  File "/opt/conda/lib/python3.6/site-packages/bentoml/server/gunicorn_server.py", line 94, in load
    bento_service = load(self.bento_service_bundle_path)
  File "/opt/conda/lib/python3.6/site-packages/bentoml/saved_bundle/loader.py", line 251, in load
    svc_cls = load_bento_service_class(bundle_path)
  File "/opt/conda/lib/python3.6/site-packages/bentoml/saved_bundle/loader.py", line 191, in load_bento_service_class
    spec.loader.exec_module(module)
  File "", line 678, in exec_module
  File "", line 219, in _call_with_frames_removed
  File "/bento/PyTorchModel/serving.py", line 22, in
    meta_data = joblib.load("meta.bin")
  File "/opt/conda/lib/python3.6/site-packages/joblib/numpy_pickle.py", line 597, in load
    with open(filename, 'rb') as f:
FileNotFoundError: [Errno 2] No such file or directory: 'meta.bin'
```

I looked into the generated (latest) Bento directory and, sure enough, there was no such file as meta.bin, which is strange considering it was present in the codebase (./server/src/meta.bin). Adding it to the latest Bento fixed the above issue.
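A related trap worth noting: `joblib.load("meta.bin")` resolves the filename against the process's current working directory, which inside a container need not be the bundle directory. A sketch that anchors the path to the module file instead (`bundle_path` is an invented helper):

```python
from pathlib import Path

# __file__ is this module's own path; fall back to the cwd in an
# interactive session where __file__ is undefined.
try:
    BASE_DIR = Path(__file__).resolve().parent
except NameError:
    BASE_DIR = Path.cwd()

def bundle_path(filename):
    """Resolve a data file relative to the module, not the process cwd."""
    return BASE_DIR / filename

# serving.py could then load with an absolute path, e.g.:
# meta_data = joblib.load(bundle_path("meta.bin"))
meta_path = bundle_path("meta.bin")
```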

However, one last error popped up (now we are starting to go down a rabbit hole :smile:):

```
Traceback (most recent call last):
  File "/opt/conda/lib/python3.6/site-packages/gunicorn/arbiter.py", line 583, in spawn_worker
    worker.init_process()
  File "/opt/conda/lib/python3.6/site-packages/gunicorn/workers/base.py", line 119, in init_process
    self.load_wsgi()
  File "/opt/conda/lib/python3.6/site-packages/gunicorn/workers/base.py", line 144, in load_wsgi
    self.wsgi = self.app.wsgi()
  File "/opt/conda/lib/python3.6/site-packages/gunicorn/app/base.py", line 67, in wsgi
    self.callable = self.load()
  File "/opt/conda/lib/python3.6/site-packages/bentoml/server/gunicorn_server.py", line 94, in load
    bento_service = load(self.bento_service_bundle_path)
  File "/opt/conda/lib/python3.6/site-packages/bentoml/saved_bundle/loader.py", line 251, in load
    svc_cls = load_bento_service_class(bundle_path)
  File "/opt/conda/lib/python3.6/site-packages/bentoml/saved_bundle/loader.py", line 191, in load_bento_service_class
    spec.loader.exec_module(module)
  File "", line 678, in exec_module
  File "", line 219, in _call_with_frames_removed
  File "/bento/PyTorchModel/serving.py", line 22, in
    meta_data = joblib.load("meta.bin")
  File "/opt/conda/lib/python3.6/site-packages/joblib/numpy_pickle.py", line 605, in load
    obj = _unpickle(fobj, filename, mmap_mode)
  File "/opt/conda/lib/python3.6/site-packages/joblib/numpy_pickle.py", line 529, in _unpickle
    obj = unpickler.load()
  File "/opt/conda/lib/python3.6/pickle.py", line 1050, in load
    dispatch[key[0]](self)
  File "/opt/conda/lib/python3.6/pickle.py", line 1338, in load_global
    klass = self.find_class(module, name)
  File "/opt/conda/lib/python3.6/pickle.py", line 1388, in find_class
    __import__(module, level=0)
ModuleNotFoundError: No module named 'sklearn'
[2020-09-27 14:56:28 +0000] [12] [INFO] Worker exiting (pid: 12)
[2020-09-27 14:56:30 +0000] [1] [INFO] Shutting down: Master
```

Adding scikit-learn==0.22 to the requirements.txt file fixed this issue. Thankfully, no other issues appeared, and the Docker image worked fine on my local machine.

Finally, no 503s!