ersilia-os / ersilia

The Ersilia Model Hub, a repository of AI/ML models for infectious and neglected disease research.
https://ersilia.io
GNU General Public License v3.0

🐛 Bug: Prediction with eos2a9n model repeatedly fails with "status 500" error codes logged out #342

Closed by Cee-tech21 2 years ago

Cee-tech21 commented 2 years ago

Describe the bug.

Prediction with the eos2a9n model appears to fail and enter an endless loop. A "status 500" error code is logged repeatedly during the run. I tried the prediction twice, and both times I had to terminate it manually after watching the loop for several minutes.

Describe the steps to reproduce the behavior

Run the commands below, in order, to fetch, serve, and then predict with model eos2a9n. The prediction step appears to fail, while the earlier commands succeed:

ersilia -v fetch eos2a9n | tee -a eos2a9n.log 2>&1

ersilia -v serve eos2a9n | tee -a eos2a9n_serve.log 2>&1

ersilia -v api predict -i "eml_canonical.csv" -o "eos2a9n_predict_out.csv" > eos2a9n_predict.log 2>&1

Expected behavior.

A CSV file containing the predicted coordinates is expected. The coordinates indicate how the chemical agents and targets of interest interact.

Screenshots.

eos2a9n_predict.log

Operating environment

Linux Mint 19

Additional context

No response

ZakiaYahya commented 2 years ago

@Cee-tech21 I'm experiencing the same thing. I successfully fetched all my assigned models and ran predictions with all of them except one, which took so long (a whole night and day) that it seems to hang somewhere along the way. I've tried many times over the last two days, but the same thing happens every time. As a check, I re-ran a model that I had already predicted with successfully, and the same thing now happens with that model too. I've closed Ersilia for all models, but it had no effect. Everything was working perfectly fine before.

Screenshot 2022-10-15 120916

ZakiaYahya commented 2 years ago

@Cee-tech21 Yes, exactly. I have to abort the execution because it gets stuck in an endless loop, printing "| DEBUG | Status code: 200" and "| DEBUG | Status code: 500" over and over. After 8-10 hours of this, it just stalled at "| DEBUG | Status code: 200" without giving any error. @GemmaTuron, can you please look into this issue? I've been stuck on it since yesterday morning.
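To tell whether a run like this is still making progress, it can help to tally the status codes in the log instead of watching them scroll by. Below is a minimal Python sketch, assuming the log contains lines shaped like the ones quoted above ("| DEBUG | Status code: 200"). The function name `count_status_codes` is made up for illustration and is not part of Ersilia.

```python
import re
from collections import Counter

# Pattern based on the log lines quoted in this thread,
# e.g. "| DEBUG | Status code: 500". The exact log layout is an assumption.
STATUS_RE = re.compile(r"Status code:\s*(\d{3})")


def count_status_codes(lines):
    """Count how often each status code appears in an iterable of log lines."""
    counts = Counter()
    for line in lines:
        match = STATUS_RE.search(line)
        if match:
            counts[int(match.group(1))] += 1
    return counts


# Small in-memory sample; lines without a status code are ignored.
sample = [
    "| DEBUG | Status code: 200",
    "| DEBUG | Status code: 500",
    "| DEBUG | some other message",
    "| DEBUG | Status code: 500",
]
print(count_status_codes(sample))
```

For a real log, the same function can be given an open file, for example `count_status_codes(open("eos2a9n_predict.log"))`. If the counts keep growing between two checks, the run is still looping through molecules and has not hung.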

GemmaTuron commented 2 years ago

Hi @Cee-tech21 !

Thanks for your patience. The model first tries to process all the molecules in a batch, but if it encounters an error it processes them one by one, which is the output you are seeing, so that's normal. When a molecule is successfully predicted you get a status code 200; when it is not, you get a status code 500. In any case, after it has looped through all the molecules, it packs the results into a single file and gives you the output. Some ML models are quite complex and can take a long time, for instance without a GPU. In this case, @Cee-tech21, just try to predict the first molecule in the list. If this works, we can mark the model as done and I will leave my computer running the full calculation overnight. The same applies to @ZakiaYahya in #348.
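The batch-then-one-by-one behaviour described above can be sketched roughly as follows. This is only an illustration of the general pattern, not Ersilia's actual implementation: `predict_with_fallback` and `predict_batch` are hypothetical names, and a failed molecule is recorded as `None` here as a stand-in for a status code 500.

```python
def predict_with_fallback(molecules, predict_batch):
    """Try the whole batch first; on any error, retry one molecule at a time.

    `predict_batch` is a hypothetical callable that takes a list of molecules
    and returns one result per molecule, raising an exception on failure.
    """
    try:
        # Fast path: the whole list succeeds in a single call.
        return list(predict_batch(molecules))
    except Exception:
        pass  # Fall through to the slower per-molecule path.

    results = []
    for molecule in molecules:
        try:
            # Analogous to a "Status code: 200" line for this molecule.
            results.append(predict_batch([molecule])[0])
        except Exception:
            # Analogous to a "Status code: 500" line; keep going with the rest.
            results.append(None)
    # All per-molecule results are collected into a single output list.
    return results
```

With this pattern, a single problematic molecule makes the whole run fall back to the slow path, which matches the long sequence of alternating 200 and 500 lines reported in this thread.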

ZakiaYahya commented 2 years ago

Okay @GemmaTuron, I'll try it with one molecule and let you know soon. Thanks.

Cee-tech21 commented 2 years ago

The first four molecules in the list were successfully predicted with model "eos2a9n" on Google Colab. I will now close this issue.

Update: prediction of one molecule was also completed locally on my laptop.
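For anyone repeating this check, a reduced input file with only the first few molecules can be prepared with a short script. Here is a minimal Python sketch, assuming the input is a CSV with one molecule per row and a header line. The layout of eml_canonical.csv is not shown in this thread, so that is an assumption, and `write_head` is a hypothetical helper name.

```python
import csv


def write_head(src_path, dst_path, n_rows=1, has_header=True):
    """Copy the header (if any) and the first n_rows data rows to a new CSV.

    Returns the number of data rows actually written.
    """
    with open(src_path, newline="") as src, open(dst_path, "w", newline="") as dst:
        reader = csv.reader(src)
        writer = csv.writer(dst)
        if has_header:
            # Assumption: the first line is a column header, not a molecule.
            header = next(reader, None)
            if header is not None:
                writer.writerow(header)
        written = 0
        for row in reader:
            if written >= n_rows:
                break
            writer.writerow(row)
            written += 1
    return written
```

For example, `write_head("eml_canonical.csv", "eml_first4.csv", n_rows=4)` would produce a four-molecule file (the output name is made up here), which can then be passed to the predict command with `-i`.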