ayush714 / customer-satisfaction-mlops


Error in model deployment #2

Open Luismbpr opened 5 months ago

Luismbpr commented 5 months ago

————— Python == 3.11.8

mlflow == 2.10.2
mlserver == 1.5.0
mlserver-mlflow == 1.5.0
MarkupSafe == 2.1.5
numpy == 1.26.4
pandas == 2.2.1
scikit-learn == 1.4.1.post1
tqdm == 4.66.2
zenml == 0.55.5

I have been following the code from the video lecture. The earlier versions of the pipeline ran well, until I tried to deploy the model. I have created several virtual environments and tried different stacks (deleting one stack, then creating and setting up another; the latest stack used was mlflow_customer_02). I still cannot make the deployment work.

This is the main error:

ConnectionError: HTTPConnectionPool(host='127.0.0.1', port=8237): Max retries exceeded with 
url: /api/v1/runs/d1673d8a-89aa-42c0-a805-53d3fa8f99ac?hydrate=True (Caused by 
NewConnectionError('<urllib3.connection.HTTPConnection object at 0x28f196350>: Failed to 
establish a new connection: [Errno 61] Connection refused'))
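
A "Connection refused" error on 127.0.0.1:8237 means nothing was accepting TCP connections on that port at that moment. The sketch below is one way to check this independently of ZenML, using only the Python standard library. The function name `server_is_listening` is made up for this illustration; the host and port are taken from the error message above.

```python
import socket


def server_is_listening(host: str = "127.0.0.1", port: int = 8237,
                        timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        # create_connection raises OSError (including ConnectionRefusedError)
        # when nothing is listening or the host cannot be reached.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    if server_is_listening():
        print("127.0.0.1:8237 is accepting connections")
    else:
        print("127.0.0.1:8237 is refusing connections or unreachable")
```

Running it before and after the deployment step can show whether the local server was reachable at the start and went away during the run. It only tests the TCP port, not whether the server behind it is healthy.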

I also tried the following, which did not work:

% zenml down
% zenml disconnect
% zenml up
% python run_deployment.py --config deploy

A summary of the run output, showing that the pipeline works up to the deployment step:

% python run_deployment.py --config deploy
Initiating a new run for the pipeline: continuous_deployment_pipeline.
Reusing registered pipeline version: (version: 13).
Executing a new run.
Caching is disabled by default for continuous_deployment_pipeline.
Using user: default
Using stack: mlflow_stack_customer_02
  model_deployer: mlflow_customer_02
  experiment_tracker: mlflow_tracker_customer_02
  orchestrator: default
  artifact_store: default
Step ingest_df has started.
Ingesting data from /Users/luis/Documents/.../venv_0754_FCC_MLOPS_MLProd_Projects_311_01/data/olist_customers_dataset_copy01.csv
Step ingest_df has finished in 2.512s.
Step clean_df has started.
Data cleaning completed
Step clean_df has finished in 1.542s.
Step train_model has started.
Model training completed
Model Trained Successfully
Step train_model has finished in 3.099s.
Step evaluate_model has started.
Calculating MSE
MSE: 1.864077053397548
Calculating R2 Score
R2 Score: 0.017729030402295565
Calculating RMSE
RMSE: 1.3653120717980736
Step evaluate_model has finished in 0.683s.
Step deployment_trigger has started.
Step deployment_trigger has finished in 0.095s.
Caching disabled explicitly for mlflow_model_deployer_step.
Step mlflow_model_deployer_step has started.
Calling stop method...
stop method executed successfully.
Updating an existing MLflow deployment service: MLFlowDeploymentService[577b7471-9979-487c-94fb-cc6ede12b61d] (type: model-serving, flavor: mlflow)
Calling stop method...
stop method executed successfully.
Calling start method...
⠏ Starting service 'MLFlowDeploymentService[577b7471-9979-487c-94fb-cc6ede12b61d] (type: 
model-serving, flavor: mlflow)'.

File "/Users/luis/miniforge3/envs/venv_0754_FCC_MLOPS_MLProd_Projects_311_02/lib/python3.11/site-packages/zenml/services/service.py", line 461, in start
    raise RuntimeError(
RuntimeError: Failed to start service MLFlowDeploymentService[577b7471-9979-487c-94fb-cc6ede12b61d] (type: model-serving, flavor: mlflow)
  Administrative state: active
  Operational state: inactive
  Last status message: 'service daemon is not running'
For more information on the service status, please see the following log file: /Users/luis/Library/Application Support/zenml/local_stores/19914fc0-6d0d-41d4-bca6-4924211935c1/577b7471-9979-487c-94fb-cc6ede12b61d/service.log

Retrying (Retry(total=9, connect=5, read=None, redirect=None, status=None)) after connection broken by 'RemoteDisconnected('Remote end closed connection without response')': /api/v1/steps/feeec1ee-8f5e-41ae-87f2-d803fd045f31

(…)

Retrying (Retry(total=9, connect=5, read=None, redirect=None, status=None)) after connection broken by 'RemoteDisconnected('Remote end closed connection without response')': /api/v1/steps/feeec1ee-8f5e-41ae-87f2-d803fd045f31

ConnectionError: HTTPConnectionPool(host='127.0.0.1', port=8237): Max retries exceeded with 
url: /api/v1/runs/d1673d8a-89aa-42c0-a805-53d3fa8f99ac?hydrate=True (Caused by 
NewConnectionError('<urllib3.connection.HTTPConnection object at 0x28f196350>: Failed to 
establish a new connection: [Errno 61] Connection refused'))

————— Below is more stack information —————

% zenml stack describe
COMPONENT_TYPE COMPONENT_NAME
MODEL_DEPLOYER mlflow_customer_02
EXPERIMENT_TRACKER mlflow_tracker_customer_02
ORCHESTRATOR default
ARTIFACT_STORE default

'mlflow_stack_customer_02' stack (ACTIVE)
Stack 'mlflow_stack_customer_02' with id 'c314644e-6abc-45a8-b8fa-271fff858b6c' is owned by user default.
Dashboard URL: http://127.0.0.1:8237/workspaces/default/stacks/c314644e-6abc-45a8-b8fa-271fff858b6c/configuration

—————

% zenml status

-----ZenML Server Status-----
Connected to a ZenML server: 'http://127.0.0.1:8237'
The active user is: 'default'
The active workspace is: 'default' (repository)
The active stack is: 'mlflow_stack_customer_02' (repository)
Active repository root: /Users/luis/Documents/.../venv_0754_FCC_MLOPS_MLProd_Projects_311_02
Using configuration from: '/Users/luis/Library/Application Support/zenml'
Local store files are located at: '/Users/luis/Library/Application Support/zenml/local_stores'
The status of the local dashboard:

| ZenML server 'local' | |
| URL | http://127.0.0.1:8237 |
| STATUS | ✅ |
| STATUS_MESSAGE | |
| CONNECTED | ✅ |

—————

% zenml stack list
ACTIVE STACK NAME STACK ID OWNER MODEL_DEPLOYER EXPERIMENT_TRACKER ORCHESTRATOR ARTIFACT_STORE
👉 mlflow_stack_customer_02 c314644e-6abc-45a8-b8fa-271fff858b6c default mlflow_customer_02 mlflow_tracker_customer_02 default default
default aeff7473-997f-47a9-87fd-9d771f7543b6 - default default
mlflow_stack_customer 6a772157-30ec-463b-999f-10299ce3ec95 default mlflow_customer mlflow_tracker_customer default default

—————

% zenml logs
INFO:     127.0.0.1:50527 - "GET /api/v1/steps?hydrate=False&sort_by=created&logical_operator=and&page=1&size=20&scope_workspace=fd2a5d49-22cc-4dc8-a986-fa27bc93b88d&pipeline_run_id=d1673d8a-89aa-42c0-a805-53d3fa8f99ac HTTP/1.1" 200 OK

INFO:     127.0.0.1:50527 - "POST /api/v1/steps HTTP/1.1" 200 OK
objc[5368]: +[__NSCFConstantString initialize] may have been in progress in another thread 
when fork() was called.

objc[5368]: +[__NSCFConstantString initialize] may have been in progress in another thread 
when fork() was called. We cannot safely call it or ignore it in the fork() child process. 
Crashing instead. Set a breakpoint on objc_initializeAfterForkError to debug.

—————

Luismbpr commented 5 months ago

————— Python == 3.9.18 -> Seems to be working

mlflow == 2.10.2
mlserver == 1.5.0
mlserver-mlflow == 1.5.0
MarkupSafe == 2.1.5
numpy == 1.26.4
pandas == 2.2.1
scikit-learn == 1.4.1.post1
tqdm == 4.66.2
zenml == 0.55.5

—————

1.1) I tried to install those versions (first with pip install -r requirements.txt) and it did not work.

1.2) I then tried installing them one by one and could not do that either; pip would not let me install those versions.

2) I ran zenml disconnect, zenml down, and zenml up many times and never got it to work.

3) I tried creating different stacks, experiment trackers, and model deployers and setting them as the active ones. I tried this many times.

4) Something that seemed to work, although I am not entirely sure, was using the two lines of code from https://stackoverflow.com/questions/52671926/rails-may-have-been-in-progress-in-another-thread-when-fork-was-called

I appended these two lines to the .zshrc file:

% vim ~/.zshrc
# append these two lines to the file:
## for MLOPS deployment
export DISABLE_SPRING=true
export OBJC_DISABLE_INITIALIZE_FORK_SAFETY=YES
% source ~/.zshrc

Then I created a new stack, experiment tracker, and model deployer and set them as active.

I am still not sure which piece made it work. I have not finished the course (almost done now), but so far it seems to be working, or at least it is not displaying any errors.
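
For anyone who would rather not change .zshrc globally, the same two variables can be limited to a single run. The sketch below launches run_deployment.py in a child process with the variables set. The variable names and the command come from this thread; `build_env`, `run_deployment`, and the assumption that the ZenML service daemon inherits the launcher's environment are mine and untested against ZenML. DISABLE_SPRING comes from the Rails context of the Stack Overflow post and is included only to mirror the lines above.

```python
import os
import subprocess
import sys

# The two variables appended to .zshrc in this thread.
FORK_SAFETY_VARS = {
    "DISABLE_SPRING": "true",
    "OBJC_DISABLE_INITIALIZE_FORK_SAFETY": "YES",
}


def build_env(base=None):
    """Return a copy of the environment with the two variables added."""
    env = dict(os.environ if base is None else base)
    env.update(FORK_SAFETY_VARS)
    return env


def run_deployment(script="run_deployment.py", config="deploy"):
    """Run the deployment script in a fresh interpreter with the patched env."""
    # The variables are set before the child interpreter starts, so they are
    # already present when that process is loaded.
    cmd = [sys.executable, script, "--config", config]
    return subprocess.run(cmd, env=build_env()).returncode


if __name__ == "__main__":
    sys.exit(run_deployment())
```

The variables are passed to a new process rather than assigned to os.environ inside run_deployment.py itself, on the assumption that they need to be in place when the interpreter starts, which is also what the .zshrc approach achieves.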

Note: I found that Stack Overflow post because the zenml logs were showing an error similar to the one a user in that post was getting.

This was copied from that Stack Overflow post:
objc[81924]: +[__NSPlaceholderDictionary initialize] may have been in progress in another thread when fork() was called.
objc[81924]: +[__NSPlaceholderDictionary initialize] may have been in progress in another thread when fork() was called.

Side Note:

typhonshambo commented 3 months ago

Hi there, I was facing the same issue. Make sure

Following this resolved my error:

ConnectionError: HTTPConnectionPool(host='127.0.0.1', port=8237): Max retries exceeded with 
url: /api/v1/runs/d1673d8a-89aa-42c0-a805-53d3fa8f99ac?hydrate=True (Caused by 
NewConnectionError('<urllib3.connection.HTTPConnection object at 0x28f196350>: Failed to 
establish a new connection: [Errno 61] Connection refused'))

Another thing

Make sure you don't have pandas and numpy in your requirements.txt, as they already come with zenml, so reinstalling them might cause version conflicts.
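
A quick way to check a requirements file for such entries is sketched below. It is a simple line scan, not a full requirements parser, and the helper name `find_pins` is made up for this illustration. Whether zenml really installs pandas and numpy itself is the commenter's claim and is not checked here.

```python
import re
from pathlib import Path


def find_pins(requirements_text, names=("pandas", "numpy")):
    """Return the requirement lines whose package name is in `names`."""
    wanted = {n.lower() for n in names}
    hits = []
    for raw in requirements_text.splitlines():
        # Drop comments and surrounding whitespace.
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        # The package name is the leading run of name characters,
        # which stops at a version specifier such as == or >=.
        match = re.match(r"[A-Za-z0-9][A-Za-z0-9._-]*", line)
        if match and match.group(0).lower() in wanted:
            hits.append(line)
    return hits


if __name__ == "__main__":
    path = Path("requirements.txt")
    if path.exists():
        for entry in find_pins(path.read_text()):
            print(f"consider removing: {entry}")
```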

Luismbpr commented 3 months ago

Thank you for the info. As mentioned above, I solved it, and I think it was due to appending these lines to the .zshrc file:

export DISABLE_SPRING=true
export OBJC_DISABLE_INITIALIZE_FORK_SAFETY=YES

However, I am not entirely sure that those lines were the fix, since I had also done everything you mentioned.