Error in model deployment

Luismbpr commented 5 months ago

————— Python == 3.11.8

mlflow == 2.10.2
mlserver == 1.5.0
mlserver-mlflow == 1.5.0
MarkupSafe == 2.1.5
numpy == 1.26.4
pandas == 2.2.1
scikit-learn == 1.4.1.post1
tqdm == 4.66.2
zenml == 0.55.5

—————

I have been following the code from the video lecture. Earlier versions of the pipeline ran well, up until I tried to deploy the model. I have created several virtual environments and tried different stacks (I deleted one stack, then created and set up another; the latest stack used was mlflow_customer_02), but I still cannot make the deployment work.

This is the main error:

ConnectionError: HTTPConnectionPool(host='127.0.0.1', port=8237): Max retries exceeded with 
url: /api/v1/runs/d1673d8a-89aa-42c0-a805-53d3fa8f99ac?hydrate=True (Caused by 
NewConnectionError('<urllib3.connection.HTTPConnection object at 0x28f196350>: Failed to 
establish a new connection: [Errno 61] Connection refused'))

I also tried the following, which did not work:

% zenml down
% zenml disconnect
% zenml up
% python run_deployment.py --config deploy
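
Since the pipeline run talks to the server at 127.0.0.1:8237 (the host and port in the ConnectionError above), one thing worth checking after `zenml up` is whether anything is actually listening on that port before starting the run. The following is a minimal sketch using only the Python standard library; the helper name `wait_for_port` and the default timings are illustrative assumptions, not part of ZenML.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 30.0, interval: float = 0.5) -> bool:
    """Return True once a TCP connection to host:port succeeds, False after `timeout` seconds."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            # create_connection raises OSError (e.g. ConnectionRefusedError)
            # when nothing is listening on the port.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

For example, if `wait_for_port("127.0.0.1", 8237, timeout=10.0)` returns `False` right after `zenml up`, that would suggest the server process exited or never started, which matches the "Connection refused" message.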

————— A summary of the steps retrieved to show that the pipeline works until the deployment phase:

% python run_deployment.py --config deploy

Initiating a new run for the pipeline: continuous_deployment_pipeline.
Reusing registered pipeline version: (version: 13).
Executing a new run.
Caching is disabled by default for continuous_deployment_pipeline.
Using user: default
Using stack: mlflow_stack_customer_02
  model_deployer: mlflow_customer_02
  experiment_tracker: mlflow_tracker_customer_02
  orchestrator: default
  artifact_store: default
Step ingest_df has started.
Ingesting data from /Users/luis/Documents/.../venv_0754_FCC_MLOPS_MLProd_Projects_311_01/data/olist_customers_dataset_copy01.csv
Step ingest_df has finished in 2.512s.
Step clean_df has started.
Data cleaning completed
Step clean_df has finished in 1.542s.
Step train_model has started.
Model training completed
Model Trained Successfully
Step train_model has finished in 3.099s.
Step evaluate_model has started.
Calculating MSE
MSE: 1.864077053397548
Calculating R2 Score
R2 Score: 0.017729030402295565
Calculating RMSE
RMSE: 1.3653120717980736
Step evaluate_model has finished in 0.683s.
Step deployment_trigger has started.
Step deployment_trigger has finished in 0.095s.
Caching disabled explicitly for mlflow_model_deployer_step.
Step mlflow_model_deployer_step has started.
Calling stop method...
stop method executed successfully.
Updating an existing MLflow deployment service: MLFlowDeploymentService[577b7471-9979-487c-94fb-cc6ede12b61d] (type: model-serving, flavor: mlflow)
Calling stop method...
stop method executed successfully.
Calling start method...
⠏ Starting service 'MLFlowDeploymentService[577b7471-9979-487c-94fb-cc6ede12b61d] (type: 
model-serving, flavor: mlflow)'.

File "/Users/luis/miniforge3/envs/venv_0754_FCC_MLOPS_MLProd_Projects_311_02/lib/python3.11/site-packages/zenml/services/service.py", line 461, in start
    raise RuntimeError(
RuntimeError: Failed to start service MLFlowDeploymentService[577b7471-9979-487c-94fb-cc6ede12b61d] (type: model-serving, flavor: mlflow)
  Administrative state: active
  Operational state: inactive
  Last status message: 'service daemon is not running'
For more information on the service status, please see the following log file: /Users/luis/Library/Application Support/zenml/local_stores/19914fc0-6d0d-41d4-bca6-4924211935c1/577b7471-9979-487c-94fb-cc6ede12b61d/service.log

Retrying (Retry(total=9, connect=5, read=None, redirect=None, status=None)) after connection broken by 'RemoteDisconnected('Remote end closed connection without response')': /api/v1/steps/feeec1ee-8f5e-41ae-87f2-d803fd045f31

(…)

Retrying (Retry(total=9, connect=5, read=None, redirect=None, status=None)) after connection broken by 'RemoteDisconnected('Remote end closed connection without response')': /api/v1/steps/feeec1ee-8f5e-41ae-87f2-d803fd045f31

ConnectionError: HTTPConnectionPool(host='127.0.0.1', port=8237): Max retries exceeded with 
url: /api/v1/runs/d1673d8a-89aa-42c0-a805-53d3fa8f99ac?hydrate=True (Caused by 
NewConnectionError('<urllib3.connection.HTTPConnection object at 0x28f196350>: Failed to 
establish a new connection: [Errno 61] Connection refused'))
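
The RuntimeError above points to a service.log file for the failed daemon, which is probably the first place to look for the underlying cause. Below is a small sketch for printing the last lines of such a file; `tail_lines` is a hypothetical helper, and the path has to be copied from your own error message.

```python
from collections import deque
from pathlib import Path


def tail_lines(path, n=40):
    """Return the last `n` lines of a text file, without trailing newlines."""
    with Path(path).open("r", encoding="utf-8", errors="replace") as handle:
        # A deque with maxlen keeps only the most recent n lines while streaming the file.
        return [line.rstrip("\n") for line in deque(handle, maxlen=n)]
```

Usage would look like `print("\n".join(tail_lines(log_path)))`, where `log_path` is the service.log path reported in the error.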

————— Below is more stack information —————

% zenml stack describe

COMPONENT_TYPE	COMPONENT_NAME
MODEL_DEPLOYER	mlflow_customer_02
EXPERIMENT_TRACKER	mlflow_tracker_customer_02
ORCHESTRATOR	default
ARTIFACT_STORE	default

'mlflow_stack_customer_02' stack (ACTIVE)
Stack 'mlflow_stack_customer_02' with id 'c314644e-6abc-45a8-b8fa-271fff858b6c' is owned by user default. Dashboard URL: http://127.0.0.1:8237/workspaces/default/stacks/c314644e-6abc-45a8-b8fa-271fff858b6c/configuration

—————

% zenml status

-----ZenML Server Status-----
Connected to a ZenML server: 'http://127.0.0.1:8237'
The active user is: 'default'
The active workspace is: 'default' (repository)
The active stack is: 'mlflow_stack_customer_02' (repository)
Active repository root: /Users/luis/Documents/.../venv_0754_FCC_MLOPS_MLProd_Projects_311_02
Using configuration from: '/Users/luis/Library/Application Support/zenml'
Local store files are located at: '/Users/luis/Library/Application Support/zenml/local_stores'
The status of the local dashboard:

—————

% zenml stack list

ACTIVE	STACK NAME	STACK ID	OWNER	MODEL_DEPLOYER	EXPERIMENT_TRACKER	ORCHESTRATOR	ARTIFACT_STORE
👉	mlflow_stack_customer_02	c314644e-6abc-45a8-b8fa-271fff858b6c	default	mlflow_customer_02	mlflow_tracker_customer_02	default	default
	default	aeff7473-997f-47a9-87fd-9d771f7543b6	-			default	default
	mlflow_stack_customer	6a772157-30ec-463b-999f-10299ce3ec95	default	mlflow_customer	mlflow_tracker_customer	default	default

—————

% zenml logs

INFO:     127.0.0.1:50527 - "GET /api/v1/steps?hydrate=False&sort_by=created&logical_operator=and&page=1&size=20&scope_workspace=fd2a5d49-22cc-4dc8-a986-fa27bc93b88d&pipeline_run_id=d1673d8a-89aa-42c0-a805-53d3fa8f99ac HTTP/1.1" 200 OK

INFO:     127.0.0.1:50527 - "POST /api/v1/steps HTTP/1.1" 200 OK
objc[5368]: +[__NSCFConstantString initialize] may have been in progress in another thread 
when fork() was called.

objc[5368]: +[__NSCFConstantString initialize] may have been in progress in another thread 
when fork() was called. We cannot safely call it or ignore it in the fork() child process. 
Crashing instead. Set a breakpoint on objc_initializeAfterForkError to debug.

—————

Luismbpr commented 5 months ago

————— Python == 3.9.18 -> Seems to be working

mlflow == 2.10.2
mlserver == 1.5.0
mlserver-mlflow == 1.5.0
MarkupSafe == 2.1.5
numpy == 1.26.4
pandas == 2.2.1
scikit-learn == 1.4.1.post1
tqdm == 4.66.2
zenml == 0.55.5
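
Because the same package pins behaved differently under Python 3.11.8 and 3.9.18, it can help to record what is actually installed in each virtual environment before comparing them. A minimal sketch using the standard library's `importlib.metadata`; the helper name `installed_versions` is an assumption for illustration, and the package list is simply the one given above.

```python
from importlib import metadata

# Package names taken from the version list in this comment.
PACKAGES = [
    "mlflow", "mlserver", "mlserver-mlflow", "MarkupSafe", "numpy",
    "pandas", "scikit-learn", "tqdm", "zenml",
]


def installed_versions(names):
    """Map each distribution name to its installed version, or None if it is not installed."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions
```

Running `installed_versions(PACKAGES)` inside each activated environment and comparing the two results shows whether pip really installed the pinned versions.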

—————

1.1) I tried to install those versions (first with pip install -r requirements.txt), which did not work.

1.2) I then tried installing the packages one by one and could not do that either; pip would not let me install those versions.

2) I ran zenml disconnect, zenml down, and zenml up many times and never got it to work.

3) I tried creating different stacks, experiment trackers, and model deployers and setting them as the active ones. I tried this many times.

4) Something that seemed to work, although I am not entirely sure, was using the two lines suggested in this Stack Overflow post: https://stackoverflow.com/questions/52671926/rails-may-have-been-in-progress-in-another-thread-when-fork-was-called

I opened the .zshrc file:

% vim ~/.zshrc

and appended these two lines:

 ## for MLOPS deployment
export DISABLE_SPRING=true
export OBJC_DISABLE_INITIALIZE_FORK_SAFETY=YES

% source ~/.zshrc

I then created a new stack, experiment tracker, and model deployer and set them as active.

I am still not sure which of these steps made it work. I have not finished the course (almost done now), but so far it seems to be working, or at least it is not displaying any errors.
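
An alternative to exporting the variable globally in ~/.zshrc is to set it only for the process that needs it. The sketch below is an illustration under assumptions, not a confirmed fix: `run_with_fork_safety_disabled` is a hypothetical helper, and it only sets `OBJC_DISABLE_INITIALIZE_FORK_SAFETY`, since `DISABLE_SPRING` refers to the Rails Spring preloader from the Stack Overflow question and most likely has no effect on a Python project. As far as I know the variable has to be present when the process starts, so it is passed to a child process rather than set inside an already running interpreter.

```python
import os
import subprocess
import sys


def run_with_fork_safety_disabled(cmd, **kwargs):
    """Run `cmd` as a child process with the macOS Objective-C fork-safety check disabled."""
    env = dict(os.environ)  # copy, so the parent environment is left untouched
    env["OBJC_DISABLE_INITIALIZE_FORK_SAFETY"] = "YES"
    return subprocess.run(cmd, env=env, **kwargs)


# Example (not executed here):
# run_with_fork_safety_disabled([sys.executable, "run_deployment.py", "--config", "deploy"])
```

Note that the objc crash message appears in the zenml logs output, so the same environment would presumably also be needed for the process that runs `zenml up`.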

Note: I found that Stack Overflow post because the zenml logs were showing an error similar to the one a user in that post had.

This was copied from that Stack Overflow post:
objc[81924]: +[__NSPlaceholderDictionary initialize] may have been in progress in another thread when fork() was called.
objc[81924]: +[__NSPlaceholderDictionary initialize] may have been in progress in another thread when fork() was called.

Side Note:

Posted on:
https://github.com/ayush714/customer-satisfaction-mlops/issues/2
https://github.com/ayush714/mlops-projects-course/issues/2
https://github.com/zenml-io/zenml/issues/2369

typhonshambo commented 3 months ago

Hi there, I was facing the same issue.

Make sure you have set up the virtual-env correctly.
Now run these commands in the virtual-env terminal:
```
zenml disconnect
zenml down
zenml clean
```
After the zenml server is cleared and down, close the terminal and open two separate terminals: one to run your Python file and one to run the zenml server. Make sure you activate your virtual-env in both.
I would highly recommend using an external terminal such as Command Prompt (Windows) or Terminal (Mac) to run the zenml server; for the Python file you can use the normal VS Code terminal or the terminal of whatever IDE you are using.
Once all the terminals are set up, run the run_pipeline.py file from VS Code and wait until it has completely finished all its operations.
Then go to the external terminal and start your zenml server with zenml up.
Don't run zenml up before the Python file.

Following this resolved my error:

ConnectionError: HTTPConnectionPool(host='127.0.0.1', port=8237): Max retries exceeded with 
url: /api/v1/runs/d1673d8a-89aa-42c0-a805-53d3fa8f99ac?hydrate=True (Caused by 
NewConnectionError('<urllib3.connection.HTTPConnection object at 0x28f196350>: Failed to 
establish a new connection: [Errno 61] Connection refused'))

Another thing:

Make sure you don't have pandas and numpy in your requirements.txt, since they already come with zenml and reinstalling them might cause version conflicts.
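
To check whether a requirements.txt lists pandas or numpy directly, a short script can scan it. This is a sketch with a deliberately simple parser (it does not follow `-r` includes or handle URL requirements), and `find_requirements` is a hypothetical helper name.

```python
import re


def find_requirements(lines, names):
    """Return the requirement lines that name any package in `names` (case-insensitive)."""
    wanted = {name.lower() for name in names}
    hits = []
    for raw in lines:
        line = raw.split("#", 1)[0].strip()  # drop comments and surrounding whitespace
        if not line:
            continue
        # The package name is whatever precedes the first version specifier, extra, or marker.
        package = re.split(r"[<>=!~\[;\s]", line, maxsplit=1)[0].lower()
        if package in wanted:
            hits.append(line)
    return hits
```

For example, `find_requirements(open("requirements.txt"), ["pandas", "numpy"])` returns the offending lines, or an empty list if neither package is listed.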

Luismbpr commented 3 months ago

Thank you for the info. As mentioned above, I solved it, and I think it was due to appending this to the .zshrc file:

export DISABLE_SPRING=true
export OBJC_DISABLE_INITIALIZE_FORK_SAFETY=YES

I am not entirely sure those lines were the solution, though, since I had also done everything you mentioned.
