Hi - Quick postmortem on running a flow with prefect, spacy and coiled using the container= argument.
I ran into two primary errors, but they were worth solving! Why? This is the slimmest workflow to test, scale and deploy pipelines that I am aware of.
Pitfalls
During the container build, my_conda_environment is not merged correctly with the coiled environment.
Make sure these two lines exist in the Dockerfile (thank you @FabioRosado):
ENV PATH /opt/conda/envs/my_env/bin:$PATH
RUN echo "conda activate my_env" >> ~/.bashrc
Then pass the name of the conda environment to conda_env_name=.
IMO: the coiled Docker documentation would read better if the section below were moved from Composing software specifications to the Docker section, or if the Docker section at least linked to Composing software specifications.
[...] If you would like conda packages to be installed into a different conda environment (e.g. you’re using a custom Docker image which uses a environment not named "coiled"), then you may pass the name of the conda environment to the conda_env_name= keyword for coiled.create_software_environment.
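To make the quoted paragraph concrete, here is a minimal sketch of how container= and conda_env_name= could be passed together. The image name and the helper function names are hypothetical and only there for illustration. The keyword names come from this post and the quoted docs, so check them against the coiled version you have installed.

```python
def software_env_kwargs(name, image, conda_env, dependencies):
    """Collect keyword arguments for coiled.create_software_environment.

    `image` is a custom Docker image whose conda environment is called
    `conda_env` rather than the default "coiled".
    """
    return {
        "name": name,
        "container": image,
        # Must match the environment name used in the Dockerfile
        # (the ENV PATH and "conda activate" lines above).
        "conda_env_name": conda_env,
        "conda": {
            "channels": ["conda-forge", "defaults"],
            "dependencies": list(dependencies),
        },
    }


def create_env(kwargs):
    # Imported lazily so the helper above works without coiled installed.
    import coiled

    return coiled.create_software_environment(**kwargs)


kwargs = software_env_kwargs(
    name="tst-prefect-py38",
    image="my-registry/my-image:latest",  # hypothetical image name
    conda_env="my_env",
    dependencies=["python=3.8.8", "prefect", "spacy"],
)
# create_env(kwargs)  # uncomment to actually build the environment on coiled
```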
Deserialization
The errors were fairly random and not completely reproducible. The error messages ranged from dropped scheduler connections and mismatched package versions to straight-up deserialization errors. The fix was to explicitly pass the python version and make sure the dask distributed version matches between the local machine and coiled. One suggestion for coiled might be to explicitly state what version of dask and python the latest default coiled container is using.
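A quick way to spot this kind of mismatch is to compare package versions on both sides. The sketch below is just one way to do it, using only the standard library. The function names and the default package list are my own choices, not part of prefect or coiled.

```python
from importlib import metadata
import platform


def local_versions(packages=("dask", "distributed", "prefect", "spacy")):
    """Return the python version and installed package versions.

    Packages that are not installed are reported as None.
    """
    versions = {"python": platform.python_version()}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


def find_mismatches(local, remote):
    """Return {package: (local_version, remote_version)} where they differ."""
    keys = sorted(set(local) | set(remote))
    return {
        key: (local.get(key), remote.get(key))
        for key in keys
        if local.get(key) != remote.get(key)
    }


print(local_versions())
```

With a running cluster and a dask.distributed Client, the remote side could be collected with something like client.run_on_scheduler(local_versions) and compared using find_mismatches. The client also has get_versions(check=True), which raises an error when the required packages differ between client, scheduler and workers.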
Working example
import prefect
from prefect import Flow, Parameter, task
import spacy


@task(log_stdout=True)
def add_last_name(first_name, last_name="Smith"):
    full_name = f"{first_name} {last_name}"
    print(full_name)
    return full_name


@task(log_stdout=True)
def load_spacy_model(mdl_name="en_core_web_sm", lang_mdl=None):
    # Load the model only if one was not passed in
    if lang_mdl is None:
        print(f"Loading language model: {mdl_name}")
        lang_mdl = spacy.load(mdl_name)
    return lang_mdl


with Flow(name="tst-deploy") as flow:
    first_name = Parameter("first_name", default="Alexander")
    mdl_name = Parameter("mdl_name", default="en_core_web_sm")
    # A task to test that a string gets deserialized
    full_name = add_last_name(first_name, last_name="Hamilton")
    # 1. Load english language model
    lang_mdl = load_spacy_model(mdl_name=mdl_name)


if __name__ == "__main__":
    import coiled
    from prefect.executors import DaskExecutor, LocalExecutor

    # Set run_se to 1 to (re)build the software environment
    run_se = 3
    if run_se == 1:
        coiled.create_software_environment(
            name="tst-prefect-py38",
            conda={
                "channels": ["conda-forge", "defaults"],
                "dependencies": ["python=3.8.8", "numpy", "prefect", "spacy"],
            },
            post_build=[
                # Fixed a dask scheduler error in a running cluster
                "python -m pip install jupyter-server-proxy",
                "python -m spacy download en_core_web_sm",
            ],
        )

    # Pick the executor: 1 = local, 2 = local dask, 3 = coiled
    e = 3
    if e == 1:
        print("Local")
        executor = LocalExecutor()
    elif e == 2:
        print("Local Dask Executor")
        executor = DaskExecutor()
    elif e == 3:
        print("Coiled")
        executor = DaskExecutor(
            cluster_class=coiled.Cluster,
            cluster_kwargs={
                "n_workers": 1,
                "worker_cpu": 4,
                "worker_memory": "16 GiB",
                "scheduler_memory": "16 GiB",
                "software": "grybox/tst-prefect-py38",
                "name": "tst-py38",
                "shutdown_on_close": False,
            },
        )

    state = flow.run(executor=executor)
    flow.visualize(flow_state=state)
Hope this helps somebody else running into issues using prefect and coiled.
Package sync is now the preferred feature for this. It should resolve a lot of the serialization issues, which were most likely caused by mismatches between dependencies.