Closed tomgallagher closed 2 years ago
Thank you @tomgallagher for the excellently worded issue.
cc @scharlottej13 . Maybe it makes sense to include the software environment creation step in the documentation. This will slow things down a bit, but it also empowers the user a bit to change things themselves in the future? No strong preference.
@tomgallagher Thank you for reporting this. We are investigating what happened with the original software environment and why it's not working. In the meantime, if you go up one level in the repository from the script you linked, you will see there is a file streamlit.yml
https://github.com/coiled/coiled-resources/blob/main/streamlit-with-coiled/streamlit.yml that contains the dependencies to re-create the exact same environment by doing
coiled.create_software_environment(
name="your-conda-env-name",
conda="streamlit.yml",
)
If you want to use pip only, there is a requirements.txt
in the repository too https://github.com/coiled/coiled-resources/blob/main/streamlit-with-coiled/requirements.txt and you would do:
coiled.create_software_environment(
name="your-pip-env-name",
pip="requirements.txt",
)
If you want to make this environment compatible with other libraries, or update to a more recent version of dask, I would suggest you copy the yaml/txt file, modify it, and update the dependencies you want.
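If you go the copy-and-modify route, here is a minimal sketch of bumping a pin before recreating the environment. The file contents and version numbers below are illustrative, not the repo's actual pins:

```python
import re

# Illustrative contents of a copied requirements file; versions are made up.
requirements = """\
dask==2022.1.0
distributed==2022.1.0
streamlit==1.5.0
"""

# Bump the dask/distributed pins to a hypothetical newer release,
# leaving every other dependency untouched.
updated = re.sub(
    r"^(dask|distributed)==[\d.]+$",
    r"\g<1>==2022.6.0",
    requirements,
    flags=re.MULTILINE,
)
print(updated)
```

You would then write `updated` out to a file and pass it to `coiled.create_software_environment(pip=...)` as shown above.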
Let us know if you have any questions, we are here to help.
Hey thanks for getting back so quickly
I've made some progress but now have this message in my logs
2022-06-16 17:04:48.359 Using existing cluster: 'coiled-streamlit (id: 34931)'
2022-06-16 17:04:48.774 Creating Cluster (name: coiled-streamlit, https://cloud.coiled.io/tomgallagher/clusters/34931/details ). This might take a few minutes...
2022-06-16 17:04:50.135 Scheduler: ready Workers: 10 ready (of 10)
2022-06-16 17:04:50.135 Scheduler: ready Workers: 10 ready (of 10)
2022-06-16 17:04:51.582 error sending AWS credentials to cluster: Could not connect to the endpoint URL: "http://169.254.169.254/latest/api/token"
2022-06-16 17:04:53.831 Uncaught app exception
Do you know what I'm doing wrong here?
cc: @ntabris
Do you know what I'm doing wrong here?
Maybe nothing.
The error sending AWS credentials to cluster
isn't fatal: it means the client wasn't able to create an STS token to send to the cluster for accessing (e.g.) S3 or other data sources that use a token for AWS authentication. It could become a problem, and there are ways to deal with it, but by itself it won't prevent your cluster from working (though it may cause downstream errors when your cluster tries to, e.g., read from S3).
Do you know if the cluster is otherwise up and working? Or what, if anything, happened when you tried to run something on it?
I do see the Uncaught app exception in the logs you shared, but I don't know where that's coming from... I don't think our coiled client emits that, but I could be wrong.
I also see that the cluster was running for about 7 minutes, from 2022-06-16T16:03:19 UTC to 2022-06-16T16:10:04 UTC.
@tomgallagher Are you following the example exactly as written in the code provided? Because the nyc public data was recently modified, and this line won't work anymore. I wonder if that is the problem.
If this is your case, can you try to replace that line with
"s3://nyc-tlc/csv_backup/yellow_tripdata_2015-*.csv"
Perfect! I'm in business. Thanks very much. Just wanted to get the exact copy working before I moved on.
FYI still getting the error
2022-06-16 20:23:03.503 error sending AWS credentials to cluster: Could not connect to the endpoint URL: "http://169.254.169.254/latest/api/token"
@tomgallagher Glad to hear things are moving. As for the error you are still getting: it would be useful to know on which line of code you hit it, and whether it could be related to streamlit. I see this line in the deploy code, which I'm not sure how it works, but it could have something to do with it.
@rrpelgrim you wrote this blogpost, have you ever seen the error reported in the comment above?
@ncclementi no, this isn't about the Coiled token, it's a problem getting the STS token from the Amazon instance metadata service (since the coiled
client is presumably running on an EC2 instance or something like that, not running locally). I'm not sure why it's having a problem hitting the Amazon instance metadata service; there are various possibilities.
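For context on why that endpoint fails: the credential code tries to fetch an IMDSv2 session token from the link-local metadata address, which is only reachable from inside AWS. A minimal sketch of that probe (the function name and timeout are my own, not part of coiled or boto):

```python
import urllib.request

def probe_imds(base="http://169.254.169.254", timeout=1.0):
    """Request an IMDSv2 session token; succeeds only on an EC2 instance."""
    req = urllib.request.Request(
        base + "/latest/api/token",
        method="PUT",
        headers={"X-aws-ec2-metadata-token-ttl-seconds": "60"},
    )
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.read().decode()
    except OSError:
        # Connection refused or timed out: not on EC2 (or IMDS is blocked),
        # which matches the "Could not connect to the endpoint URL" log line.
        return None
```

Run on a laptop, `probe_imds()` returns None after the timeout; on an EC2 instance it returns a token string.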
One last question and then I'll leave you alone :) Promise
In the code example, you have this line
if st.button('Shutdown Cluster'):
with st.spinner("Shutting down your cluster..."):
client.shutdown()
I've been experimenting with the client.shutdown command and, while it may stop the dask.distributed client, the command does not seem to be passed back up to the coiled cluster. Should I be passing an argument, or should I also be calling coiled.delete_cluster(name="my-cluster")?
@tomgallagher I'm not able to reproduce the problem: when I run the code below, my cluster closes gracefully. If you run this, does your cluster keep running?
from coiled import Cluster
from dask.distributed import Client
cluster = Cluster(name="test_shutdown")
client = Client(cluster)
client.shutdown()
Can you give us more context or a minimal reproducible example? When you say "the command does not seem to be passed back up to the coiled cluster," what exactly do you see?
Hi
The client reports as closed but on the dashboard the cluster is not stopped.
For the time being, I'm closing the client and then also calling coiled.delete_cluster(name="my-cluster"), which seems to work fine.
The client reports as closed but on the dashboard the cluster is not stopped.
Stopping dask should result in Coiled detecting this and cleaning up the cluster infrastructure, but it's faster and more reliable if you tell Coiled to stop the cluster, like you're doing now.
In case it's helpful, another way to do this is...
with coiled.Cluster(...) as cluster:
client = Client(cluster)
... # your code
# context manager exit handles shutting down the cluster
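For what it's worth, the context-manager exit is just a guaranteed close() call, so the cluster is torn down even if your code raises. A minimal sketch with a stand-in class (DummyCluster is illustrative, not a coiled API; it only mimics the enter/exit contract):

```python
class DummyCluster:
    """Stand-in with the same enter/exit contract as a cluster object."""

    def __init__(self):
        self.closed = False

    def close(self):
        self.closed = True

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        # Runs on normal exit *and* when an exception propagates,
        # so the cluster is shut down either way.
        self.close()
        return False  # don't swallow exceptions

cluster = DummyCluster()
with cluster:
    pass  # your code

print(cluster.closed)  # True: the exit handler closed the "cluster"
```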
should result in Coiled detecting this and cleaning up the cluster infrastructure, but it's faster and more reliable if you tell Coiled to stop the cluster
The weird thing is client.shutdown should actually be telling Coiled to stop the cluster, via cluster.close (the client has a reference to the cluster).
Tom has a workaround and we're not able to reproduce, so it's not crucial to debug I think.
I'm just trying to get started with your coiled / streamlit example
https://github.com/coiled/coiled-resources/blob/main/streamlit-with-coiled/coiled-streamlit-deploy.py
I'm almost there but I need help with software environments.
This code
refers to a software environment for which the example does not provide a creation step :)
I'm getting this error:
How can I create a new software environment with dependencies that match the requirements of the example?
Really I just want a set of compatible instructions between this
and your example
Thanks