Open rhugonnet opened 9 months ago
@rhugonnet - Thank you for going through this process and giving us this feedback!
In the short term
I've updated the private cluster documentation to hopefully be clearer and provide better examples and troubleshooting information. You can find the updated page at: https://slideruleearth.io/web/rtd/user_guide/Private-Clusters.html
I've also gone ahead and updated the uw cluster to version 4. One feature of private clusters is that they can be pinned to a specific version, or to a major version, which is really helpful if you are in the middle of developing code for a specific processing run. But in this case, it was pinned to major version 3, which means it was automatically getting any updates that were a part of the version 3 releases, but didn't automatically go to version 4. @tsutterley - please let me know if it is okay that I bumped you guys to version 4. I am assuming it is what you want, but just want to check.
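To make the pinning behavior described above concrete, here is a minimal sketch of how an exact pin and a major-version pin could resolve against a list of releases. It is only an illustration, not how the provisioning system is actually implemented: the function names, the `vMAJOR.MINOR.PATCH` tag format, and the release list are all assumptions.

```python
# Hypothetical sketch: resolve a version pin against available releases.
# The tag format and the release list below are assumptions for illustration.

def parse_version(tag):
    """Turn a tag such as 'v3.7.0' or 'v3' into a tuple of ints."""
    return tuple(int(part) for part in tag.lstrip("v").split("."))

def resolve_pin(pin, releases):
    """Return the newest release whose leading components match the pin.

    A pin of 'v3' matches every 3.x.y release (major-version pin);
    a pin of 'v3.6.0' matches only that exact release.
    Returns None when nothing matches.
    """
    wanted = parse_version(pin)
    candidates = [r for r in releases if parse_version(r)[:len(wanted)] == wanted]
    return max(candidates, key=parse_version, default=None)

releases = ["v3.6.0", "v3.7.0", "v4.0.0", "v4.0.1"]  # made-up release tags

print(resolve_pin("v3", releases))      # newest 3.x release, never a 4.x one
print(resolve_pin("v3.6.0", releases))  # exact pin
print(resolve_pin("v4", releases))      # newest 4.x release
```

With a major-version pin of `v3`, new 3.x releases are picked up automatically, but a 4.x release is never selected, which matches the behavior seen on the uw cluster.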
In the long term
I think we should add a "latest_release" version or something like that, which can be specified for a private cluster and tells it to always grab the next release even if it crosses a major version boundary. See #347 for the issue created for this.
We should add a call in the client to the provisioning system to check whether a cluster is deployed or not. This call would only be made when a request to the cluster fails, so it shouldn't affect performance, but could give vital information to the user on what they should do. See #348 for the issue created for this.
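A rough sketch of what that second idea could look like on the client side follows. It does not use any real SlideRule API: `send_request` and `is_deployed` are hypothetical callables standing in for the cluster request and the provisioning-system check, and `ClusterNotDeployed` is an invented exception name.

```python
# Hypothetical sketch: only ask the provisioning system about deployment
# status after a request has already failed.

class ClusterNotDeployed(RuntimeError):
    """Raised when a request failed and the cluster turns out not to be deployed."""

def request_with_diagnosis(send_request, is_deployed, cluster_name):
    """Run send_request(); on failure, check deployment to explain the error.

    send_request: callable performing the actual cluster request (assumed).
    is_deployed:  callable returning True if the cluster is up (assumed).
    """
    try:
        return send_request()
    except Exception as err:
        # Reached only on failure, so successful requests pay no extra cost.
        if not is_deployed():
            raise ClusterNotDeployed(
                f"Cluster '{cluster_name}' is not deployed; "
                "start it before sending requests."
            ) from err
        # The cluster is up, so the original error is the useful one.
        raise
```

Because the status check sits in the error path, the normal request path is unchanged, and the user gets an actionable message instead of a bare failure.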
@jpswinski bumping the uw cluster to v4 works on our end.
Hi @jpswinski, @tsutterley,
After setting up an account with "uw" and following the guidelines in https://slideruleearth.io/web/rtd/user_guide/Private-Clusters.html#getting-started-with-private-clusters, I got the following FatalError.
Executing:
I get:
It took me a bit of time to figure out that this might be because the cluster is not deployed:
Following https://slideruleearth.io/web/rtd/user_guide/Private-Clusters.html#starting-and-scaling-a-private-cluster, I also didn't know exactly how long it takes for the cluster to start after using sliderule.update_available_servers (or that I would have to use that call at the very beginning of the script).
Now, I still get version errors:
Maybe we could clarify these three aspects in SlideRule:
FatalError that is not very helpful,