Make "no resources" error configurable for autoscaling platforms

aledegano commented 2 months ago

When I resume a session and there are not enough resources available, I immediately get an error in the UI.

That makes sense in our current infrastructure: if the resources aren't there, they won't be for a while. On platforms where autoscaling is enabled (like the AWS PoC I'm carrying out), however, that's not necessarily true, since additional resources might come up shortly.

The error itself does not prevent the session from starting in the background, but it can be misleading for the user.

Can we make the time interval before the error is shown configurable?
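
To make the request concrete, here is a minimal sketch of what a configurable delay could look like. The setting name `UNSCHEDULABLE_GRACE_PERIOD` and the function `should_show_no_resources_error` are hypothetical and are not existing renku-notebooks options. A grace period of zero would reproduce the current behaviour of showing the error immediately.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical setting (not an existing renku-notebooks option): how long a
# session may stay unschedulable before the UI reports "no resources".
UNSCHEDULABLE_GRACE_PERIOD = timedelta(seconds=120)


def should_show_no_resources_error(
    unschedulable_since: datetime,
    now: datetime,
    grace_period: timedelta = UNSCHEDULABLE_GRACE_PERIOD,
) -> bool:
    """Return True once the pod has been unschedulable for at least the grace period."""
    return now - unschedulable_since >= grace_period
```

On a cluster without autoscaling the grace period could stay at zero, while an autoscaling deployment could set it to roughly the time a new node takes to join.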

rokroskar commented 2 months ago

Is there some indication from k8s that a resource is being provisioned to satisfy the (currently unschedulable) request?

aledegano commented 2 months ago

Shortly after the pod is scheduled I see this event:

apiVersion: v1
count: 1
eventTime: null
firstTimestamp: "2024-04-19T08:05:16Z"
involvedObject:
  apiVersion: v1
  kind: Pod
  name: foo-40bar--an-2daws-2dproject-57f05e85-0
  namespace: renku
  resourceVersion: "21922912"
  uid: 467b94d8-e56a-432c-8ca1-108843aab5ec
kind: Event
lastTimestamp: "2024-04-19T08:05:16Z"
message: 'Pod should schedule on: nodeclaim/core-services-lrkls'
metadata:
  creationTimestamp: "2024-04-19T08:05:16Z"
  name: foo-40bar--an-2daws-2dproject-57f05e85-0.17c79fd3fcc0e889
  namespace: renku
  resourceVersion: "21922946"
  uid: 072773f3-e2b0-4f73-bd9f-4baca5511b66
reason: Nominated
reportingComponent: karpenter
reportingInstance: ""
source:
  component: karpenter
type: Normal

There is certainly more information available from Karpenter, but that might be a bit too specific/platform-dependent...
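
One way to keep the platform-specific part small would be a configurable list of event signals that mean "capacity is on its way". The sketch below works on plain event dictionaries shaped like the event above. The function name `is_scale_up_pending` and the `SCALE_UP_SIGNALS` set are hypothetical. The Karpenter entry is taken from the event above, and the `cluster-autoscaler` / `TriggeredScaleUp` entry is an assumption for clusters running the Kubernetes Cluster Autoscaler.

```python
from typing import Any, Iterable, Mapping

# (component, reason) pairs interpreted as "a node is being provisioned".
# Hypothetical, deployment-specific configuration.
SCALE_UP_SIGNALS = {
    ("karpenter", "Nominated"),                  # taken from the event above
    ("cluster-autoscaler", "TriggeredScaleUp"),  # assumption: Cluster Autoscaler
}


def is_scale_up_pending(
    events: Iterable[Mapping[str, Any]],
    pod_name: str,
    signals: set = SCALE_UP_SIGNALS,
) -> bool:
    """Return True if any event for the given pod matches a known scale-up signal."""
    for event in events:
        involved = event.get("involvedObject") or {}
        if involved.get("kind") != "Pod" or involved.get("name") != pod_name:
            continue
        # Prefer reportingComponent; fall back to the older source.component field.
        component = event.get("reportingComponent") or (
            event.get("source") or {}
        ).get("component", "")
        if (component, event.get("reason")) in signals:
            return True
    return False
```

With a check like this, the error could be suppressed (or replaced by a "waiting for resources" message) while a matching event exists for the session's pod.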

SwissDataScienceCenter / renku-notebooks

Make "no resources" error configurable for autoscaling platforms #1843