StatCan / aaw

Documentation for the Advanced Analytics Workspace Platform

Cannot schedule SAS notebook in ohsp namespace #1935

Closed: Jose-Matsuda closed this issue 5 months ago

Jose-Matsuda commented 5 months ago

Describe the bug

Users in the ohsp-pssb namespace are unable to schedule Protected B (prob) SAS notebooks. The pod events show:

 kubelet  (combined from similar events): Unable to attach or mount volumes: unmounted volumes=[fdi-oha-inbox-eprotb-protected-b fdi-oha-outbox-eprotb-protected-b], unattached volumes=[test-bemrose-volume istio-podinfo fdi-oha-inbox-eprotb-protected-b aaw-unclassified-ro istiod-ca-cert workload-socket fdi-oha-eprotb-protected-b kube-api-access-jwjcn workload-certs istio-envoy istio-token fdi-oha-outbox-eprotb-protected-b credential-socket protb-nb istio-data aaw-protected-b]: timed out waiting for the condition
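To see at a glance which volumes are blocking the pod, the event text can be split into its "unmounted" and "unattached" lists. This is a minimal Python sketch; the parse_mount_event helper is hypothetical (not part of kubectl or AAW tooling) and assumes the message format shown in the event above. In this event, the two unmounted volumes are the FDI OHA inbox and outbox volumes.

```python
import re


def parse_mount_event(message: str) -> dict[str, list[str]]:
    """Hypothetical helper: split a kubelet 'Unable to attach or mount volumes'
    event message into its labelled volume lists."""
    result: dict[str, list[str]] = {}
    # Each list in the message looks like: <label> volumes=[vol-a vol-b ...]
    for label, body in re.findall(r"(unmounted|unattached) volumes=\[([^\]]*)\]", message):
        result[label] = body.split()
    return result


# Event text copied from the issue (minus the "kubelet (combined ...)" prefix).
event = (
    "Unable to attach or mount volumes: "
    "unmounted volumes=[fdi-oha-inbox-eprotb-protected-b fdi-oha-outbox-eprotb-protected-b], "
    "unattached volumes=[test-bemrose-volume istio-podinfo fdi-oha-inbox-eprotb-protected-b "
    "aaw-unclassified-ro istiod-ca-cert workload-socket fdi-oha-eprotb-protected-b "
    "kube-api-access-jwjcn workload-certs istio-envoy istio-token "
    "fdi-oha-outbox-eprotb-protected-b credential-socket protb-nb istio-data aaw-protected-b]: "
    "timed out waiting for the condition"
)

volumes = parse_mount_event(event)
print(volumes["unmounted"])
```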

Environment info

Namespace: ohsp-pssb

Notebook/server: various; it appears to affect every notebook that tries to mount the FDI volumes. The events above were taken from test-bemrose-0.

Jose-Matsuda commented 5 months ago

Another error I'm just now seeing:

 Warning  FailedMount  79s (x4 over 27m)  kubelet  MountVolume.MountDevice failed for volume "ohsp-pssb-fdi-protected-b-oha-outbox-eprotb" : rpc error: code = Internal desc = Mount failed with error: rpc error: code = Unknown desc = exit status 1
 Error: failed to initialize new pipeline [failed to authenticate credentials for azstorage]
 , output:
 Please refer to for possible causes and solutions for mount errors.

Now investigating the logs of azure-blob-csi-system/csi-blob-node-zr8zn on the Protected B (prob) node.

Jose-Matsuda commented 5 months ago

More information from FDI: when we raised the message "storageSPNClientSecret is not empty, use it to access storage account(...), container(...transit)", their response was: "This transit container was not required by the client, therefore was never created."

They added: "Just to maybe state the obvious, not all use cases' Storage have a 'transit' container, with its 'Inbox' and 'Outbox'; only the ones for which we've been asked to implement an automated ingestion and/or extraction pipeline. When no automation is needed, the 'transit' container/folders shouldn't be mounted on your side. We document it in the Jira, as shown here: [CODAS-2298] FDI - Oral Health Analytics (OHA) - Common Storage in DAS - Statistics Canada Jira B"
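FDI's rule can be expressed as a small decision: request the transit Inbox/Outbox mounts only when the use case has an automated ingestion or extraction pipeline. The sketch below is a minimal illustration under stated assumptions; the StorageUseCase record, the containers_to_mount function and the "-inbox"/"-outbox" naming scheme are all hypothetical and do not reflect the actual AAW or FDI configuration.

```python
from dataclasses import dataclass


@dataclass
class StorageUseCase:
    """Hypothetical description of one FDI storage use case (illustrative only)."""
    name: str
    # Per FDI, only use cases with an automated ingestion and/or extraction
    # pipeline have a "transit" container (with Inbox and Outbox).
    has_automated_pipeline: bool


def containers_to_mount(use_case: StorageUseCase) -> list[str]:
    """Return the (hypothetical) container mounts a notebook should request."""
    mounts = [use_case.name]  # assumption: the main storage container is always mounted
    if use_case.has_automated_pipeline:
        # Transit Inbox/Outbox exist only for automated pipelines; requesting them
        # otherwise points at a container that was never created.
        mounts += [f"{use_case.name}-inbox", f"{use_case.name}-outbox"]
    return mounts


# OHA: FDI says the transit container was never created, so no Inbox/Outbox mounts.
print(containers_to_mount(StorageUseCase(name="oha", has_automated_pipeline=False)))
```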

Jose-Matsuda commented 5 months ago

GitLab PR (with details) that will hopefully resolve this, along with its pipeline.

Jose-Matsuda commented 5 months ago

That seems to have fixed it. It was an odd scenario, explained in the GitLab PR: for a while the original changes weren't applied, so the client could work fine; then, a week ago, we applied all pending changes, which pulled in those original changes.

Simply removing the transit container mount fixed it.