Open serge-dolgavin-dxc opened 2 months ago
@serge-dolgavin-dxc
You can try this before the merge. I need to test from the ground up, so it may take a while before it's merged.
File: ide-modules.yaml
name: jupyter-hub
path: git::https://github.com/awslabs/autonomous-driving-data-framework.git//modules/demo-only/jupyter-hub?ref=chore/583&depth=1
@malachi-constant ,
unfortunately,
name: jupyter-hub
path: git::https://github.com/awslabs/autonomous-driving-data-framework.git//modules/demo-only/jupyter-hub?ref=chore/583&depth=1
doesn't work for me:
$ seedfarmer apply ./manifests/demo/deployment.yaml --dry-run
...
[2024-09-05 07:10:17,386 | INFO | _deployment_commands.py:636 | MainThread ] Verifying all modules in ide for deploy
Traceback (most recent call last):
...
cmdline: git pull -v -- origin chore/583
stderr: 'fatal: couldn't find remote ref chore/583'
During handling of the above exception, another exception occurred:
...
.../autonomous-driving-data-framework/.venv/lib/python3.8/site-packages/seedfarmer/mgmt/git_support.py", line 79, in clone_module_repo
raise InvalidConfigurationError(f"\n Cannot Clone Repo: {ge} {messages.git_error_support()}")
seedfarmer.errors.seedfarmer_errors.InvalidConfigurationError:
Cannot Clone Repo: Cmd('git') failed due to: exit code(1)
cmdline: git pull -v -- origin chore/583
stderr: 'fatal: couldn't find remote ref chore/583'
1. Make sure your path to the repo is correct and valid (check your module manifests!)
2. The credentials used to call SeedFarmer have access to the repo
3. The credentials used to call SeedFarmer have not expired
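To see which branch a git:: module path actually asks for, the path can be split into its parts. The following is a minimal sketch, not SeedFarmer's own parser: the function name parse_git_module_path is hypothetical, and the layout (repository, then // and a sub-directory, then ref and depth query parameters) is assumed from the path quoted above.

```python
from urllib.parse import parse_qs, urlsplit


def parse_git_module_path(path: str) -> dict:
    """Split 'git::<repo-url>//<subdir>?ref=<ref>&depth=<n>' into its parts.

    Hypothetical helper for illustration; SeedFarmer's real parsing may differ.
    """
    prefix = "git::"
    if not path.startswith(prefix):
        raise ValueError("not a git:: module path")
    parts = urlsplit(path[len(prefix):])
    # The double slash separates the repository from the directory inside it.
    repo_path, _, subdir = parts.path.partition("//")
    query = parse_qs(parts.query)
    return {
        "repo": f"{parts.scheme}://{parts.netloc}{repo_path}",
        "subdir": subdir,
        "ref": query.get("ref", [None])[0],
        "depth": query.get("depth", [None])[0],
    }


path = (
    "git::https://github.com/awslabs/autonomous-driving-data-framework.git"
    "//modules/demo-only/jupyter-hub?ref=chore/583&depth=1"
)
print(parse_git_module_path(path)["ref"])  # chore/583
```

With the repository and ref separated, running git ls-remote --heads against the repository URL with the ref name shows whether that branch still exists on the remote.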
@malachi-constant ,
with
name: jupyter-hub
path: modules/demo-only/jupyter-hub/
I got the following error:
1321 | addf-demo-ide-jupyter-hub | 4/11 | 7:17:33 AM | CREATE_FAILED | Custom::AWSCDK-EKS-KubernetesResource | addf-demo-ide-jupyter-hub-eks-cluster/manifest-jupyter-hub-namespace/Resource/Default (addfdemoidejupyterhubeksclustermanifestjupyterhubnamespaceXXXXXXXX) Received response status [FAILED] from custom resource. Message returned: Error: b'\nAn error occurred (AccessDenied) when calling the AssumeRole operation: User: arn:aws:sts::XXXXXXXXX:assumed-role/addf-demo-ide-jupyter-hub-HandlerServiceRoleXXXXXXXXXXXXXXX/addf-demo-ide-jupyter-hub-addfdemo-HandlerXXXXXXXXXXXX is not authorized to perform: sts:AssumeRole on resource: arn:aws:iam::XXXXXXXXXXXXX:role/addf-demo-core-eks-clusterCreationRoleXXXXXXXXXXX\nUnable to connect to the server: getting credentials: exec: executable aws failed with exit code 255\n' ...
@malachi-constant ,
please find attached the CodeBuild log for the jupyter-hub module: jupyter-hub_CodeBuild.log
@serge-dolgavin-dxc Can you try this module from main?
That branch was deleted after the merge.
@malachi-constant ,
Sorry that my messages were unclear and caused confusion.
I noticed that the branch had been deleted, and I have already been using main
for the last 5 days.
Yesterday's CodeBuild log for the jupyter-hub module is based on the recent main
branch.
Gotcha, I missed that. Taking a look...
@serge-dolgavin-dxc Are you able to provide the trust policy for arn:aws:iam::XXXXXXXXXXXXX:role/addf-demo-core-eks-clusterCreationRoleXXXXXXXXXXX,
with account values sanitized as well, so I can compare it to what I have tested? I am not able to replicate.
@malachi-constant , please find attached the policy details along with the latest CodeBuild log: jupyter-hub.zip
OK, so the trust is not being added for some reason. Can you also tell me which version of the eks module is deployed?
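For anyone comparing trust policies by hand, the check amounts to asking whether any Allow statement for sts:AssumeRole lists the calling role as a principal. Below is a minimal sketch under simplifying assumptions: it matches exact ARNs only and ignores Condition blocks, wildcards and account-root principals. The function name and the placeholder ARNs are hypothetical. The AccessDenied message names an assumed-role session ARN, whereas a trust policy normally lists the underlying IAM role ARN, so that is the value to pass in.

```python
def role_trusts_principal(trust_policy: dict, principal_arn: str) -> bool:
    """Return True if an Allow statement lets principal_arn call sts:AssumeRole.

    Simplified for illustration: exact ARN match only, no Condition handling.
    """

    def as_list(value):
        # IAM policy fields may hold a single value or a list of values.
        if value is None:
            return []
        return value if isinstance(value, list) else [value]

    for statement in as_list(trust_policy.get("Statement")):
        if statement.get("Effect") != "Allow":
            continue
        if "sts:AssumeRole" not in as_list(statement.get("Action")):
            continue
        principal = statement.get("Principal", {})
        if isinstance(principal, dict) and principal_arn in as_list(principal.get("AWS")):
            return True
    return False


# Placeholder values, not taken from the attached policy.
handler_role = "arn:aws:iam::111111111111:role/example-handler-role"
policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {"Service": "eks.amazonaws.com"},
            "Action": "sts:AssumeRole",
        }
    ],
}
print(role_trusts_principal(policy, handler_role))  # False
```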
I am using the latest main branch (default demo / example-dev manifests).
name: eks
path: git::https://github.com/awslabs/idf-modules.git//modules/compute/eks?ref=release/1.11.0
dataFiles:
- filePath: git::https://github.com/awslabs/idf-modules.git//data/eks_dockerimage-replication/versions/1.29.yaml?ref=release/1.11.0
- filePath: git::https://github.com/awslabs/idf-modules.git//data/eks_dockerimage-replication/versions/default.yaml?ref=release/1.11.0
Ok thanks, was able to replicate, working on it...
@serge-dolgavin-dxc
See the manifest in the PR.
This error is resolved by updating ide-modules.yaml:
name: jupyter-hub
path: modules/demo-only/jupyter-hub/
parameters:
  - name: eks-cluster-admin-role-arn
    valueFrom:
      moduleMetadata:
        group: core
        name: eks
        key: EksClusterMasterRoleArn
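The valueFrom.moduleMetadata entry tells the deployment to read the value from metadata exported by another module, identified by group, name and key. The sketch below illustrates that lookup only. It is not SeedFarmer's implementation, and the function name, the exported_metadata table and the ARN are hypothetical placeholders.

```python
def resolve_parameter(parameter: dict, exported_metadata: dict) -> str:
    """Look up a moduleMetadata reference in a table of exported module outputs.

    exported_metadata maps (group, module name) to that module's outputs.
    Illustrative only; the real resolution happens inside SeedFarmer.
    """
    ref = parameter["valueFrom"]["moduleMetadata"]
    module_outputs = exported_metadata[(ref["group"], ref["name"])]
    return module_outputs[ref["key"]]


parameter = {
    "name": "eks-cluster-admin-role-arn",
    "valueFrom": {
        "moduleMetadata": {
            "group": "core",
            "name": "eks",
            "key": "EksClusterMasterRoleArn",
        }
    },
}

# Placeholder output of the core/eks module (made-up account and role name).
exported_metadata = {
    ("core", "eks"): {
        "EksClusterMasterRoleArn": "arn:aws:iam::111111111111:role/example-master-role"
    }
}

print(resolve_parameter(parameter, exported_metadata))
```

A missing group, module or key raises KeyError in this sketch, which mirrors the idea that the referenced module must be deployed and must export that key before jupyter-hub can use it.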
@malachi-constant , thanks a lot for your help! I was able to deploy the jupyter-hub
module.
┏━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━┓
┃ Account ┃ Region ┃ Deployment ┃ Group ┃ Module ┃
┡━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━┩
│ primary │ eu-west-1 │ demo │ optionals │ networking │
│ primary │ eu-west-1 │ demo │ optionals │ datalake-buckets │
│ primary │ eu-west-1 │ demo │ replication │ replication │
│ primary │ eu-west-1 │ demo │ core │ metadata-storage │
│ primary │ eu-west-1 │ demo │ core │ eks │
│ primary │ eu-west-1 │ demo │ core │ batch-compute │
│ primary │ eu-west-1 │ demo │ core │ efs │
│ primary │ eu-west-1 │ demo │ ide │ jupyter-hub │
└─────────┴───────────┴────────────┴─────────────┴──────────────────┘
Unfortunately, I ran into two issues after the deployment:
I was not able to query the DNS name of the JupyterHub:
$ kubectl get ing jupyterhub -n jupyter-hub -o jsonpath="{.status.loadBalancer.ingress[0].hostname}"
E0913 07:48:55.773416 20574 memcache.go:265] couldn't get current server API group list: Get "https://7AB5A3CFD6880B49EFACA781A5D20570.gr7.eu-central-1.eks.amazonaws.com/api?timeout=32s": dial tcp: lookup 7AB5A3CFD6880B49EFACA781A5D20570.gr7.eu-central-1.eks.amazonaws.com on 172.20.48.1:53: no such host
E0913 07:48:55.778115 20574 memcache.go:265] couldn't get current server API group list: Get "https://7AB5A3CFD6880B49EFACA781A5D20570.gr7.eu-central-1.eks.amazonaws.com/api?timeout=32s": dial tcp: lookup 7AB5A3CFD6880B49EFACA781A5D20570.gr7.eu-central-1.eks.amazonaws.com on 172.20.48.1:53: no such host
E0913 07:48:55.781898 20574 memcache.go:265] couldn't get current server API group list: Get "https://7AB5A3CFD6880B49EFACA781A5D20570.gr7.eu-central-1.eks.amazonaws.com/api?timeout=32s": dial tcp: lookup 7AB5A3CFD6880B49EFACA781A5D20570.gr7.eu-central-1.eks.amazonaws.com on 172.20.48.1:53: no such host
E0913 07:48:55.785906 20574 memcache.go:265] couldn't get current server API group list: Get "https://7AB5A3CFD6880B49EFACA781A5D20570.gr7.eu-central-1.eks.amazonaws.com/api?timeout=32s": dial tcp: lookup 7AB5A3CFD6880B49EFACA781A5D20570.gr7.eu-central-1.eks.amazonaws.com on 172.20.48.1:53: no such host
E0913 07:48:55.794678 20574 memcache.go:265] couldn't get current server API group list: Get "https://7AB5A3CFD6880B49EFACA781A5D20570.gr7.eu-central-1.eks.amazonaws.com/api?timeout=32s": dial tcp: lookup 7AB5A3CFD6880B49EFACA781A5D20570.gr7.eu-central-1.eks.amazonaws.com on 172.20.48.1:53: no such host
Unable to connect to the server: dial tcp: lookup 7AB5A3CFD6880B49EFACA781A5D20570.gr7.eu-central-1.eks.amazonaws.com on 172.20.48.1:53: no such host
Please note the regions: the ADDF demo was deployed in eu-west-1, not eu-central-1.
Spawn failed after authentication on jupyter-hub:
Event log
Server requested
2024-09-13T05:51:01.159711Z [Normal] Successfully assigned jupyter-hub/jupyter-testadmin to ip-10-0-5-247.eu-west-1.compute.internal
2024-09-13T05:51:05Z [Normal] AttachVolume.Attach succeeded for volume "pvc-476aa8dd-ff44-4961-bd31-e335e243b2c2"
2024-09-13T05:51:06Z [Warning] Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox "c32ec298a72142146904b12cd76eed4d0de1cb67d0bcffe61ace594ef57748f4": plugin type="aws-cni" name="aws-cni" failed (add): add cmd: failed to assign an IP address to container
2024-09-13T05:51:20Z [Warning] Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox "947b92fa3b8b39f8d5739e39c4c3fb9dd4ec4c086e9ab1c245c071f4d830ba01": plugin type="aws-cni" name="aws-cni" failed (add): add cmd: failed to assign an IP address to container
2024-09-13T05:51:33Z [Warning] Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox "812dcbfd7cc2d97c2557ff5e647fd0459b3578b5bf57266ea52a32f61e24b4be": plugin type="aws-cni" name="aws-cni" failed (add): add cmd: failed to assign an IP address to container
2024-09-13T05:51:46Z [Warning] Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox "814fcaf4f3b16a97d283b3ebec306bf2c04c8cf18f223648c954300a9ddfa72e": plugin type="aws-cni" name="aws-cni" failed (add): add cmd: failed to assign an IP address to container
2024-09-13T05:52:00Z [Warning] Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox "07333715950cac4d843c6d86f2c3cddff3ee6e9089303c427e7036dd0c255a83": plugin type="aws-cni" name="aws-cni" failed (add): add cmd: failed to assign an IP address to container
2024-09-13T05:52:12Z [Warning] Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox "4adad6b61854075ae7cc294aa9879bdebb999dc3f12e32c882e5386fa4a711f6": plugin type="aws-cni" name="aws-cni" failed (add): add cmd: failed to assign an IP address to container
2024-09-13T05:52:25Z [Warning] Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox "9f420ca72cbf547e4e5fca53640b569ed68ede236b597f1c1b9d7ba00e666aea": plugin type="aws-cni" name="aws-cni" failed (add): add cmd: failed to assign an IP address to container
2024-09-13T05:52:39Z [Warning] Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox "b8d0932f530f7347b9119f49782f83cbf9af1434a5330ad6ce3fa146187b5f31": plugin type="aws-cni" name="aws-cni" failed (add): add cmd: failed to assign an IP address to container
2024-09-13T05:52:51Z [Warning] Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox "bc3e7ba7bd1eae2c9279fc247d72dcd87355413ce1919b58f3e451c159ff39cd": plugin type="aws-cni" name="aws-cni" failed (add): add cmd: failed to assign an IP address to container
2024-09-13T05:53:04Z [Warning] (combined from similar events): Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox "66b1d8132d73dcbba43f04acc1fe2926c56e641a060d9bcb5f12833f20f7c284": plugin type="aws-cni" name="aws-cni" failed (add): add cmd: failed to assign an IP address to container
Spawn failed: pod jupyter-hub/jupyter-testadmin did not start in 300 seconds!
Could you please advise how to address these issues?
@serge-dolgavin-dxc
I think your kubectl credentials are pointing to the wrong cluster (do you have multiple clusters defined in .kube?). This command:
kubectl get ing jupyterhub -n jupyter-hub -o jsonpath="{.status.loadBalancer.ingress[0].hostname}"
should be executed against the proper cluster.
REF: https://kubernetes.io/docs/tasks/access-application-cluster/configure-access-multiple-clusters/
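A quick way to confirm such a mismatch is to read the region out of the API endpoint that kubectl tried to reach and compare it with the deployment region. The following is a minimal sketch: the hostname pattern is inferred only from the endpoint in the error output above, not from an official format, and the function name is hypothetical.

```python
import re

# Pattern assumed from the single endpoint seen in the kubectl error output.
_EKS_ENDPOINT = re.compile(
    r"^[0-9A-Za-z]+\.[0-9a-z]+\.(?P<region>[a-z0-9-]+)\.eks\.amazonaws\.com$"
)


def region_of_eks_endpoint(host: str):
    """Return the region embedded in an EKS API endpoint host, or None."""
    match = _EKS_ENDPOINT.match(host)
    return match.group("region") if match else None


host = "7AB5A3CFD6880B49EFACA781A5D20570.gr7.eu-central-1.eks.amazonaws.com"
deployed_region = "eu-west-1"

print(region_of_eks_endpoint(host))                     # eu-central-1
print(region_of_eks_endpoint(host) == deployed_region)  # False
```

If the regions differ, kubectl config get-contexts lists the clusters known to the local kubeconfig, and kubectl config use-context switches to the right one.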
@dgraeber , thanks a lot for your hint!
The addf-demo-core-eks-cluster configuration was missing.
The first issue is solved, but the second still remains. Is it an issue with access rights?
Describe the bug
The addf-demo-ide-jupyter-hub deployment fails because it uses a Lambda runtime that is no longer supported.
To Reproduce
Deploy the jupyter-hub module.
Expected behavior
The jupyter-hub module deploys without issues.
Screenshots
N/A
Additional context
... Failed resources: addf-demo-ide-jupyter-hub | 10:17:07 AM | CREATE_FAILED | AWS::Lambda::Function | AWSCDKCfnUtilsProviderCustomResourceProvider/Handler handler returned message: "The runtime parameter of nodejs12.x is no longer supported for creating or updating AWS Lambda functions. We recommend you use a supported runtime while creating or updating functions. (Service: Lambda, Status Code: 400,
❌ addf-demo-ide-jupyter-hub failed: Error: The stack named addf-demo-ide-jupyter-hub failed creation, it may need to be manually deleted from the AWS console: ROLLBACK_COMPLETE ...