aws-samples/amazon-eks-machine-learning-with-terraform-and-kubeflow
Distributed training using Kubeflow on Amazon EKS
Apache License 2.0
79 stars, 43 forks
Issues
#52 AWS Load Balancer controller is not creating front end security group for Service (ajayvohra2005, closed, 9 months ago, 0 comments)
#51 helm and kubernetes providers in terraform scripts cause circular dependency (ajayvohra2005, closed, 9 months ago, 0 comments)
#50 Remove redundant cluster security group (ajayvohra2005, closed, 9 months ago, 0 comments)
#49 kubectl_manifest resources creation gives error (ajayvohra2005, closed, 9 months ago, 0 comments)
#48 Create EKS cluster OIDC provider (ajayvohra2005, closed, 9 months ago, 1 comment)
#47 Need to define kubectl_manifest resources for external manifests (ajayvohra2005, closed, 9 months ago, 1 comment)
#46 Kubernetes PersistentVolume and PersistentVolumeClaim names need to be rationalized (ajayvohra2005, closed, 9 months ago, 1 comment)
#45 Add container resources requests (ajayvohra2005, closed, 9 months ago, 1 comment)
#44 Use k8s nodeSelector to select specific GPU instance types (ajayvohra2005, closed, 9 months ago, 1 comment)
#43 Use k8s taints and tolerations to disallow non GPU pods to run on GPU nodes (ajayvohra2005, closed, 9 months ago, 1 comment)
#42 Need to use AWS Load Balancer Controller (ajayvohra2005, closed, 9 months ago, 1 comment)
#41 FSx and EFS CSI driver versions need to be updated to latest versions (ajayvohra2005, closed, 9 months ago, 0 comments)
#40 Dockerfile in container-optimzed and container-optimized-viz fails to build image (ajayvohra2005, closed, 1 year ago, 0 comments)
#39 Upgrade default version of k8s to 1.23 (ajayvohra2005, closed, 1 year ago, 0 comments)
#38 GPG error "public key is not available" in Ubuntu 20.04 CUDA 11.4.0 image while building images (ajayvohra2005, closed, 2 years ago, 1 comment)
#37 Update maskrcnn chart to use FSx for Lustre as default (ajayvohra2005, closed, 2 years ago, 1 comment)
#36 Update tensorpack container image to support Tensorflow 2.8.0 (ajayvohra2005, closed, 2 years ago, 1 comment)
#35 Update tensorpack repo hash to a recent commit (ajayvohra2005, closed, 2 years ago, 1 comment)
#34 Combine Tensorpack training and inference into a single Docker image (ajayvohra2005, closed, 2 years ago, 1 comment)
#33 Change the directory for staging data on FSx for Lustre (ajayvohra2005, closed, 2 years ago, 1 comment)
#32 Need support for ON_DEMAND and SPOT EC2 capacity types (ajayvohra2005, closed, 2 years ago, 1 comment)
#31 Need to be able to attach EFS and FSx for Lustre file-systems independently in two different pods (ajayvohra2005, closed, 2 years ago, 1 comment)
#30 EFS file-system created by terraform needs to be encrypted (ajayvohra2005, closed, 2 years ago, 1 comment)
#29 Jupyter notebooks for testing need to be configured with model checkpoint directory (ajayvohra2005, closed, 2 years ago, 1 comment)
#28 TensorBoard needs to be secured via SSL and password protection (ajayvohra2005, closed, 2 years ago, 1 comment)
#27 Backoff limit may prevent the MpiJob from starting (ajayvohra2005, closed, 2 years ago, 1 comment)
#26 Worker restart policy should be Never (ajayvohra2005, closed, 2 years ago, 1 comment)
#25 Kubeflow MPIJob version is out of date (ajayvohra2005, closed, 2 years ago, 1 comment)
#24 Node group with GPU instances does not use the eks cluster autoscaler (ajayvohra2005, closed, 2 years ago, 1 comment)
#23 jupyter charts are using TLS v1.0 (ajayvohra2005, closed, 2 years ago, 1 comment)
#22 EFS persistent-volume-claim is stuck in pending state (ajayvohra2005, closed, 2 years ago, 1 comment)
#21 File-system Ids need to be automatically updated in YAML files (ajayvohra2005, closed, 2 years ago, 2 comments)
#20 Solution needs to create a FSx for Lustre cluster in the Terraform (ajayvohra2005, closed, 2 years ago, 1 comment)
#19 New container image build should update image in Helm charts values.yaml (ajayvohra2005, closed, 2 years ago, 1 comment)
#18 EKS Node group should be a managed node group (ajayvohra2005, closed, 2 years ago, 1 comment)
#17 EKS cluster and node group should be in a private subnet (ajayvohra2005, closed, 2 years ago, 1 comment)
#16 EKS default version needs to be updated to 1.21 (ajayvohra2005, closed, 2 years ago, 1 comment)
#15 The key pair does not exist (ajinkya933, closed, 3 years ago, 1 comment)
#14 EKS cluster subnets are missing required tags for load balancing (jerstern, closed, 3 years ago, 0 comments)
#13 Helm instructions in README need to be upgraded to version 3 (ajayvohra2005, closed, 4 years ago, 1 comment)
#12 EKS cluster default version needs to be updated from 1.14 (ajayvohra2005, closed, 4 years ago, 1 comment)
#11 Terraform scripts need to be updated to latest version of Terraform (ajayvohra2005, closed, 4 years ago, 1 comment)
#10 jupyter pod does not work (oonisim, closed, 3 years ago, 1 comment)
#9 image value to update in charts/maskrcnn/charts/jupyter/values.yaml (oonisim, closed, 3 years ago, 0 comments)
#8 https://stackoverflow.com/questions/57961162/helm-install-unknown-flag-name (oonisim, closed, 3 years ago, 0 comments)
#7 https://github.com/helm/helm/issues/7052 (oonisim, closed, 3 years ago, 0 comments)
#6 AWS Case ID 7211004331 (oonisim, closed, 3 years ago, 0 comments)
#5 fix numpy version to 1.17.5, which is the latest version that is comp… (Cpruce, closed, 4 years ago, 0 comments)
#4 Dockerfiles' numpy version needs to be fixed and retained when installing tensorpack (Cpruce, closed, 4 years ago, 1 comment)
#3 eks-workers-stack.sh subnet parsing issue (gregthursam, closed, 5 years ago, 1 comment)