aws-samples/aws-do-eks
MIT No Attribution
42 stars
27 forks
issues
Updating workshop with GPU based node group, increased inf2 node capacity, added script to install IAM role for CloudWatch add-on
#23
dzilbermanvmw
closed
3 weeks ago
0
Update Distributed training with Amazon EKS and Torch Distributed Elastic blog
#22
HUNG-rushb
opened
7 months ago
0
Update install-eksctl.sh
#21
karlamazon
closed
5 months ago
1
Set default FSDP example to NanoGPT 10.5M
#20
iankouls-aws
closed
8 months ago
0
MPIJob EFA example doesn't apply
#19
kwohlfahrt
opened
10 months ago
0
Add fsdp example
#18
iankouls-aws
closed
8 months ago
0
Cluster Auto-scaler forbidden error
#17
chailatent
opened
1 year ago
0
Update eks-kubeflow.yaml
#16
ikopas
closed
8 months ago
1
Deepsparse
#15
InquestGeronimo
closed
1 year ago
0
AWS EKS integration
#14
InquestGeronimo
closed
1 year ago
0
Use Finch instead of docker on OSX
#13
perifaws
opened
1 year ago
2
SC22 example additions
#12
iankouls-aws
closed
1 year ago
0
Deepspeed
#11
iankouls-aws
closed
1 year ago
0
adding efa docker and also refactoring code
#10
iyounus-aws
closed
2 years ago
0
fixing naming convention for pv and pvc
#9
iyounus-aws
closed
2 years ago
0
adding script and instructions for creating FSx pv statically
#8
iyounus-aws
closed
2 years ago
0
adding REGISTRY check in build/push scripts.
#7
iyounus-aws
closed
2 years ago
0
adding scripts for setting up cloudwatch agent to collect gpu metrics
#6
iyounus-aws
closed
2 years ago
0
refactoring efs and fsx set up code
#5
iyounus-aws
closed
2 years ago
0
adding readme for setting up fsx
#4
iyounus-aws
closed
2 years ago
0
updating the mpi-operator github repo tag
#3
iyounus-aws
closed
2 years ago
0
adding scripts/yamls for creating efs filesystem and training model w…
#2
iyounus-aws
closed
2 years ago
0
v2 release
#1
iankouls-aws
closed
2 years ago
0