aws/aws-k8s-tester
AWS Kubernetes tester, kubetest2 deployer implementation
Apache License 2.0
163
stars
82
forks
issues
Add support for static cluster
#499
Issacwww
opened
1 day ago
0
Add Batch Optimization Scripts for NVIDIA Instances
#498
mattcjo
opened
1 week ago
0
Config MPI4 for EFA
#497
Issacwww
closed
1 week ago
0
Config MPI5 for EFA
#496
Issacwww
closed
1 week ago
0
Fix get_instance_type
#495
Issacwww
closed
1 week ago
0
Fetch instance type with fallback
#494
Issacwww
closed
1 week ago
0
fix codebuild
#493
Issacwww
closed
1 week ago
0
Fix typo
#492
Issacwww
closed
1 week ago
0
Collect node logs after --up, --down phases
#491
cartermckinnon
closed
1 week ago
0
fix: Expose all 32 EFA interfaces on p5 launch template
#490
bryantbiggs
closed
1 week ago
0
Fix Nvidia Image build
#489
Issacwww
closed
1 week ago
1
Fix unit test
#488
Issacwww
closed
2 weeks ago
0
chore: Update GPU Dockerfile versions
#487
bryantbiggs
closed
2 weeks ago
0
Add support to emit metric to the target AMP
#486
weicongw
opened
3 weeks ago
0
Bump Neuron SDK components versions
#485
nkvetsinski
closed
1 month ago
0
Add support to emit metric to the target AMP workspace
#484
weicongw
closed
4 weeks ago
0
Add support to emit metric to the target AMP workspace
#483
weicongw
closed
1 month ago
0
Opt in device plugin
#482
Issacwww
closed
4 weeks ago
0
Fix the AZs when creating subnets
#481
weicongw
closed
1 month ago
0
capacity-reservation requires efa
#480
Issacwww
opened
1 month ago
0
Support Certs via Environment Variable
#479
Issacwww
opened
1 month ago
0
Support additional certificate
#478
Issacwww
opened
1 month ago
0
Add debug logging for e2e-nvidia setup
#477
cartermckinnon
closed
1 month ago
0
Remove unsupported instance types in isolated regions
#476
ndbaker1
closed
1 month ago
0
Add default instance types for managed nodegroups
#475
cartermckinnon
closed
1 month ago
0
Verify GPU Direct RDMA is used on supported instance.
#474
weicongw
closed
1 month ago
0
Verify GPU Direct RDMA is used on supported instance.
#473
weicongw
closed
1 month ago
0
Fix volume capacity issue
#472
Issacwww
closed
1 month ago
0
Enable EFA set up for bottlerocket
#471
Issacwww
closed
1 month ago
0
Add hpc benchmark to unit test, and add "capacity-reservation" flag to deployer
#470
weicongw
closed
1 month ago
0
Add --node-creation-timeout flag
#469
cartermckinnon
closed
2 months ago
0
Bump go version in kubetest2 image
#468
ndbaker1
closed
2 months ago
1
Add BERT e2e training test
#467
mattcjo
opened
2 months ago
0
Add bert e2e test for neuron device
#466
weicongw
closed
2 months ago
0
Fix GetJobLogs and e2e-neuron binary not exiting issue.
#465
weicongw
closed
2 months ago
0
Pull the logs when test finished and remove unnecessary resources requests and limits in the nccl test manifest
#464
weicongw
closed
2 months ago
0
replace `wait.WithTimeout(timeout)` with `wait.WithContext(ctx)`
#463
weicongw
closed
2 months ago
0
Increase cluster creation time out
#462
Issacwww
closed
3 months ago
0
Add bert e2e test for neuron device
#461
weicongw
closed
2 months ago
1
Add inference test e2e go binary to Dockerfile.kubetest2
#460
mattcjo
closed
3 months ago
0
Add BERT Inference Test
#459
mattcjo
closed
3 months ago
0
Use instance type from EC2 API instead of Node label
#458
cartermckinnon
closed
3 months ago
0
Add test case for unit test and delete the duplicated docker file.
#457
weicongw
closed
3 months ago
0
Add GPU unit test
#456
weicongw
closed
3 months ago
0
Add docker image for BERT e2e inference task
#455
mattcjo
closed
3 months ago
3
Add docker image for BERT e2e training task
#454
mattcjo
closed
2 months ago
1
Add --user-data-file option
#453
cartermckinnon
opened
4 months ago
0
Add single node Neuron test to the e2e tester
#452
weicongw
closed
3 months ago
0
Add node metrics for time to register, ready
#451
cartermckinnon
closed
4 months ago
0
Add single node Neuron test to the e2e tester
#450
weicongw
closed
4 months ago
1