awslabs / autonomous-driving-data-framework

ADDF is a collection of modules deployed using the SeedFarmer orchestration tool. ADDF modules enable users to quickly bootstrap environments for the processing and analysis of autonomous driving data.
Apache License 2.0

[BUG] Review if taint in `ml-training/k8s-managed` is needed #418

Closed: dgraeber closed this 8 months ago

dgraeber commented 8 months ago

The ASG taint in ml-training/k8s-managed/configure_asgs.py (lines 63-67, which set the k8s.io/cluster-autoscaler/node-template/taint/dedicated taint) was considered necessary, but preliminary testing indicates that it blocks GPU instances from scaling up from 0 nodes.
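For context, here is a minimal sketch of one plausible reason such a tag could block scale-up from 0 nodes. When an ASG has no running nodes, the cluster autoscaler builds a template node from the ASG's `node-template` tags, and a pod that does not tolerate an advertised taint would not fit that template. The tag value `gpu:NoSchedule` and both helper functions are hypothetical illustrations, not code from configure_asgs.py, and the toleration check is deliberately simplified to matching on the taint key only.

```python
from typing import Dict, Set

# Tag prefix the cluster autoscaler reads to learn the taints of a node template.
TAINT_TAG_PREFIX = "k8s.io/cluster-autoscaler/node-template/taint/"


def template_taints(asg_tags: Dict[str, str]) -> Dict[str, str]:
    """Return {taint_key: "value:effect"} for taints advertised via ASG tags."""
    return {
        key[len(TAINT_TAG_PREFIX):]: value
        for key, value in asg_tags.items()
        if key.startswith(TAINT_TAG_PREFIX)
    }


def pod_fits_template(asg_tags: Dict[str, str], tolerated_keys: Set[str]) -> bool:
    """Simplified check: the pod fits only if it tolerates every advertised taint key."""
    return all(key in tolerated_keys for key in template_taints(asg_tags))


# Hypothetical tag value; the issue does not state the actual value used.
tags = {TAINT_TAG_PREFIX + "dedicated": "gpu:NoSchedule"}

fits_without_toleration = pod_fits_template(tags, set())
fits_with_toleration = pod_fits_template(tags, {"dedicated"})
```

Under these assumptions, a GPU pod without a matching toleration would never trigger a scale-up of the tagged group, which is consistent with the behavior reported above.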

Need someone to confirm that it CAN BE removed from the codebase.

dgraeber commented 8 months ago

@a13zen and @kevinsoucy are the points of reference. @dgraeber tested removing lines 63-67: starting from 0 nodes, the ASG did create a new instance and was no longer blocked once the k8s.io/cluster-autoscaler/node-template/taint/dedicated taint was removed.
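As a rough illustration of what the removal amounts to, the sketch below drops the taint tag from a list of ASG tag definitions before they would be applied. The tag dictionaries follow the shape accepted by boto3's Auto Scaling `create_or_update_tags` call; the ASG name, tag values, and the `without_taint_tag` helper are hypothetical, and this is not the actual content of configure_asgs.py.

```python
from typing import Dict, List, Union

TAINT_TAG_PREFIX = "k8s.io/cluster-autoscaler/node-template/taint/"

Tag = Dict[str, Union[str, bool]]


def without_taint_tag(tags: List[Tag], taint_key: str = "dedicated") -> List[Tag]:
    """Return a copy of the tag list with the node-template taint tag removed."""
    target = TAINT_TAG_PREFIX + taint_key
    return [tag for tag in tags if tag.get("Key") != target]


def make_tag(asg_name: str, key: str, value: str) -> Tag:
    """Build one tag definition in the shape boto3's Auto Scaling client expects."""
    return {
        "ResourceId": asg_name,
        "ResourceType": "auto-scaling-group",
        "Key": key,
        "Value": value,
        "PropagateAtLaunch": True,
    }


# Hypothetical ASG name and tag values, for illustration only.
asg = "example-gpu-asg"
original_tags = [
    make_tag(asg, "k8s.io/cluster-autoscaler/enabled", "true"),
    make_tag(asg, TAINT_TAG_PREFIX + "dedicated", "gpu:NoSchedule"),
]

remaining_tags = without_taint_tag(original_tags)
remaining_keys = [tag["Key"] for tag in remaining_tags]
```

Note that filtering the list only affects tags applied from then on; a tag already present on an existing ASG would need to be deleted separately.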

dgraeber commented 8 months ago

As per @a13zen, this is OK to remove.