Project-HAMi / HAMi

Heterogeneous AI Computing Virtualization Middleware
http://project-hami.io/
Apache License 2.0

fix: exception occurs when creating multiple ascend-gpu pods concurrently #575

Open · lijm87 opened this pull request 3 weeks ago

lijm87 commented 3 weeks ago

What type of PR is this? /kind bug

What this PR does / why we need it: fixes an exception that occurs when multiple pods requesting Ascend devices are created concurrently.

Which issue(s) this PR fixes: Fixes #

Special notes for your reviewer:

Does this PR introduce a user-facing change?:

codecov[bot] commented 3 weeks ago

Codecov Report

Attention: Patch coverage is 55.55556% with 8 lines in your changes missing coverage. Please review.

File with missing lines         Patch %   Lines
pkg/device/ascend/device.go     55.55%    6 missing and 2 partials

Flag        Coverage   Δ         Patch
unittests   27.43%     +0.41%    55.55%

Flags with carried forward coverage won't be shown.

File with missing lines         Coverage   Δ         Patch
pkg/device/ascend/device.go     11.22%     +8.30%    55.55%

archlitchi commented 3 weeks ago

Yes, nodelock is necessary for Ascend jobs, but you need to implement the 'release lock' part in 'ascend-device-plugin' for it to work.

lijm87 commented 3 weeks ago

Done. The 'release lock' part is implemented in a companion PR in 'ascend-device-plugin': https://github.com/Project-HAMi/ascend-device-plugin/pull/7