Closed: yiheng-wang-nv closed this 1 year ago.
I'm working on testing this branch on a multi-node machine.
Hi @yiheng-wang-nv,
Can you try this?
"device": "$torch.device('cuda:' + os.environ['LOCAL_RANK'])"
I think it should work.
Thanks.
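Why LOCAL_RANK rather than a global rank: when a job spans several nodes, a launcher such as torchrun gives each worker a global index (RANK) across all nodes and a per-node index (LOCAL_RANK). CUDA device ordinals are numbered per node, so only the local index is a valid `cuda:N` on every node. The minimal sketch below assumes every node runs the same number of workers (so RANK = node_rank * nproc_per_node + local_rank). It uses a plain dict in place of `os.environ`, and the helpers `simulated_env`, `device_string` and `devices_per_node` are hypothetical illustrations, not part of MONAI or PyTorch.

```python
from typing import Dict, List


def simulated_env(node_rank: int, local_rank: int, nproc_per_node: int, nnodes: int) -> Dict[str, str]:
    """Hypothetical helper: rank variables a launcher would expose to one worker,
    assuming every node runs nproc_per_node workers."""
    return {
        "LOCAL_RANK": str(local_rank),  # index of the worker on its own node
        "RANK": str(node_rank * nproc_per_node + local_rank),  # global index across all nodes
        "WORLD_SIZE": str(nnodes * nproc_per_node),  # total number of workers
    }


def device_string(env: Dict[str, str], key: str) -> str:
    # Mirrors the config expression 'cuda:' + os.environ[key]
    return "cuda:" + env[key]


def devices_per_node(nnodes: int, nproc_per_node: int, key: str) -> List[List[str]]:
    """Device string each worker would pick, grouped by node."""
    return [
        [device_string(simulated_env(node, local, nproc_per_node, nnodes), key) for local in range(nproc_per_node)]
        for node in range(nnodes)
    ]


if __name__ == "__main__":
    # Hypothetical job: 2 nodes with 4 GPUs each, so valid ordinals are cuda:0..cuda:3 on each node.
    for key in ("RANK", "LOCAL_RANK"):
        print(key, devices_per_node(nnodes=2, nproc_per_node=4, key=key))
```

With `RANK`, the workers on the second node would ask for `cuda:4` to `cuda:7`, which do not exist on a 4-GPU node; with `LOCAL_RANK`, every node maps its workers onto `cuda:0` to `cuda:3`.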
/build
Hi @Nic-Ma, I tested all bundles with the changed config files and also ran multi-node training experiments on a subset of them:
spleen_deepedit_annotation
spleen_ct_segmentation
brats_mri_axial_slices_generative_diffusion
swin_unetr_btcv_segmentation
I think this PR is now ready. Could you please help review it? Thanks!
/build
Fixes #444.
Description
This PR fixes the multi-GPU config issue that occurs when running on multiple nodes. It also makes a small enhancement to the endoscopic_inbody_classification bundle, adding a step that closes the JSON file, and adds a unit test for this bundle.
Status
Ready
Please ensure all the checkboxes:
./runtests.sh --codeformat.
version and changelog in metadata.json if changing an existing bundle.
CONTRIBUTING.md).
monai, pytorch and numpy are correct in metadata.json.
eval_metrics of the provided weights and TorchScript modules.
large_file.yml.
/home/your_name/ for "bundle_root").