Project-MONAI / model-zoo

MONAI Model Zoo that hosts models in the MONAI Bundle format.
Apache License 2.0

444 Fix multi-node issue #446

Closed: yiheng-wang-nv closed this 1 year ago

yiheng-wang-nv commented 1 year ago

Fixes #444.

Description

This PR fixes the multi-GPU config issue that occurs when training runs across multiple nodes. It also makes a small enhancement to the endoscopic_inbody_classification bundle by adding a step that closes the JSON file, and it adds a unit test for that bundle.
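The close-JSON-file step matters because an unclosed file may not be fully flushed to disk when another process reads it. This is a sketch only, assuming the bundle writes its predictions to a JSON file. `write_predictions` is a hypothetical helper, not code taken from the endoscopic_inbody_classification bundle.

```python
import json
import os
import tempfile


def write_predictions(path, records):
    """Hypothetical helper: dump prediction records to a JSON file.

    The `with` block closes (and therefore flushes) the file when it exits,
    even if json.dump raises, so later readers never see a truncated file.
    """
    with open(path, "w", encoding="utf-8") as f:
        json.dump(records, f, indent=2)
    return path


if __name__ == "__main__":
    out = os.path.join(tempfile.mkdtemp(), "predictions.json")
    # Illustrative records; the field names are assumptions, not the bundle's schema.
    write_predictions(out, [{"image": "frame_0001.jpg", "pred": 1}])
    with open(out, encoding="utf-8") as f:
        print(json.load(f))
```

A context manager closes the file on every exit path. An explicit `f.close()` gives the same guarantee only when it is wrapped in `try`/`finally`.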

Status

Ready


yiheng-wang-nv commented 1 year ago

I'm testing this branch on a multi-node setup.

Nic-Ma commented 1 year ago

Hi @yiheng-wang-nv ,

Can you try?

"device": "$torch.device('cuda:' + os.environ['LOCAL_RANK'])"

I think it should work.

Thanks.
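The suggestion works because torchrun exports both `RANK` (a process's index across all nodes) and `LOCAL_RANK` (its index on its own node), while CUDA device indices are numbered per node. If the device index came from the global rank, processes on the second and later nodes would request GPUs that do not exist on their machine. The plain-Python sketch below needs no GPU. It assumes torchrun's environment variables and a layout where each node's ranks are contiguous. `device_for_process` and `global_rank` are hypothetical helpers, not MONAI APIs.

```python
import os
from typing import Mapping


def device_for_process(env: Mapping[str, str] = os.environ) -> str:
    """Return the CUDA device string for this process.

    GPU indices are per node, so the index must come from LOCAL_RANK,
    not from the global RANK.
    """
    return f"cuda:{int(env['LOCAL_RANK'])}"


def global_rank(node_rank: int, nproc_per_node: int, local_rank: int) -> int:
    """Global rank under the assumed layout of contiguous ranks per node."""
    return node_rank * nproc_per_node + local_rank


if __name__ == "__main__":
    nnodes, nproc_per_node = 2, 4  # hypothetical cluster: 2 nodes x 4 GPUs each
    for node_rank in range(nnodes):
        for local_rank in range(nproc_per_node):
            rank = global_rank(node_rank, nproc_per_node, local_rank)
            env = {"LOCAL_RANK": str(local_rank), "RANK": str(rank)}
            # A RANK-based index would ask for cuda:4..cuda:7 on the second node,
            # which do not exist on a 4-GPU machine.
            print(f"node {node_rank}, rank {rank}: {device_for_process(env)} "
                  f"(RANK-based would be cuda:{rank})")
```

In a bundle config, the same value is wrapped in `torch.device(...)`, as in the expression suggested above.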

yiheng-wang-nv commented 1 year ago

/build

yiheng-wang-nv commented 1 year ago

/build

yiheng-wang-nv commented 1 year ago

Hi @Nic-Ma, I tested all bundles with the changed config files, and also ran multi-node training experiments on a subset of them:

  1. spleen_deepedit_annotation
  2. spleen_ct_segmentation
  3. brats_mri_axial_slices_generative_diffusion
  4. swin_unetr_btcv_segmentation

I think this PR is now ready. Could you please help review it? Thanks!

yiheng-wang-nv commented 1 year ago

/build

yiheng-wang-nv commented 1 year ago

/build

yiheng-wang-nv commented 1 year ago

/build