jithenece opened 4 months ago
@LennyN95 Transformation matrices are used by this model. Could you add Transform (.mat) to the list of allowed file types in mhubio/core/FileType.py?
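For illustration, a minimal sketch of the kind of change being requested, assuming `FileType` in mhubio/core/FileType.py is a plain Python `Enum`; the existing member names shown here are placeholders, and only the added `MAT` entry is the proposed change:

```python
# Hypothetical sketch of the requested addition in mhubio/core/FileType.py.
# Existing members listed here are illustrative, not the library's actual list.
from enum import Enum, auto

class FileType(Enum):
    NONE = auto()
    DICOM = auto()
    NIFTI = auto()
    MAT = auto()  # proposed: transformation matrix files (.mat)
```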
/test

```yaml
sample:
  idc_version: "Data Release 2.0 October 24, 2022"
  data:
  - SeriesInstanceUID: 1.3.6.1.4.1.14519.5.2.1.72111832425535404540752357374191117693
    aws_url: s3://idc-open-data/81967c43-b804-410a-91f0-619cea9e74d6/*
    path: input_data/case_study1/flair
  - SeriesInstanceUID: 1.3.6.1.4.1.14519.5.2.1.179890292978753819584197741746287159209
    aws_url: s3://idc-open-data/92492275-9496-4049-8273-4c6461d75fc9/*
    path: input_data/case_study1/t2
  - SeriesInstanceUID: 1.3.6.1.4.1.14519.5.2.1.337491152254065288399657726162931889194
    aws_url: s3://idc-open-data/951a4b1e-1ed3-4c59-b7d0-0e877b370b03/*
    path: input_data/case_study1/t1
  - SeriesInstanceUID: 1.3.6.1.4.1.14519.5.2.1.339743325012852358051228621010996392201
    aws_url: s3://idc-open-data/7facf869-55b9-441f-81ef-a3ec2e63c5a8/*
    path: input_data/case_study1/t1
reference:
  url: https://drive.google.com/file/d/1Xq8H_qmPx-hS15ChciCXi12-EAdyKIhr/view?usp=sharing
```
This looks like the most complex workflow added to MHub yet; I would love to hear about your experience in breaking up your work into MHub-IO Modules ;)
Some comments below; some apply to all Modules and are meant to simplify them a bit by making them more specific. Great work!!
Thank you for your feedback! It’s great to hear your thoughts on the complexity and structure of this model. Breaking it down into MHub-IO Modules was indeed a thoughtful process. Initially, we had a single module covering all steps, but we found it was difficult to understand.
I will review your suggestions and make the necessary changes.
> idc_version: "Data Release 2.0 October 24, 2022"
@jithenece there was no data release 2.0 on that date in IDC. IDC v2 was in June 2021, see https://learn.canceridc.dev/data/data-release-notes#v2-june-2021. Where did you get this date from?
> SeriesInstanceUID: 1.3.6.1.4.1.14519.5.2.1.72111832425535404540752357374191117693
> aws_url: s3://idc-open-data/81967c43-b804-410a-91f0-619cea9e74d6/*
Apologies, I used the collection version number for UPENN-GBM instead of the IDC version. I will update it.
/test

uploading the segmentation file output.zip

```yaml
sample:
  idc_version: 10
  data:
  - SeriesInstanceUID: 1.3.6.1.4.1.14519.5.2.1.339743325012852358051228621010996392201
    aws_url: s3://idc-open-data/7facf869-55b9-441f-81ef-a3ec2e63c5a8/*
    path: case1/t1
  - SeriesInstanceUID: 1.3.6.1.4.1.14519.5.2.1.179890292978753819584197741746287159209
    aws_url: s3://idc-open-data/92492275-9496-4049-8273-4c6461d75fc9/*
    path: case1/t2
  - SeriesInstanceUID: 1.3.6.1.4.1.14519.5.2.1.337491152254065288399657726162931889194
    aws_url: s3://idc-open-data/951a4b1e-1ed3-4c59-b7d0-0e877b370b03/*
    path: case1/t1ce
  - SeriesInstanceUID: 1.3.6.1.4.1.14519.5.2.1.72111832425535404540752357374191117693
    aws_url: s3://idc-open-data/81967c43-b804-410a-91f0-619cea9e74d6/*
    path: case1/flair
reference:
  url: https://github.com/user-attachments/files/16783893/output.zip
```
Please note, we updated our base image. All MHub dependencies are now installed in a virtual environment under `/app/.venv` running Python 3.11. Python, the virtual environment, and dependencies are now managed with uv. If required, you can create custom virtual environments, e.g., `uv venv -p 3.8 .venv38`, use `uv pip install -p .venv38 package-name` to install dependencies, and `uv run -p .venv38 python script.py` to run a Python script.
We also simplified our test routine. Sample and reference data now have to be uploaded to Zenodo and provided in a mhub.toml file at the project root. The process for creating and providing this sample data is explained in the updated testing phase article of our documentation. Under doi.org/10.5281/zenodo.13785615 we provide sample data as a reference.
@LennyN95 I have updated. Please let me know if any changes are required.
Thanks @LennyN95 for testing this out. Is it possible to share the Docker logs? It seems to run fine for me when following the steps mentioned in the documentation.
I had a similar issue while testing locally where Docker ran fine but the output files were not written to the mounted output path. It turned out to be a permission issue, and running `chmod 777 -R` on the mounted output folder resolved it. Could you try this or share the Docker logs please?
It appears that this is caused by the same error as reported in https://github.com/MHubAI/models/pull/92#issuecomment-2401809168. I attached a complete run log (run on the provided sample data after downloading it). Can it be that you're running a different version of your NNUnetRunnerV2 module? mhub.run.log
We added `--ipc=host` to the docker command to fix this error. Could you try this please?
Why would you require this? The MHub model is meant to run entirely shielded from any host processes. Our testing procedure does not allow any deviation from the default setup (similar to our offline requirement). Can you elaborate why exactly you specify this flag?
The error logs show:

> raise RuntimeError('Background workers died. Look for the error message further up! If there is '
> RuntimeError: Background workers died. Look for the error message further up! If there is none then your RAM was full and the worker was killed by the OS. Use fewer workers or get more RAM in that case!

The resources available to NNUnetRunnerV2 need to be increased; adding the `--ipc=host` flag provides them and lets the execution complete.
This seems to be a problem with NNUnet (see https://github.com/MIC-DKFZ/nnUNet/issues/2452). Can you please open an issue in the NNUnet repository?
I understand that setting `--ipc=host` seems to be a simple solution. However, for MHub we need to a) make sure that all models run with the same instruction set to keep it simple, and b) make sure that all MHub containers are shielded from host processes and online services to address security concerns. For future submissions, it's best if you let us know about such requirements in advance so we can find a solution in the early stages.
The issue was already raised. They referred to this:

> Please note that PyTorch uses shared memory to share data between processes, so if torch multiprocessing is used (e.g. for multithreaded data loaders) the default shared memory segment size that container runs with is not enough, and you should increase shared memory size either with --ipc=host or --shm-size command line options to nvidia-docker run.
Thx @jithenece for sharing. I wonder if multithreaded data loaders are required at all, since we implement sequential execution (in contrast to batch execution) in MHub?
I tried to avoid multithreaded data loaders by reducing the `nnUNet_def_n_proc` and `nnUNet_n_proc_DA` settings available in nnunetv2/configuration.py, but that does not seem to overcome this.
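For reference, a minimal sketch of the override attempted above, assuming these values can be supplied via environment variables; `nnUNet_def_n_proc` is read by nnunetv2/configuration.py when the package is imported, so the variables have to be set beforehand, and the concrete values are examples only:

```python
# Sketch: limit nnU-Net worker counts via environment variables.
# Set these before importing nnunetv2, since nnunetv2/configuration.py
# reads nnUNet_def_n_proc at import time.
import os

os.environ["nnUNet_def_n_proc"] = "1"  # default number of worker processes
os.environ["nnUNet_n_proc_DA"] = "0"   # data augmentation / loader workers

import nnunetv2  # noqa: E402 -- imported only after the variables are set
```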
Alternatively, @LennyN95, could we test them and merge?
@jithenece the additional CLI arguments are not compatible with our test routine, so the submission is on hold until we can fix the problem. We are working on a solution (see https://github.com/MIC-DKFZ/nnUNet/issues/2556) and are writing a PR to enable single-process inference with nnUNet v2. Once this has been resolved, we'll continue with the submission process and testing of the model as planned.
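For context, a rough sketch (not this model's actual runner code) of how worker counts can be limited today through the nnU-Net v2 predictor API; paths, dataset name, and fold selection are placeholders. Even with both counts set to 1, the predictor still spawns background worker processes, which is why the upstream change is needed for truly single-process inference:

```python
# Sketch: reduce nnU-Net v2 inference workers to a single process each.
# All paths below are placeholders.
import torch
from nnunetv2.inference.predict_from_raw_data import nnUNetPredictor

predictor = nnUNetPredictor(device=torch.device("cuda", 0), verbose=False)
predictor.initialize_from_trained_model_folder(
    "/models/DatasetXXX/nnUNetTrainer__nnUNetPlans__3d_fullres",  # placeholder model folder
    use_folds=(0,),
    checkpoint_name="checkpoint_final.pth",
)
predictor.predict_from_files(
    [["/data/case1_0000.nii.gz"]],  # placeholder: one case, one input channel
    "/data/output",                 # placeholder output folder
    save_probabilities=False,
    num_processes_preprocessing=1,
    num_processes_segmentation_export=1,
)
```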
Pretrained model for 3D semantic image segmentation of brain tumor, necrosis, and edema from MRI scans.