MHubAI / models

Stores the MHub models dockerfiles and scripts.
MIT License

BAMF MR Brain tumor segmentation #93

Open jithenece opened 3 months ago

jithenece commented 3 months ago

Pretrained model for 3D semantic image segmentation of the brain tumor, necrosis, and edema from MRI scans

jithenece commented 2 months ago

@LennyN95 Transformation matrices are used by this model. Could you add Transform (.mat) to the list of allowed file types in mhubio/core/FileType.py?
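
For illustration only, a sketch of the kind of change being requested, assuming FileType in mhubio/core/FileType.py is a plain Python Enum (the existing members shown here are illustrative, not the actual list):

```python
from enum import Enum, auto

class FileType(Enum):
    # illustrative subset of existing entries; the real enum differs
    NONE = auto()
    NIFTI = auto()
    DICOM = auto()
    # requested addition: registration transform files (.mat)
    MAT = auto()
```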

jithenece commented 2 months ago

/test

sample:
  idc_version: "Data Release 2.0 October 24, 2022"
  data:
  - SeriesInstanceUID: 1.3.6.1.4.1.14519.5.2.1.72111832425535404540752357374191117693
    aws_url: s3://idc-open-data/81967c43-b804-410a-91f0-619cea9e74d6/*
    path: input_data/case_study1/flair
  - SeriesInstanceUID: 1.3.6.1.4.1.14519.5.2.1.179890292978753819584197741746287159209
    aws_url: s3://idc-open-data/92492275-9496-4049-8273-4c6461d75fc9/*
    path: input_data/case_study1/t2
  - SeriesInstanceUID: 1.3.6.1.4.1.14519.5.2.1.337491152254065288399657726162931889194
    aws_url: s3://idc-open-data/951a4b1e-1ed3-4c59-b7d0-0e877b370b03/*
    path: input_data/case_study1/t1
  - SeriesInstanceUID: 1.3.6.1.4.1.14519.5.2.1.339743325012852358051228621010996392201
    aws_url: s3://idc-open-data/7facf869-55b9-441f-81ef-a3ec2e63c5a8/*
    path: input_data/case_study1/t1

reference:
  url: https://drive.google.com/file/d/1Xq8H_qmPx-hS15ChciCXi12-EAdyKIhr/view?usp=sharing
jithenece commented 1 month ago

> This looks like the most complex workflow added to MHub yet. I would love to hear about your experience in breaking up your work into MHub-IO Modules ;)
>
> Some comments below; some apply to all Modules, to simplify them a bit by making them more specific. Great work!!

Thank you for your feedback! It's great to hear your thoughts on the complexity and structure of this model. Breaking it down into MHub-IO Modules did take some careful thought: initially we had a single module covering all steps, but we found it was difficult to understand.

I will review your suggestions and make the necessary changes.

fedorov commented 1 month ago

> idc_version: "Data Release 2.0 October 24, 2022"

@jithenece there was no data release 2.0 on that date in IDC. IDC v2 was in June 2021, see https://learn.canceridc.dev/data/data-release-notes#v2-june-2021. Where did you get this date from?

jithenece commented 1 month ago
> 1.3.6.1.4.1.14519.5.2.1.72111832425535404540752357374191117693
>     aws_url: s3://idc-open-data/81967c43-b804-410a-91f0-619cea9e74d6/*

Apologies. I used the collection version number for UPENN-GBM instead of the IDC version. I will update it.

jithenece commented 1 month ago

/test

uploading the segmentation file output.zip

sample:
  idc_version: 10
  data:
  - SeriesInstanceUID: 1.3.6.1.4.1.14519.5.2.1.339743325012852358051228621010996392201
    aws_url: s3://idc-open-data/7facf869-55b9-441f-81ef-a3ec2e63c5a8/*
    path: case1/t1
  - SeriesInstanceUID: 1.3.6.1.4.1.14519.5.2.1.179890292978753819584197741746287159209
    aws_url: s3://idc-open-data/92492275-9496-4049-8273-4c6461d75fc9/*
    path: case1/t2
  - SeriesInstanceUID: 1.3.6.1.4.1.14519.5.2.1.337491152254065288399657726162931889194
    aws_url: s3://idc-open-data/951a4b1e-1ed3-4c59-b7d0-0e877b370b03/*
    path: case1/t1ce
  - SeriesInstanceUID: 1.3.6.1.4.1.14519.5.2.1.72111832425535404540752357374191117693
    aws_url: s3://idc-open-data/81967c43-b804-410a-91f0-619cea9e74d6/*
    path: case1/flair

reference:
  url: https://github.com/user-attachments/files/16783893/output.zip

Test Results (24.08.29_15.15.30_AkoraywFNq)

```yaml
id: 773edbf6-156a-410f-85f9-2f19225a4c46
date: '2024-08-29 15:54:09'
missing_files:
- case_study1/flair.seg.dcm
- case_study1/t1ce.seg.dcm
- case_study1/t2.seg.dcm
- case_study1/t1.seg.dcm
summary:
  files_missing: 4
  files_extra: 0
checks: {}
conclusion: false
```
LennyN95 commented 3 weeks ago

Please note, we updated our base image. All mhub dependencies are now installed in a virtual environment under /app/.venv running Python 3.11. Python, virtual environments and dependencies are now managed with uv. If required, you can create custom virtual environments, e.g., uv venv -p 3.8 .venv38, then use uv pip install -p .venv38 package-name to install dependencies and uv run -p .venv38 python script.py to run a Python script.
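
Collected for convenience, the commands from the note above (environment name and package name are placeholders):

```bash
# create a custom virtual environment running Python 3.8
uv venv -p 3.8 .venv38

# install a dependency into that environment (package name is a placeholder)
uv pip install -p .venv38 package-name

# run a script with that environment
uv run -p .venv38 python script.py
```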

We also simplified our test routine. Sample and reference data now have to be uploaded to Zenodo and provided in a mhub.toml file at the project root. The process for creating and providing this sample data is explained in the updated testing-phase article of our documentation. Under doi.org/10.5281/zenodo.13785615 we provide sample data as a reference.

jithenece commented 2 weeks ago

@LennyN95 I have updated it. Please let me know if any changes are required.

LennyN95 commented 1 week ago


Test Results

```yaml
id: 651fbbe3-07a2-419b-813b-046de5412898
name: MHub Test Report (default)
date: '2024-10-09 12:31:58'
missing_files:
- case1/flair.seg.dcm
- case1/t1ce.seg.dcm
- case1/t2.seg.dcm
- case1/t1.seg.dcm
summary:
  files_missing: 4
  files_extra: 0
checks: {}
conclusion: false
```
jithenece commented 1 week ago

Thanks @LennyN95 for testing this out. Is it possible to share the docker logs? It runs fine for me when following the steps mentioned in the documentation.

I had a similar issue while testing locally where docker ran fine but the output files were not in the mounted output path. I found there was a permission issue writing the files and had to run chmod 777 -R on the mounted output folder, after which it was resolved. Could you try this or share the docker logs, please?
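
For clarity, the local workaround referred to above (the host path is a placeholder for whichever folder is mounted as the output directory):

```bash
# make the mounted output folder writable by the container user
chmod -R 777 ./output_data
```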

LennyN95 commented 1 week ago

It appears that this is caused by the same error as reported in https://github.com/MHubAI/models/pull/92#issuecomment-2401809168. I attached a complete run log (run on the provided sample data after downloading it). Could it be that you're running a different version of your NNUnetRunnerV2 module? mhub.run.log

jithenece commented 1 week ago

We added --ipc=host to the docker command to fix this error. Could you try this, please?
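
For reference, a hedged example of how the flag would be added to a docker run invocation; the image name and host paths are placeholders, and the mount points follow the layout commonly shown in the MHub docs rather than the exact command used here:

```bash
# placeholder image name and host paths; --ipc=host shares the host's IPC namespace
docker run --rm -t --ipc=host \
  -v $(pwd)/input_data:/app/data/input_data \
  -v $(pwd)/output_data:/app/data/output_data \
  mhubai/bamf_mri_brain_tumor:latest
```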

LennyN95 commented 1 week ago

Why would you require this? The MHub model is meant to run entirely shielded from any host processes. Our testing procedure does not allow any breach of the default setup (similar to our offline requirement). Can you elaborate on why exactly you specify this flag?

jithenece commented 1 week ago

The error logs show:

```
raise RuntimeError('Background workers died. Look for the error message further up! If there is '
RuntimeError: Background workers died. Look for the error message further up! If there is none then your RAM was full and the worker was killed by the OS. Use fewer workers or get more RAM in that case!
```

More resources (shared memory) are needed to run NNUnetRunnerV2. Adding the --ipc=host flag provides them and lets the execution complete.

LennyN95 commented 1 week ago

This seems to be a problem with NNUnet (see https://github.com/MIC-DKFZ/nnUNet/issues/2452). Can you please open an issue in the NNUnet repository?

I understand that setting --ipc=host seems to be a simple solution. However, for MHub we need to a) make sure that all models run with the same instruction set to keep it simple, b) make sure that all MHub containers are shielded from host processes and online services to address security concerns. For future submissions, it's best if you let us know in advance so we can find a solution in the early stages.

jithenece commented 1 week ago

The issue was already raised. They referred to this:

> Please note that PyTorch uses shared memory to share data between processes, so if torch multiprocessing is used (e.g. for multithreaded data loaders) the default shared memory segment size that container runs with is not enough, and you should increase shared memory size either with --ipc=host or --shm-size command line options to nvidia-docker run.

LennyN95 commented 1 week ago

Thx @jithenece for sharing. I wonder if multithreaded data loaders are required at all, since we implement sequential execution (in contrast to batch execution) in MHub?

jithenece commented 6 days ago

I tried to avoid multithreaded data loaders by reducing the nnUNet_def_n_proc and nnUNet_n_proc_DA settings available in nnunetv2/configuration.py, but that does not seem to overcome the issue.
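
For context, a minimal sketch of how those two settings can be lowered via environment variables, assuming nnunetv2/configuration.py falls back to nnUNet_def_n_proc and nnUNet_n_proc_DA from the environment (an assumption; the values are illustrative):

```bash
# assumption: nnunetv2 reads these environment variables when set;
# lowering them reduces the number of background worker processes
export nnUNet_def_n_proc=1
export nnUNet_n_proc_DA=1
```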

Alternatively,

  1. It worked when I tried the other flag setting, --shm-size=256MB, as stated earlier above.
  2. We used the following setting in our Kubernetes cluster to fix this error: mounting an emptyDir to /dev/shm and setting the medium to Memory did the trick. Please check if this helps; a sketch is shown below.
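
A minimal sketch of the Kubernetes setting described in point 2, assuming a plain Pod spec (names and image are placeholders):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: mhub-model              # placeholder
spec:
  containers:
  - name: model                 # placeholder container running the MHub image
    image: mhubai/model:latest  # placeholder
    volumeMounts:
    - name: dshm
      mountPath: /dev/shm       # back the container's shared memory with the in-memory volume below
  volumes:
  - name: dshm
    emptyDir:
      medium: Memory            # tmpfs-backed emptyDir enlarges the available shared memory
```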