Preprocessing pipeline - Githubissues

joeranbosma commented 1 year ago

Hi,

Thanks for the efforts to put all scripts into this repository! We believe the repository is near completion for usage in the PI-CAI Closed Testing Phase. Before executing the next step, we would like to address the following points:

[x] The location of the input and output data must be configurable.
[x] The full training pipeline must be a single script, to ensure everything is run as intended
[x] The preprocessed resources need to be exported to a configurable folder.
[x] For maintainability, it would be best not to modify the PI-CAI tools, but rather use a specific version and set the preprocessing settings.

Ideally, the input directories would be read from the environment variables SM_CHANNEL_IMAGES and SM_CHANNEL_LABELS and the output directory would be read from the environment variable SM_MODEL_DIR. For convenience in future endeavors outside SageMaker, it would be best if these paths can also be set through command line options. argparse provides a setup that can achieve this:

parser.add_argument('--imagesdir', type=str, default=os.environ.get('SM_CHANNEL_IMAGES', "/input/images"))

If I understand correctly, the preprocessing pipeline now consists of three scripts: make_json.py,make_dataset.pyandprecess_testdata.py`. These need to be combined into a single script to ensure consistency between the scripts, and a single flow from input data to the preprocessed dataset.

The repository now contains a modified version of picai_prep, which makes it difficult for users to understand what was changed exactly. We would like to propose to use a fixed version of picai_prep (e.g., the newest version, 2.1.2). Were any changes made to picai_prep that required the picai_prep repository to be copied into this repository?

I have made suggestions to address the points above here: https://github.com/Yukiya-Umimi/ITUNet-for-PICAI-2022-Challenge/pull/2. Could you go over these points, and if you agree with the changes, confirm whether the preprocess.py script contains all preprocessing steps in the exact same way as intended?

Looking forward to your response, Kind regards, Joeran

Yukiya-Umimi commented 1 year ago

Hi,

I don't quite understand what you mean, but if you need to make any changes, you can make them directly.

Best, Honey K

	yukiya

@. | ---- Replied Message ---- | From | Joeran @.> | | Date | 2/15/2023 00:02 | | To | @.> | | Cc | @.> | | Subject | [Yukiya-Umimi/ITUNet-for-PICAI-2022-Challenge] Preprocessing pipeline (Issue #3) |

Hi,

Thanks for the efforts to put all scripts into this repository! We believe the repository is near completion for usage in the PI-CAI Closed Testing Phase. Before executing the next step, we would like to address the following points:

The location of the input and output data must be configurable. The full training pipeline must be a single script, to ensure everything is run as intended The preprocessed resources need to be exported to a configurable folder. For maintainability, it would be best not to modify the PI-CAI tools, but rather use a specific version and set the preprocessing settings.

Ideally, the input directories would be read from the environment variables SM_CHANNEL_IMAGES and SM_CHANNEL_LABELS and the output directory would be read from the environment variable SM_MODEL_DIR. For convenience in future endeavors outside SageMaker, it would be best if these paths can also be set through command line options. argparse provides a setup that can achieve this:

parser.add_argument('--imagesdir', type=str, default=os.environ.get('SM_CHANNEL_IMAGES', "/input/images"))

If I understand correctly, the preprocessing pipeline now consists of three scripts: make_json.py, make_dataset.pyandprecess_testdata.py`. These need to be combined into a single script to ensure consistency between the scripts, and a single flow from input data to the preprocessed dataset.

The repository now contains a modified version of picai_prep, which makes it difficult for users to understand what was changed exactly. We would like to propose to use a fixed version of picai_prep (e.g., the newest version, 2.1.2). Were any changes made to picai_prep that required the picai_prep repository to be copied into this repository?

I have made suggestions to address the points above here: #2. Could you go over these points, and if you agree with the changes, confirm whether the preprocess.py script contains all preprocessing steps in the exact same way as intended?

Looking forward to your response, Kind regards, Joeran

— Reply to this email directly, view it on GitHub, or unsubscribe. You are receiving this because you are subscribed to this thread.Message ID: @.***>

joeranbosma commented 1 year ago

Hi @Yukiya-Umimi,

Please allow me to explain it better.

I need to confirm the following with you: did you change anything in the ITUNet-for-PICAI-2022-Challenge/preprocess/src/picai_prep folder?

If this is a modified version of the picai_prep repository, could you indicate which changes you made?

If this is an exact copy of the picai_prep repository, could we switch to pip install picai_prep and use the "official" version instead?

Thanks in advance, Joeran

Yukiya-Umimi commented 1 year ago

Hi,

I didn't change anything in it. I just copied them directly for the convenience of calling. I don't know what is the difference between the newest version and the original one. If you haven't changed the functions in it, I think it should be possible, but you need to change the corresponding function calls in other files.

Best, Honey K		yukiya

@. | ---- Replied Message ---- | From | Joeran @.> | | Date | 2/15/2023 16:28 | | To | @.> | | Cc | @.> , @.***> | | Subject | Re: [Yukiya-Umimi/ITUNet-for-PICAI-2022-Challenge] Preprocessing pipeline (Issue #3) |

Hi @Yukiya-Umimi,

Please allow me to explain it better.

I need to confirm the following with you: did you change anything in the ITUNet-for-PICAI-2022-Challenge/preprocess/src/picai_prep folder?

If this is a modified version of the picai_prep repository, could you indicate which changes you made?

If this is an exact copy of the picai_prep repository, could we switch to pip install picai_prep and use the "official" version instead?

Thanks in advance, Joeran

— Reply to this email directly, view it on GitHub, or unsubscribe. You are receiving this because you were mentioned.Message ID: @.***>

joeranbosma commented 1 year ago

Great! Then let's change to the installable picai_prep module, this will be easier to maintain in the future.

I have made my proposed changes here: https://github.com/Yukiya-Umimi/ITUNet-for-PICAI-2022-Challenge/pull/2. In this PR, I have switched to using the installable picai_prep module, and combined the three separate preprocessing scripts into a single one: preprocess.py.

After running preprocess.py, the following data will be exported and available during training:

The labeled dataset with manual annotations, at output_dir / "nnUNet_raw_data".
The full dataset without any labels, at output_dir / "nnUNet_test_data".

To confirm, the current version of the script will not export the following:

The mha2nnunet_settings JSON file
Any AI-derived pseudo labels

Could you confirm this is correct, and contains all resources needed to train your algorithm?

Yukiya-Umimi commented 1 year ago

Hi,

Yes, all pseudo labels will be generated in the segmentation and classification code.

Best, Honey K

	yukiya

@. | ---- Replied Message ---- | From | Joeran @.> | | Date | 2/15/2023 17:33 | | To | @.> | | Cc | @.> , @.***> | | Subject | Re: [Yukiya-Umimi/ITUNet-for-PICAI-2022-Challenge] Preprocessing pipeline (Issue #3) |

Great! Then let's change to the installable picai_prep module, this will be easier to maintain in the future.

I have made my proposed changes here: #2. In this PR, I have switched to using the installable picai_prep module, and combined the three separate preprocessing scripts into a single one: preprocess.py.

After running preprocess.py, the following data will be exported and available during training:

The labeled dataset with manual annotations, at output_dir / "nnUNet_raw_data". The full dataset without any labels, at output_dir / "nnUNet_test_data".

To confirm, the current version of the script will not export the following:

The mha2nnunet_settings JSON file Any AI-derived pseudo labels

Could you confirm this is correct, and contains all resources needed to train your algorithm?

— Reply to this email directly, view it on GitHub, or unsubscribe. You are receiving this because you were mentioned.Message ID: @.***>

joeranbosma commented 1 year ago

Perfect, thanks for confirming! Then you can go ahead and merge https://github.com/Yukiya-Umimi/ITUNet-for-PICAI-2022-Challenge/pull/2 into your repository, given that you agree with the proposed changes. I have cleaned up the three independent preprocessing scripts (as those are now all part of preprocess.py), but this is not a requirement for your repository of course. Please feel free to change anything in the proposed update!

joeranbosma commented 1 year ago

Hi @Yukiya-Umimi,

Did you have the time to look at #2 and merge it in if you agree with the proposed changes? Please let me know if anything remains unclear.

Thanks for your time, Joeran.

Yukiya-Umimi commented 1 year ago

Hi,

I'm not familiar with the operation of github. I don't know how to operate it. Can you merge the code?

Best, Honey K

	yukiya

@. | ---- Replied Message ---- | From | Joeran @.> | | Date | 2/26/2023 21:14 | | To | @.> | | Cc | @.> , @.***> | | Subject | Re: [Yukiya-Umimi/ITUNet-for-PICAI-2022-Challenge] Preprocessing pipeline (Issue #3) |

Hi @Yukiya-Umimi,

Did you have the time to look at #2 and merge it in if you agree with the proposed changes? Please let me know if anything remains unclear.

Thanks for your time, Joeran.

— Reply to this email directly, view it on GitHub, or unsubscribe. You are receiving this because you were mentioned.Message ID: @.***>

joeranbosma commented 1 year ago

Hi @Yukiya-Umimi ,

You can see the pull request if you go to this link: https://github.com/Yukiya-Umimi/ITUNet-for-PICAI-2022-Challenge/pull/2.

There you can see all edited files under the tab "Files changed". In that tab, you can check each file. If anything is incorrect, you can leave a comment. If everything is correct, you can approve the pull request. If all changes are correct, the changes can be merged in under the tab "Conversation" by clicking "Merge pull request".

Hope this helps.

Yukiya-Umimi commented 1 year ago

Hi,

I have tried the operation. I see "Pull request successfully merged and closed". It should be successful.

Best, Honey K

	yukiya

@. | ---- Replied Message ---- | From | @.> | | Date | 2/26/2023 21:30 | | To | @.**@.> | | Subject | Re: [Yukiya-Umimi/ITUNet-for-PICAI-2022-Challenge] Preprocessing pipeline (Issue #3) | Hi,

I'm not familiar with the operation of github. I don't know how to operate it. Can you merge the code?

Best, Honey K

	yukiya

@. | ---- Replied Message ---- | From | Joeran @.> | | Date | 2/26/2023 21:14 | | To | @.> | | Cc | @.> , @.***> | | Subject | Re: [Yukiya-Umimi/ITUNet-for-PICAI-2022-Challenge] Preprocessing pipeline (Issue #3) |

Hi @Yukiya-Umimi,

Did you have the time to look at #2 and merge it in if you agree with the proposed changes? Please let me know if anything remains unclear.

Thanks for your time, Joeran.

— Reply to this email directly, view it on GitHub, or unsubscribe. You are receiving this because you were mentioned.Message ID: @.***>

joeranbosma commented 1 year ago

Thanks!

Yukiya-Umimi / ITUNet-for-PICAI-2022-Challenge

Preprocessing pipeline #3