Yukiya-Umimi / ITUNet-for-PICAI-2022-Challenge

Apache License 2.0

End-to-end Training Pipeline #6

Closed · joeranbosma closed 1 year ago

joeranbosma commented 1 year ago

In order to run the training pipeline on SageMaker for the PI-CAI Closed Testing Phase with the Public and Private Training dataset, the full end-to-end pipeline needs to be specified a priori in scripts.

Based on your repository, it appears you perform the following steps, but I may be mistaken:

  1. Preprocess data for classification model
  2. Preprocess data for segmentation model
  3. Train classification model (supervised)
  4. Train segmentation model (supervised)
  5. Generate pseudo labels with classification and segmentation model
  6. Train segmentation model (semi-supervised)

Could you confirm this, or let me know the correct sequence?
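If this is indeed the sequence, a minimal end-to-end runner could chain the steps roughly as follows (a sketch only; the script names and arguments are hypothetical placeholders, not this repository's actual entry points):

```python
import subprocess

# Hypothetical entry points for the six steps above; the actual script
# names and CLI arguments in this repository may differ.
PIPELINE = [
    ["python", "preprocess_classification.py", "--workdir", "/workdir"],
    ["python", "preprocess_segmentation.py", "--workdir", "/workdir"],
    ["python", "train_classification.py", "--workdir", "/workdir"],
    ["python", "train_segmentation.py", "--workdir", "/workdir"],
    ["python", "generate_pseudo_labels.py", "--workdir", "/workdir"],
    ["python", "train_semi_supervised.py", "--workdir", "/workdir"],
]

for step in PIPELINE:
    # check=True aborts the run as soon as one step fails, so no later
    # step trains on missing or incomplete intermediate data.
    subprocess.run(step, check=True)
```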

If correct, it means that the current preprocessing script only prepares the data for the segmentation model, right? If so, the preprocessing script should be extended to also preprocess and export the data for the classification model, essentially incorporating this preprocessing script.

For the training steps, there should be a script that handles the export of trained model weights, as well as any other steps that are now performed manually (including setting paths). For the steps above, I've drafted four training scripts in this PR:

  1. Train classification model (supervised)
  2. Train segmentation model (supervised)
  3. Generate pseudo labels
  4. Train semi-supervised model

The files are merely templates; the training code still needs to be added! The same PR also replaces the picai_eval folder by pip-installing picai_eval.
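As a starting point for the weight-export step, something as small as the following might suffice (a sketch; the checkpoint and output directories are hypothetical placeholders):

```python
import shutil
from pathlib import Path

# Hypothetical locations: where training writes its checkpoints, and
# where the pipeline expects the final weights to be exported.
ckpt_dir = Path("/workdir/checkpoints/classification")
export_dir = Path("/workdir/results/classification")
export_dir.mkdir(parents=True, exist_ok=True)

# Export the most recently written checkpoint; a metric-based choice
# could replace this line (see the checkpoint-selection sketch further down).
latest = max(ckpt_dir.glob("*.pth"), key=lambda p: p.stat().st_mtime)
shutil.copy(latest, export_dir / "model_weights.pth")
```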

Pending:

  * Verify preprocessing pipeline
  * Verify pseudo label generation
  * Verify training pipeline (supervised)
  * Verify training pipeline (semi-supervised)
  * Verify training Docker container

Yukiya-Umimi commented 1 year ago

Hi,

This sequence is correct. The current preprocessing script does not only prepare the data for the segmentation model; it also prepares the data for the classification model. The last question is more complicated: during training we retain the three most promising sets of weights, and the trainer then has to choose the one they consider best. This selection may require the trainer's prior knowledge, such as other parameters recorded during training or the prediction results on the validation set, which is difficult to capture in a script.

Best,

yukiya


Yukiya-Umimi commented 1 year ago

By the way, we would prefer not to restructure our code, because we cannot guarantee that no errors will be introduced in the process, and verifying this would take a lot of work. We have tested that the current code runs, and the parts requiring manual operation are very simple. If you have the time and ability to complete this work, you are welcome to create a new branch with your code, and we would be grateful.

yukiya


joeranbosma commented 1 year ago

Dear Yukiya,

We understand the difficulty of verifying the training code after restructuring it. However, in order to have a reproducible training pipeline that we can apply to the private training dataset, we require all steps to be performed by explicit scripts. These scripts guarantee two things:

  1. All decisions are specified precisely, to ensure we perform the exact training steps as intended.
  2. The training pipeline is fully automatic, without any manual intervention.

I'm afraid both points are a requirement for the PI-CAI Closed Testing Phase, since we cannot allow any manual intervention.

Just like the restructuring of the preprocessing pipeline (https://github.com/Yukiya-Umimi/ITUNet-for-PICAI-2022-Challenge/pull/2), the adaptations should be simple. I'll see if I can make the PR more explicit, to make my point clearer.
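For example, the checkpoint selection you describe could in principle be scripted if each retained checkpoint records its validation score. A minimal sketch, assuming (hypothetically) that the checkpoint filenames embed that score, e.g. `epoch=37-score=0.812.pth`, which may not match the repository's actual checkpoint format:

```python
import re
from pathlib import Path

def pick_best_checkpoint(ckpt_dir: str) -> Path:
    """Pick the retained checkpoint with the highest validation score.

    Assumes (hypothetically) that each filename embeds its validation
    score, e.g. 'epoch=37-score=0.812.pth'.
    """
    def score(path: Path) -> float:
        match = re.search(r"score=(\d+\.\d+)", path.name)
        if match is None:
            raise ValueError(f"no validation score found in {path.name}")
        return float(match.group(1))

    return max(Path(ckpt_dir).glob("*.pth"), key=score)
```

Any rule that can be written down this way (highest validation AUC, best ranking score, etc.) would make the selection reproducible without manual intervention.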

joeranbosma commented 1 year ago

Dear Yukiya,

We performed a debugging run with the preprocessing script from this PR and obtained the following preprocessed dataset structure. The debug dataset consists of 11 cases (10 manually annotated, 1 unlabeled):

```bash
├── classification
│   ├── images_illness_3c
│   │   ├── 0_0.png
│   │   ├── 0_1.png
│   │   ├── 0_10.png
│   │   ├── 0_11.png
│   │   ├── 0_12.png
│   │   ├── 0_13.png
│   │   ├── 0_14.png
│   │   ├── 0_15.png
│   │   ├── 0_16.png
│   │   ├── 0_17.png
│   │   ├── 0_18.png
│   │   ├── 0_2.png
│   │   ├── 0_3.png
│   │   ├── 0_4.png
│   │   ├── 0_5.png
│   │   ├── 0_6.png
│   │   ├── 0_7.png
│   │   ├── 0_8.png
│   │   └── 0_9.png
│   └── picai_illness_3c.csv
├── nnUNet_raw_data
│   ├── Task2201_picai_baseline
│   │   ├── imagesTr
│   │   │   ├── 10000_1000000_0000.nii.gz
│   │   │   ├── 10000_1000000_0001.nii.gz
│   │   │   ├── 10000_1000000_0002.nii.gz
│   │   │   ├── 10001_1000001_0000.nii.gz
│   │   │   ├── 10001_1000001_0001.nii.gz
│   │   │   ├── 10001_1000001_0002.nii.gz
│   │   │   ├── 10002_1000002_0000.nii.gz
│   │   │   ├── 10002_1000002_0001.nii.gz
│   │   │   ├── 10002_1000002_0002.nii.gz
│   │   │   ├── 10003_1000003_0000.nii.gz
│   │   │   ├── 10003_1000003_0001.nii.gz
│   │   │   ├── 10003_1000003_0002.nii.gz
│   │   │   ├── 10004_1000004_0000.nii.gz
│   │   │   ├── 10004_1000004_0001.nii.gz
│   │   │   ├── 10004_1000004_0002.nii.gz
│   │   │   ├── 10005_1000005_0000.nii.gz
│   │   │   ├── 10005_1000005_0001.nii.gz
│   │   │   ├── 10005_1000005_0002.nii.gz
│   │   │   ├── 10006_1000006_0000.nii.gz
│   │   │   ├── 10006_1000006_0001.nii.gz
│   │   │   ├── 10006_1000006_0002.nii.gz
│   │   │   ├── 10007_1000007_0000.nii.gz
│   │   │   ├── 10007_1000007_0001.nii.gz
│   │   │   ├── 10007_1000007_0002.nii.gz
│   │   │   ├── 10009_1000009_0000.nii.gz
│   │   │   ├── 10009_1000009_0001.nii.gz
│   │   │   ├── 10009_1000009_0002.nii.gz
│   │   │   ├── 10010_1000010_0000.nii.gz
│   │   │   ├── 10010_1000010_0001.nii.gz
│   │   │   └── 10010_1000010_0002.nii.gz
│   │   └── labelsTr
│   │       ├── 10000_1000000.nii.gz
│   │       ├── 10001_1000001.nii.gz
│   │       ├── 10002_1000002.nii.gz
│   │       ├── 10003_1000003.nii.gz
│   │       ├── 10004_1000004.nii.gz
│   │       ├── 10005_1000005.nii.gz
│   │       ├── 10006_1000006.nii.gz
│   │       ├── 10007_1000007.nii.gz
│   │       ├── 10009_1000009.nii.gz
│   │       └── 10010_1000010.nii.gz
│   └── picai_prep_20230307173139.log
├── nnUNet_test_data
│   ├── 10008_1000008_0000.nii.gz
│   ├── 10008_1000008_0001.nii.gz
│   └── 10008_1000008_0002.nii.gz
└── segmentation
    └── segdata
        ├── data_2d
        │   ├── 0_0.hdf5
        │   ├── 0_1.hdf5
        │   ├── 0_10.hdf5
        │   ├── 0_11.hdf5
        │   ├── 0_12.hdf5
        │   ├── 0_13.hdf5
        │   ├── 0_14.hdf5
        │   ├── 0_15.hdf5
        │   ├── 0_16.hdf5
        │   ├── 0_17.hdf5
        │   ├── 0_18.hdf5
        │   ├── 0_2.hdf5
        │   ├── 0_3.hdf5
        │   ├── 0_4.hdf5
        │   ├── 0_5.hdf5
        │   ├── 0_6.hdf5
        │   ├── 0_7.hdf5
        │   ├── 0_8.hdf5
        │   └── 0_9.hdf5
        └── data_3d
            └── 0.hdf5
```
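As a quick sanity check on this layout, one could verify that every case in imagesTr has the three nnU-Net-style modality files (`_0000`/`_0001`/`_0002`) and a matching annotation in labelsTr (paths as in the tree above):

```python
from pathlib import Path

# Paths as in the tree above.
images_dir = Path("nnUNet_raw_data/Task2201_picai_baseline/imagesTr")
labels_dir = Path("nnUNet_raw_data/Task2201_picai_baseline/labelsTr")

# Group image files by case ID: '10000_1000000_0000.nii.gz' -> '10000_1000000'.
cases: dict[str, list[str]] = {}
for path in images_dir.glob("*.nii.gz"):
    case_id = path.name[: -len("_0000.nii.gz")]
    cases.setdefault(case_id, []).append(path.name)

for case_id, files in sorted(cases.items()):
    assert len(files) == 3, f"{case_id}: expected 3 modalities, found {len(files)}"
    assert (labels_dir / f"{case_id}.nii.gz").exists(), f"{case_id}: label missing"

print(f"{len(cases)} annotated cases verified")
```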
joeranbosma commented 1 year ago

Dear Yukiya,

I've set up the first explicit training script in this PR: the classification model can now be trained using this Docker container.
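For anyone reproducing this, the invocation would look roughly like the following (a sketch; the image name, mount point, and entry point are placeholders, not the PR's actual values):

```python
import subprocess

# Placeholder image name, mount point, and entry point; the actual
# values are defined by the PR's Docker setup, not by this sketch.
subprocess.run([
    "docker", "run", "--gpus", "all",
    "-v", "/path/to/workdir:/workdir",
    "itunet-picai-training:latest",
    "python", "train_classification.py", "--workdir", "/workdir",
], check=True)
```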

Could you review the two PRs (https://github.com/Yukiya-Umimi/ITUNet-for-PICAI-2022-Challenge/pull/5 and https://github.com/Yukiya-Umimi/ITUNet-for-PICAI-2022-Challenge/pull/7)?