Project-MONAI / tutorials

MONAI Tutorials
https://monai.io/started.html
Apache License 2.0

The execution of `runner.sh` is timing out due to the large dataset #1728

Open · ctongh opened this issue 2 months ago

ctongh commented 2 months ago

Describe the bug

When I submitted a new tutorial in PR #1696, I ran into a timeout while verifying the notebook with runner.sh. I then tried verifying notebooks that are already in monai/tutorials (e.g., spleen_segmentation_3d.ipynb) the same way and hit similar timeouts. For these notebooks, however, the timeout is sometimes caused not by the code itself but by the dataset: it is so large that downloading it takes too long. Although runner.sh reports the line number at which execution exited when a timeout occurs, that line is not necessarily what caused the timeout. I therefore suggest adding a block to runner.sh that measures how long each notebook takes to download its dataset.

To Reproduce

Execute the following commands locally:

    ./runner.sh -t 3d_segmentation/brats_segmentation_3d.ipynb
    ./runner.sh -t 3d_segmentation/spleen_segmentation_3d.ipynb
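To see how close each reproduction run gets to the time limit, the commands above could be wrapped in a small timing helper. This is a minimal sketch: `run_timed` is a hypothetical name, not something runner.sh provides, and it relies only on bash's built-in `SECONDS` counter, so it reports whole seconds.

```bash
#!/usr/bin/env bash
# Hypothetical wrapper (not part of runner.sh): run a command, then report its
# exit status and the wall-clock time it took, using bash's SECONDS counter.
run_timed() {
    local start=$SECONDS status=0
    "$@" || status=$?
    echo "[$*] exit ${status} after $((SECONDS - start)) s"
    return "$status"
}

# Example (run from the root of the tutorials repository):
# run_timed ./runner.sh -t 3d_segmentation/brats_segmentation_3d.ipynb
# run_timed ./runner.sh -t 3d_segmentation/spleen_segmentation_3d.ipynb
```

This only measures the whole run, so on its own it cannot tell whether the time went into the dataset download; that is the gap the proposed runner.sh check is meant to close.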

Expected behavior

runner.sh already contains a section that, for any notebook with import statements, checks that the cell after the first code cell is a "Setup imports" markdown cell and that the code cell following it (which imports the packages the notebook needs) calls print_config():

        # if import is used, then it should have the Setup import(s) markdown
        if [[ $(${NB_TEST} verify -f "$fname" -k "(^import|[\n\r]import|^from|[\n\r]from)" --type code) == true ]]
        then
            if [[ $(${NB_TEST} verify -f "$fname" -i $((code_ind + 1)) -k "Setup import") != true ]]; then
                print_error_msg "Missing the \"Setup imports\" after the first code cell of file: $fname"
                standardized=false
            fi

            if [[ $(${NB_TEST} verify -f "$fname" -i $((code_ind + 2)) -k "print_config()" --type code) != true ]]; then
                print_error_msg "print_config() cannot be found after the \"Setup imports\" markdown cell in file: $fname"
                standardized=false
            fi
        fi

A similar approach could be used to check for a "Download dataset" markdown cell and measure how long the code cell that follows it takes to run. With this information, it would be easier to tell whether a timeout when verifying a notebook with runner.sh comes from the code itself or from a slow dataset download.
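As a rough sketch of this proposal, assume the notebook is executed with papermill, which records each cell's run time in seconds under `metadata.papermill.duration` in the executed output notebook, and that `python3` is on the PATH. Under those assumptions, a hypothetical `download_cell_duration` helper could find the first markdown cell mentioning "Download dataset" in the executed notebook and print the recorded duration of the next code cell. Whether runner.sh runs notebooks through papermill, and where it writes the executed copy, are assumptions here, not something this issue establishes.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of runner.sh): report how long the
# dataset-download cell took in a notebook that has already been executed.
# Assumes python3 is available and that the executed notebook carries
# papermill's per-cell metadata (cell.metadata.papermill.duration, seconds).
download_cell_duration() {
    local nb="$1"
    python3 - "$nb" <<'EOF'
import json
import sys

with open(sys.argv[1]) as f:
    cells = json.load(f)["cells"]

def text(cell):
    # nbformat allows "source" to be a single string or a list of strings
    src = cell.get("source", "")
    return src if isinstance(src, str) else "".join(src)

# first markdown cell that mentions "Download dataset"
md = next((i for i, c in enumerate(cells)
           if c["cell_type"] == "markdown" and "Download dataset" in text(c)), None)
if md is None:
    print("no 'Download dataset' markdown cell found")
    sys.exit(1)

# the download itself is expected in the next code cell
code = next((i for i in range(md + 1, len(cells))
             if cells[i]["cell_type"] == "code"), None)
if code is None:
    print("no code cell after the 'Download dataset' markdown cell")
    sys.exit(1)

duration = cells[code].get("metadata", {}).get("papermill", {}).get("duration")
if duration is None:
    print(f"cell {code}: no recorded duration")
else:
    print(f"cell {code}: download took {duration:.1f} s")
EOF
}

# Example (the path of the executed notebook is an assumption):
# download_cell_duration path/to/executed_spleen_segmentation_3d.ipynb
```

Reading the duration from the executed notebook leaves the existing execution step unchanged. Timing the download cell directly would mean running it in isolation, which usually fails because such cells depend on variables defined in earlier cells.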