Closed vkt1414 closed 9 months ago
Check out this pull request on
See visual diffs & provide feedback on Jupyter Notebooks.
Powered by ReviewNB
@vkt1414 is it ready for review? I am not sure, since it is marked as review requested, but also it is a draft.
@vkt1414 is it ready for review? I am not sure, since it is marked as review requested, but also it is a draft.
Sorry. It''s not ready yet. Initially I was planning to submit multiple PRs but that would mean accessing different things from different branches. So I'm consolidating everything in this PR. Things remanining:
@fedorov this PR is now ready for review.
The only thing pending is updating CWL files with new starting point i.e list of SeriesInstanceUIDs. I'm yet to get them work on SB-CGC. I also realized there are better ways to write CWL files that may enhance readability. I'll keep working on them and should be able to figure out by preprint. Otherwise, I'm ready to launch any sized cohorts on Terra once this PR is found satisfactory and merged.
The readme will also continue to evolve as I would like to provide screenshots for each step, after this PR
Address #40, #44, #45 and #46
[x] Instead of commenting out installation of prerequisites, it is better to put those into an if clause, which would check whether papermill-initialized variable (say, "interactive") is reset
using
'google.colab' in sys.modules
thanks for the tip![x] if the notebook is executed in the "colab" (interactive) mode, it would be good to make it possible to run it without assuming that manifest file is present, and instead generate it on the fly, or use pre-defined SeriesInstanceUIDs, so that the notebook is completely self-contained
[x] I noticed that in some cases global variables are used - it would be good to get rid of those and instead use function parameters for passing these variables
[x] things such as SeriesNumber may be easiest to fetch from the input CT series instead of relying on the manifest or query
using idc-index wherever possible. i.e to know the modality and series number for now
The notebooks now have a function to check the modality before downloading the files. It will skip and create an error file.
The manifests are simplified and Terra users can now simply refer the SeriesInstanceUIDs directly from the data table. SB- CGC users will continue to have the csv manifests but the papermill command will be the same for both Terra and SB-CGC
[x] instead of pulling the latest docker images, we should use SHA, and ideally tags, to ensure the same version of the image is used
Using tags main for main branch and whenever tag is push, gha will publish the corresponding tag to docker images as well
[x] workflows currently refer to the notebooks from "main" branch; instead we should use git tags and refer to tagged notebooks
A python script is available in workflows/TotalSegmentator/resources to handle renaming URLs across the repo
[x] parameterize papermill with the variable to pass the tag, and refer to the tagged versions of the artifacts from the notebook
Since we are going to rewrite URLs with the tag, I believe this is no longer necessary
[x] remove all branches from the repository so that the users of Dockstore are not confused
[x] before running the analysis, create a new release/tag, and use the specific Dockstore workflow releases for the run