InsightSoftwareConsortium / itkwidgets

An elegant Python interface for visualization on the web platform to interactively generate insights into multidimensional images, point sets, and geometry.
https://itkwidgets.readthedocs.io/
Apache License 2.0
576 stars 83 forks source link

Fetch from idc updates #665

Closed bnmajor closed 1 year ago

bnmajor commented 1 year ago

This branch is based on #664 with the following updates:

cc @fedorov

review-notebook-app[bot] commented 1 year ago

Check out this pull request on  ReviewNB

See visual diffs & provide feedback on Jupyter Notebooks.


Powered by ReviewNB

fedorov commented 1 year ago

As I noted in the email response to @aylward, I am strongly against using gsutil because:

I think it is beneficial to use the same tool for downloading data from IDC in this notebook as we recommend the user should use for larger cohorts (IDC download instructions are here: https://learn.canceridc.dev/data/downloading-data).

aylward commented 1 year ago

Makes sense!

It would be great to use s5cmd. What do you recommend as a cross-platform notebook solution? I guess we could make a conditional on architecture, but that's a bit brute force. Other ideas?

s

On Wed, May 24, 2023 at 9:56 PM Andrey Fedorov @.***> wrote:

As I noted in the email response to @aylward https://github.com/aylward, I am strongly against using gsutil because:

  • s5cmd is order of magnitude faster, based on our experience (does not matter for 1 series, but makes a huge difference while working with larger cohorts)
  • s5cmd is a lot easier to install than gsutil - unpack pre-compiled binary for your platform for s5cmd vs python prerequisite (with constraints on the version) and install large Google Cloud SDK package
  • s5cmd works with both Google and AWS buckets - gsutil is Google only

I think it is beneficial to use the same tool for downloading data from IDC in this notebook as we recommend the user should use for larger cohorts (IDC download instructions are here: https://learn.canceridc.dev/data/downloading-data).

— Reply to this email directly, view it on GitHub https://github.com/InsightSoftwareConsortium/itkwidgets/pull/665#issuecomment-1562150253, or unsubscribe https://github.com/notifications/unsubscribe-auth/AACEJL4FC27GGG2SG2BVQUDXH235RANCNFSM6AAAAAAYN7VJIM . You are receiving this because you were mentioned.Message ID: @.***>

-- Stephen R. Aylward, Ph.D. Senior Director of Strategic Initiatives Chair of MONAI Advisory Board

Kitware: Delivering innovation.

bnmajor commented 1 year ago

@fedorov Thanks for the explanations! I agree, if s5cmd is the preferred (and significantly faster) route, lets stick to that! To handle multiple platforms we could approach this a couple of ways:

  1. Try to detect the platform automatically and select the correct package. This of course runs the risk of guessing incorrectly but could include a note that warns users that their platform must be specified and allow them to simply select the correct package themselves. This may add a lot of conditional cells (or one large one) or an external link that requires the user to select the correct binaries themselves.
  2. Require the user to use conda so that they can use the conda-forge package. This appears to be on v2.0.0 (which is the version that you were using, so that is good!) but is managed by the community so we cannot always guarantee it will be in sync with the latest releases. Additionally this requires a special installation step in Colab in order to use conda.

I'm not sure if there are preferences on which approach to take or maybe an option that I am overlooking but I am open to suggestions! FWIW I would prefer the conda requirement over guessing platforms (or expecting users to know). I'm happy to pull together a quick example of each if that is helpful though!

@aylward @thewtex

fedorov commented 1 year ago

I didn't know about the conda-forge package - this is cool! I also don't know about various mechanism to distribute python Perhaps it's a naive question, but what would be wrong with introducing a PyPI package that would just wrap the platform-appropriate s5cmd binary?

aylward commented 1 year ago

You're exactly right! We should help the s5cmd developers create pip packages.

They are already using github actions ( https://github.com/peak/s5cmd/tree/master/.github/workflows) to build releases, so it shouldn't be too hard to package them for pip.

Perhaps chatting with the developers to determine their interest is the first step? Do you have a relationship with them?

In the meantime, perhaps Brianna's first suggestion (using something like platform.system() to determine the OS) should be used for now in the notebook? Heck, we could even just have all three downloads and comment-out the windows and macOS downloads so that it works on Colab and others can figure out how to uncomment a line to get it to work on other platforms? It is an interactive notebook...

s

On Thu, May 25, 2023 at 9:49 AM Andrey Fedorov @.***> wrote:

I didn't know about the conda-forge package - this is cool! I also don't know about various mechanism to distribute python Perhaps it's a naive question, but what would be wrong with introducing a PyPI package that would just wrap the platform-appropriate s5cmd binary?

— Reply to this email directly, view it on GitHub https://github.com/InsightSoftwareConsortium/itkwidgets/pull/665#issuecomment-1562945470, or unsubscribe https://github.com/notifications/unsubscribe-auth/AACEJL5XNAVLHL7SEO56RXDXH5PQBANCNFSM6AAAAAAYN7VJIM . You are receiving this because you were mentioned.Message ID: @.***>

-- Stephen R. Aylward, Ph.D. Senior Director of Strategic Initiatives Chair of MONAI Advisory Board

Kitware: Delivering innovation.

fedorov commented 1 year ago

All good point. Alas, I do not have any connection with the s5cmd developers - it is the "it just works" relationship! :-D