ImagingDataCommons / IDC-WebApp

Web Application front end for IDC (CORE REPO)
Apache License 2.0
6 stars 2 forks source link

Download instructions are not correct #1411

Closed fedorov closed 1 month ago

fedorov commented 2 months ago

The download instructions do not match what I contributed in this PR:

https://github.com/ImagingDataCommons/IDC-WebApp/pull/1399/files

Specifically, my change had this:

image

and the download screen shows this in the test tier:

image

Also, since this will need to be fixed anyway, please also add a sentence "To learn about alternative ways to download IDC data, please see IDC download instructions page".

s-paquette commented 2 months ago

@fedorov Are we dropping s5cmd entirely? I only ask because we had to put in logic to detect the user's choice of Google or AWS . The segment changed in the PR is the default setting for the modal, but once it's launched JS detects user choices and adjusts the text to what it needs to be to reflect that (Study vs. Series, AWS vs. Google)

pieper commented 2 months ago

Hi @s-paquette - idc-index is a wrapper that simplifies the installation and download process and uses s5cmd and the buckets under the hood. I don't think the webapp needs the google/aws logic anymore.

fedorov commented 2 months ago

Yes, exactly. I suggest we completely switch to idc-index. The download documentation page can go into the details of how to download from GCS, but I do not see any need to go into those details in the portal, we should use every opportunity to simplify.

I suggest the following:

I think we do not need manifests for study-level anymore. @ulrikew @pieper let me know if you have concerns.

Going forward, we should also add download buttons at the collection- (in the explore interface but also in the collections list) and patient-level.

@pieper the only reservation I have is that with s5cmd user could download a single binary to download, without having to install python. idc-index requires python to be set up first. But I think the way to deal with this is to explain in the documentation the option to use s5cmd with the manifest. I think that it is fair to expect that typical researcher would use python.

pieper commented 2 months ago

@pieper the only reservation I have is that with s5cmd user could download a single binary to download, without having to install python

In this day and age I think asking people to have a minimal working python installed on their machine is a reasonable expectation for a person who wants to do cancer imaging research. I suppose if they are really unable to use a command line they can use the SlicerIDCBrowser for a fully GUI experience.

On the other hand, it would be very convenient to be able to get a zip file at the study level directly from the portal. We could limit this option to some reasonable level, like 1GB of data or something. Using a manifest to download a 50 slice series seems really inconvenient.

s-paquette commented 2 months ago

@fedorov @pieper Should we then change the name of the tab in the export manifest dialog to read 'idc-index' and not s5cmd? Additionally, right now a manifest file's content, when using the manifest export dialog, is s5cmd commands. Can idc-index use a file of that format, or does it expect something else?

pieper commented 2 months ago

name of the tab in the export manifest dialog to read 'idc-index' and not s5cmd?

I would say yes.

Can idc-index use a file of that format, or does it expect something else?

I think it uses the same but Andrey can say for sure.

fedorov commented 2 months ago

I don't think we should change the name of the tab. We should be very conservative about changing the manifest format to reduce user suffering. The manifest itself does not need to be changed. It can still be used with s5cmd, same as before. It is just that with idc-index you can do more (get progress reporting, sort files into folder hierarchy, restart download and do sync).

fedorov commented 2 months ago

On the other hand, it would be very convenient to be able to get a zip file at the study level directly from the portal. We could limit this option to some reasonable level, like 1GB of data or something.

Yes, we discussed and agreed this can and should be done, see https://github.com/ImagingDataCommons/IDC-WebApp/issues/1186#issuecomment-1654161202.

Using a manifest to download a 50 slice series seems really inconvenient.

With idc-index, this would no longer be required. We can pre-populate download command with idc-index and it will be the same idc download + . So we don't even need to wait for that tar/zip functionality to be implemented server-side.

pieper commented 2 months ago

Yes, we discussed and agreed this can and should be done, see #1186 (comment).

Thanks for the reminder - I'd forgotten about that thread but I still agree with what I said back then - the download and zipping at the study level would be completely doable client side in the browser and would make life much easier for users.

That said, using idc-index at the study / series level is also fine, and it means we have one simple method that can be used for big and small downloads so getting that implemented seems like a good approach.

s-paquette commented 2 months ago

@fedorov @pieper So, we're confirming idc-index can use the s5cmd formatted manifests produced by the 'Download Images' button? And, does it need to be AWS or can it be GCS as well (thus we would leave the AWS/GCS option in place) or does it need to only be AWS?

fedorov commented 2 months ago

So, we're confirming idc-index can use the s5cmd formatted manifests produced by the 'Download Images' button?

yes

And, does it need to be AWS or can it be GCS as well (thus we would leave the AWS/GCS option in place) or does it need to only be AWS?

It can be either GCS or AWS.

Another benefit of using idc-index is that we no longer need to specify the endpoint, which even I can never remember by heart. it is always as simple as "idc download" + [manifest file or series_uid or study_uid or patient_id or collection_id]. Behind the scenes, when the input is a manifest, crdc_series_uuid is parsed from the manifest, that that is all that's needed to get everything else in line.

s-paquette commented 2 months ago

@pgundluru @fedorov If you have a look at dev and/or test, they should both have the changes you requested above. I adjusted the width of the dialog slightly so the Study and Series UIDs don't wrap the command line (about 75px wider if the window is > 768px, under that it has to wrap due to not enough space to show the whole dialog).

fedorov commented 2 months ago

@s-paquette can you please update the various download instructions as discussed in https://github.com/ImagingDataCommons/IDC-WebApp/issues/1411#issuecomment-2329097280 ? Thank you

s-paquette commented 2 months ago

@fedorov Ah I see what happened, I was seeing your first version of the comment, didn't notice it had been edited. I don't get pings for edits on comments, so if a change is made you'll need to let me know.

fedorov commented 2 months ago

Interesting. I think it is best to review the issue and not rely on the email updates. Sending email updates manually after I edit the comment is not feasible for me.

fedorov commented 2 months ago

Also note that I edited my comment within minutes of the initial post. It's not like I went back to edit it after several days.

s-paquette commented 2 months ago

@fedorov Understood, but I had already opened the ticket and left it open while working on it, and it never updated visually for me (I don't know if there's an auto-refresh on GitHub for edited tickets, though, there might not be).

@pgundluru This change is now on test and dev.

fedorov commented 2 months ago
  1. There is a newline added prior to the copy to clipboard icon, can it be removed?

    image
  2. I would prefer to remove parentheses around the copy to clipboard button - we do not have those in other places where that button is present, and I do not see value in keeping them

  3. The width of the download popup appears to be fixed and independent from the width of the parent window, UID line wraps around. Can this be fixed?

2024-09-11_10-10-20

s-paquette commented 2 months ago

@fedorov #1 and 2: easily

For #3, I widened it a bit already, and can widen it more, but the issue is we won't be able to account for eg. different zoom levels on people's browsers, OS font sizes, etc. We could do a quick BQ query to make sure we know what the longest UID is by character count and go by that to get a rough idea. That said I can enlarge this one a bit more, though as 768px is the minimum viewspace we can't go over that unless we're willing to scroll the browser view.

fedorov commented 2 months ago

We could do a quick BQ query to make sure we know what the longest UID is by character count and go by that to get a rough idea.

The maximum number of characters is fixed to 64 by the DICOM standard for that value representation!

s-paquette commented 2 months ago

@fedorov Does that include the periods? And, if we wind up with some that wrap, is that going to be acceptable?

fedorov commented 2 months ago

Yes, 64 total is for any character from this lexicon https://dicom.nema.org/medical/dicom/current/output/chtml/part05/sect_6.2.html:

image

As long as that wrap and copy to clipboard does not have any extra newline characters, it's acceptable.

s-paquette commented 2 months ago

As an example of what I mean, this study doesn't wrap on my monitor:

image

So while I can definitely widen it, there'll be some variations we can't account for.

Edit: The copy to clipboard text is independent of the actual display, so it's unaffected by any display adjustments due to the browser's setup.

fedorov commented 2 months ago

Ok, sounds good.

The copy to clipboard icons are still showing up on separate lines though, as well as parentheses.

fedorov commented 2 months ago

Also, just noticed this, the text says "download the manifest", which is not applicable for the study download. Instead, the text should say "Then run the following command:".

fedorov commented 2 months ago

Upon further review and testing, I realized that listing "s5cmd" for the download popup at study/series level makes no sense, since there is no manifest involved that could be used with s5cmd. In a discussion with @s-paquette, it became clear that having "s5cmd" in the manifest-level download tab vs "idc-index" in the study/series level is too much work. Thus, we decided to replace "s5cmd" with "idc-index" in all places, which should make ui less confusing, hopefully.

fedorov commented 2 months ago

As detected by @pgundluru the highlighted text should be removed

image
s-paquette commented 2 months ago

@fedorov @pgundluru This is now ready for testing