Closed fedorov closed 1 month ago
@fedorov Are we dropping s5cmd entirely? I only ask because we had to put in logic to detect the user's choice of Google or AWS . The segment changed in the PR is the default setting for the modal, but once it's launched JS detects user choices and adjusts the text to what it needs to be to reflect that (Study vs. Series, AWS vs. Google)
Hi @s-paquette - idc-index is a wrapper that simplifies the installation and download process and uses s5cmd and the buckets under the hood. I don't think the webapp needs the google/aws logic anymore.
Yes, exactly. I suggest we completely switch to idc-index. The download documentation page can go into the details of how to download from GCS, but I do not see any need to go into those details in the portal, we should use every opportunity to simplify.
I suggest the following:
manifest-based download should say "To obtain these images, install idc-index
python package with pip install --upgrade idc-index
, download the manifest and run the following command: idc download <manifest_file>
study-level download: "To obtain these images, install idc-index
python package with pip install --upgrade idc-index
, download the manifest and run the following command: idc download <StudyInstanceUID>
series-level download: "To obtain these images, install idc-index
python package with pip install --upgrade idc-index
, download the manifest and run the following command: idc download <SeriesInstanceUID>
I think we do not need manifests for study-level anymore. @ulrikew @pieper let me know if you have concerns.
Going forward, we should also add download buttons at the collection- (in the explore interface but also in the collections list) and patient-level.
@pieper the only reservation I have is that with s5cmd
user could download a single binary to download, without having to install python. idc-index
requires python to be set up first. But I think the way to deal with this is to explain in the documentation the option to use s5cmd
with the manifest. I think that it is fair to expect that typical researcher would use python.
@pieper the only reservation I have is that with
s5cmd
user could download a single binary to download, without having to install python
In this day and age I think asking people to have a minimal working python installed on their machine is a reasonable expectation for a person who wants to do cancer imaging research. I suppose if they are really unable to use a command line they can use the SlicerIDCBrowser for a fully GUI experience.
On the other hand, it would be very convenient to be able to get a zip file at the study level directly from the portal. We could limit this option to some reasonable level, like 1GB of data or something. Using a manifest to download a 50 slice series seems really inconvenient.
@fedorov @pieper Should we then change the name of the tab in the export manifest dialog to read 'idc-index' and not s5cmd? Additionally, right now a manifest file's content, when using the manifest export dialog, is s5cmd commands. Can idc-index use a file of that format, or does it expect something else?
name of the tab in the export manifest dialog to read 'idc-index' and not s5cmd?
I would say yes.
Can idc-index use a file of that format, or does it expect something else?
I think it uses the same but Andrey can say for sure.
I don't think we should change the name of the tab. We should be very conservative about changing the manifest format to reduce user suffering. The manifest itself does not need to be changed. It can still be used with s5cmd, same as before. It is just that with idc-index you can do more (get progress reporting, sort files into folder hierarchy, restart download and do sync).
On the other hand, it would be very convenient to be able to get a zip file at the study level directly from the portal. We could limit this option to some reasonable level, like 1GB of data or something.
Yes, we discussed and agreed this can and should be done, see https://github.com/ImagingDataCommons/IDC-WebApp/issues/1186#issuecomment-1654161202.
Using a manifest to download a 50 slice series seems really inconvenient.
With idc-index, this would no longer be required. We can pre-populate download command with idc-index and it will be the same idc download
+
Yes, we discussed and agreed this can and should be done, see #1186 (comment).
Thanks for the reminder - I'd forgotten about that thread but I still agree with what I said back then - the download and zipping at the study level would be completely doable client side in the browser and would make life much easier for users.
That said, using idc-index at the study / series level is also fine, and it means we have one simple method that can be used for big and small downloads so getting that implemented seems like a good approach.
@fedorov @pieper So, we're confirming idc-index can use the s5cmd formatted manifests produced by the 'Download Images' button? And, does it need to be AWS or can it be GCS as well (thus we would leave the AWS/GCS option in place) or does it need to only be AWS?
So, we're confirming idc-index can use the s5cmd formatted manifests produced by the 'Download Images' button?
yes
And, does it need to be AWS or can it be GCS as well (thus we would leave the AWS/GCS option in place) or does it need to only be AWS?
It can be either GCS or AWS.
Another benefit of using idc-index is that we no longer need to specify the endpoint, which even I can never remember by heart. it is always as simple as "idc download" + [manifest file or series_uid or study_uid or patient_id or collection_id]. Behind the scenes, when the input is a manifest, crdc_series_uuid
is parsed from the manifest, that that is all that's needed to get everything else in line.
@pgundluru @fedorov If you have a look at dev and/or test, they should both have the changes you requested above. I adjusted the width of the dialog slightly so the Study and Series UIDs don't wrap the command line (about 75px wider if the window is > 768px, under that it has to wrap due to not enough space to show the whole dialog).
@s-paquette can you please update the various download instructions as discussed in https://github.com/ImagingDataCommons/IDC-WebApp/issues/1411#issuecomment-2329097280 ? Thank you
@fedorov Ah I see what happened, I was seeing your first version of the comment, didn't notice it had been edited. I don't get pings for edits on comments, so if a change is made you'll need to let me know.
Interesting. I think it is best to review the issue and not rely on the email updates. Sending email updates manually after I edit the comment is not feasible for me.
Also note that I edited my comment within minutes of the initial post. It's not like I went back to edit it after several days.
@fedorov Understood, but I had already opened the ticket and left it open while working on it, and it never updated visually for me (I don't know if there's an auto-refresh on GitHub for edited tickets, though, there might not be).
@pgundluru This change is now on test and dev.
There is a newline added prior to the copy to clipboard icon, can it be removed?
I would prefer to remove parentheses around the copy to clipboard button - we do not have those in other places where that button is present, and I do not see value in keeping them
The width of the download popup appears to be fixed and independent from the width of the parent window, UID line wraps around. Can this be fixed?
@fedorov #1 and 2: easily
For #3, I widened it a bit already, and can widen it more, but the issue is we won't be able to account for eg. different zoom levels on people's browsers, OS font sizes, etc. We could do a quick BQ query to make sure we know what the longest UID is by character count and go by that to get a rough idea. That said I can enlarge this one a bit more, though as 768px is the minimum viewspace we can't go over that unless we're willing to scroll the browser view.
We could do a quick BQ query to make sure we know what the longest UID is by character count and go by that to get a rough idea.
The maximum number of characters is fixed to 64 by the DICOM standard for that value representation!
@fedorov Does that include the periods? And, if we wind up with some that wrap, is that going to be acceptable?
Yes, 64 total is for any character from this lexicon https://dicom.nema.org/medical/dicom/current/output/chtml/part05/sect_6.2.html:
As long as that wrap and copy to clipboard does not have any extra newline characters, it's acceptable.
As an example of what I mean, this study doesn't wrap on my monitor:
So while I can definitely widen it, there'll be some variations we can't account for.
Edit: The copy to clipboard text is independent of the actual display, so it's unaffected by any display adjustments due to the browser's setup.
Ok, sounds good.
The copy to clipboard icons are still showing up on separate lines though, as well as parentheses.
Also, just noticed this, the text says "download the manifest", which is not applicable for the study download. Instead, the text should say "Then run the following command:".
Upon further review and testing, I realized that listing "s5cmd" for the download popup at study/series level makes no sense, since there is no manifest involved that could be used with s5cmd. In a discussion with @s-paquette, it became clear that having "s5cmd" in the manifest-level download tab vs "idc-index" in the study/series level is too much work. Thus, we decided to replace "s5cmd" with "idc-index" in all places, which should make ui less confusing, hopefully.
As detected by @pgundluru the highlighted text should be removed
@fedorov @pgundluru This is now ready for testing
The download instructions do not match what I contributed in this PR:
https://github.com/ImagingDataCommons/IDC-WebApp/pull/1399/files
Specifically, my change had this:
and the download screen shows this in the test tier:
Also, since this will need to be fixed anyway, please also add a sentence "To learn about alternative ways to download IDC data, please see IDC download instructions page".