dddavid4real / HistGen

[MICCAI 2024] Official Repo of "HistGen: Histopathology Report Generation via Local-Global Feature Encoding and Cross-modal Context Interaction"
Apache License 2.0

How to download the data from the Sharepoint #6

Open JinqianPan opened 1 month ago

JinqianPan commented 1 month ago

Hi,

I am trying to work out how to download the data from SharePoint. The official limit for a single zip download from SharePoint is 20GB, which is far from enough for this dataset. Is there any way around this limitation so that the data can be downloaded?

dddavid4real commented 1 month ago

Hi. I guess you could go into the dinov2 folder, select all .pt files, and try downloading. Also, you could try to find a download tool for OneDrive/SharePoint.

If you already downloaded the original WSI files (with .svs extension), you could use our provided DINOv2 feature extractor to extract the features.

JinqianPan commented 1 month ago

Thank you for your reply!

The first way, selecting all .pt files as mentioned, is feasible only when the total size is under 20GB. Alternatively, could you divide the .pt files into 17 folders, each containing up to 20GB, so that we could download the data in 17 parts? I am still searching for a tool that can download this much data from OneDrive or SharePoint. As for the last way, downloading the original WSI files, I am looking for a way to filter data on the GDC Data Portal to avoid downloading the entire 8.9PB of data.
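To make the "parts of up to 20GB" idea concrete, here is a minimal Python sketch of how a file listing could be grouped into batches that each stay under the limit. The function name `plan_batches`, the example file names, and the sizes are all hypothetical; the real listing would have to come from the shared folder.

```python
from typing import Dict, List

GB = 1024 ** 3  # bytes per gigabyte (binary)


def plan_batches(sizes: Dict[str, int], limit_bytes: int = 20 * GB) -> List[List[str]]:
    """Group files, in name order, into batches whose total size stays within limit_bytes."""
    batches: List[List[str]] = []
    current: List[str] = []
    current_size = 0
    for name in sorted(sizes):
        size = sizes[name]
        if size > limit_bytes:
            raise ValueError(f"{name} alone exceeds the per-download limit")
        # Start a new batch when adding this file would cross the limit.
        if current and current_size + size > limit_bytes:
            batches.append(current)
            current, current_size = [], 0
        current.append(name)
        current_size += size
    if current:
        batches.append(current)
    return batches


if __name__ == "__main__":
    # Hypothetical sizes, for illustration only.
    example = {"a.pt": 12 * GB, "b.pt": 9 * GB, "c.pt": 5 * GB, "d.pt": 14 * GB}
    for i, batch in enumerate(plan_batches(example), start=1):
        print(i, batch)
```

Keeping the files in name order means each batch is a contiguous run in the folder listing, which should be easier to select by hand than an optimally packed but scattered set.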

dddavid4real commented 1 month ago

If you are downloading to your personal PC, using the OneDrive client for Windows/Mac to synchronize the data folder, rather than downloading directly, should be a better solution. After that, upload the files from your personal PC to your Linux server.

For the TCGA data portal, you could simply download the .svs files of the TCGA program using the filters provided by the website. There are many tutorials online.
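For anyone who prefers scripting to the website filters, the same selection can probably be expressed as a query against the GDC files endpoint. The sketch below only builds the query parameters and does not send a request. The helper name `build_svs_query` is hypothetical, and the field names (`cases.project.program.name`, `files.data_format`) are assumptions that should be checked against the GDC API documentation before use.

```python
import json
from typing import Dict

# Public GDC files endpoint; the query built below is meant to be sent here.
GDC_FILES_ENDPOINT = "https://api.gdc.cancer.gov/files"


def build_svs_query(program: str = "TCGA", size: int = 100) -> Dict[str, str]:
    """Build query parameters selecting .svs slide files of one program."""
    filters = {
        "op": "and",
        "content": [
            # Assumed field name for the program (e.g. TCGA).
            {"op": "in", "content": {"field": "cases.project.program.name", "value": [program]}},
            # Assumed field name and value for whole-slide image files.
            {"op": "in", "content": {"field": "files.data_format", "value": ["SVS"]}},
        ],
    }
    return {
        "filters": json.dumps(filters),
        "fields": "file_id,file_name,file_size",
        "format": "JSON",
        "size": str(size),
    }


if __name__ == "__main__":
    print(GDC_FILES_ENDPOINT)
    print(build_svs_query())
```

The returned dictionary could be passed as the query parameters of an HTTP GET to the endpoint, and the resulting file IDs fed into a download manifest.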

BenPashley commented 2 weeks ago

Hi, and thank you for your continued help and support.

I'm trying to download the feature files as suggested, but I cannot synchronize your data folder as it hasn't been explicitly shared with my OneDrive account. If I provide my email, would you be willing to do this, or do you have an alternative suggestion? I've looked at other options, including scraping or even downloading manually, but strangely I'm not even able to sort your files by size, which would make this process easier. I have downloaded the SVS files, but pre-processing them will take a considerable amount of time with my current resources.

dddavid4real commented 2 weeks ago

Hi, yes please attach your email and I will manually include your account into the viewer list.

BenPashley commented 2 weeks ago

Many thanks, although I cannot see the share under my OneDrive account. Has it been set up correctly?

dddavid4real commented 2 weeks ago

That's weird. Maybe try to use this link directly: https://hkustconnect-my.sharepoint.com/:f:/g/personal/zguobc_connect_ust_hk/EhmtBBT0n2lKtiCQt97eqcEBvO9WwNM3TL9x-7-kg_liuA?e=1N4FHk

After you open the link, select all folders and files. There is then a button named "Copy to". Clicking that button should make it possible to copy the files to your own OneDrive.

BenPashley commented 2 weeks ago

I can't see an option to specify my account. It only allows me to copy to an existing location in your OneDrive.

dddavid4real commented 2 weeks ago

Is there any tutorial on how I can sync this folder to your OneDrive from my side?

BenPashley commented 2 weeks ago

Try this. You should be able to specify my email address as a viewer of the folder. Thanks again for helping me with this.

https://support.microsoft.com/en-gb/office/share-onedrive-files-and-folders-9fcc2f7d-de0c-4cec-93b0-a82024800c07#ID0EDBJ=Share_with_specific_people

BenPashley commented 2 weeks ago

So strange!

[Image: Screenshot 2024-08-23 at 14 09 16]

Although I received your email, I cannot see the folder in my OneDrive. The issue is the limitation on downloading the zip. Would it be at all possible for you to use WinZip to create a multi-file zip, say 5-10GB per file? This would remove the issue and allow me to download the 30-50 separate files. WinZip should make this easy to do.

BenPashley commented 2 weeks ago

10GB files seem to be OK. I'm not sure exactly what the limit is, but I think it's around 15-20GB.

dddavid4real commented 2 weeks ago

Yeah, the OneDrive limitation can be frustrating. I will see whether I can move these files to Google Drive or somewhere else. But this might take a lot of time, since I didn't save these files locally on my PC.

I suggest running the feature extraction code. To speed that up, you could run the script several times in parallel. For example, you could generate a reversed csv file and then run the feature extraction code on both the original csv and the reversed csv, which should save you about half the time.
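A minimal Python sketch of generating the reversed csv is below. It assumes the slide list is a plain csv with a single header row, and the function name `write_reversed_csv` and the file names are hypothetical. Note that running two processes from opposite ends only halves the time if the extraction script skips slides whose feature file already exists, which is an assumption about the script rather than something confirmed here.

```python
import csv


def write_reversed_csv(src_path: str, dst_path: str) -> int:
    """Copy a csv with its data rows in reverse order, keeping the header first.

    Returns the number of data rows written.
    """
    with open(src_path, newline="") as src:
        rows = list(csv.reader(src))
    if not rows:
        # Empty input: write an empty output file.
        open(dst_path, "w").close()
        return 0
    header, body = rows[0], rows[1:]
    with open(dst_path, "w", newline="") as dst:
        writer = csv.writer(dst)
        writer.writerow(header)
        writer.writerows(reversed(body))
    return len(body)


if __name__ == "__main__":
    import sys

    # Usage: python reverse_csv.py slides.csv slides_reversed.csv
    print(write_reversed_csv(sys.argv[1], sys.argv[2]))
```

One process can then be pointed at the original csv and a second at the reversed copy, so that they work towards each other from the two ends of the list.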

BenPashley commented 2 weeks ago

Thanks for trying. Good suggestion. Will do.

BenPashley commented 2 weeks ago

Would you mind removing my personal email address from the comments (if possible)?

dddavid4real commented 2 weeks ago

Sure thing. No worries.

JinqianPan commented 2 weeks ago

The "Copy to" button might only work within your own account. It is odd that we cannot copy the files into our own OneDrive.