NVIDIA / modulus

Open-source deep-learning framework for building, training, and fine-tuning deep learning models using state-of-the-art Physics-ML methods
https://developer.nvidia.com/modulus
Apache License 2.0

🐛[BUG]: CorrDiff dataset is unreadable when downloaded via wget #431

Closed (gideonite closed this issue 2 months ago)

gideonite commented 3 months ago

Version

latest

On which installation method(s) does this occur?

No response

Describe the issue

Following the link in the CorrDiff README (e.g. this link), I selected the wget option and ran the command below:

wget --content-disposition https://api.ngc.nvidia.com/v2/resources/nvidia/ngc-apps/ngc_cli/versions/3.41.1/files/ngccli_linux.zip -O ngccli_linux.zip && unzip ngccli_linux.zip

The result was a 98G zip archive:

$ du -hc cwa_dataset_v1.zip
98G     cwa_dataset_v1.zip

When I tried to unzip it, I got the following error:

$ unzip cwa_dataset_v1.zip
Archive:  cwa_dataset_v1.zip
  End-of-central-directory signature not found.  Either this file is not
  a zipfile, or it constitutes one disk of a multi-part archive.  In the
  latter case the central directory and zipfile comment will be found on
  the last disk(s) of this archive.
unzip:  cannot find zipfile directory in one of cwa_dataset_v1.zip or
        cwa_dataset_v1.zip.zip, and cannot find cwa_dataset_v1.zip.ZIP, period.
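The unzip message refers to the end-of-central-directory (EOCD) record, which the ZIP format stores at the very end of an archive: a 4-byte signature PK\x05\x06, 18 more fixed bytes, then an optional comment of at most 65,535 bytes. Archives larger than 4 GiB use ZIP64 extensions, but they still end with this classic record. The following is a minimal sketch of the tail search that fails here, written as a hypothetical Python helper named find_eocd (not part of unzip or Modulus), assuming the archive is a local file:

```python
import os
from typing import Optional

EOCD_SIG = b"PK\x05\x06"   # end-of-central-directory signature
EOCD_MIN_SIZE = 22         # fixed-size part of the EOCD record, in bytes
MAX_COMMENT = 0xFFFF       # the trailing ZIP comment is at most 65,535 bytes


def find_eocd(path: str) -> Optional[int]:
    """Return the byte offset of the EOCD record, or None if it is missing.

    Only the tail of the file is read, so this stays cheap even for a 98G archive.
    """
    size = os.path.getsize(path)
    # The record must lie within the last 22 + 65,535 bytes of the file.
    tail_len = min(size, EOCD_MIN_SIZE + MAX_COMMENT)
    with open(path, "rb") as f:
        f.seek(size - tail_len)
        tail = f.read(tail_len)
    # Search backwards: the last occurrence is the real record.
    idx = tail.rfind(EOCD_SIG)
    if idx == -1 or len(tail) - idx < EOCD_MIN_SIZE:
        return None
    return size - tail_len + idx
```

If find_eocd("cwa_dataset_v1.zip") returns None, the archive is missing exactly the record unzip complains about.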

I did a quick search and the top results indicate one of two possibilities:

  1. The file is not actually a zip file
  2. The file has been corrupted
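A rough way to tell these two possibilities apart is to check how the file starts as well as how it ends. The following is a minimal Python sketch, using a hypothetical helper named classify_archive and the standard-library zipfile.is_zipfile. It assumes a non-empty archive, which normally begins with the local-file-header signature PK\x03\x04 (an empty or split archive can start differently):

```python
import zipfile

LOCAL_HEADER_SIG = b"PK\x03\x04"  # first bytes of a typical non-empty ZIP archive


def classify_archive(path: str) -> str:
    """Rough triage of a file that unzip refuses to open.

    "not-a-zip"            -> the file does not start like a ZIP archive at all
    "truncated-or-corrupt" -> it starts like a ZIP, but no valid EOCD record is found
    "zip"                  -> leading signature and EOCD record both look fine
    """
    with open(path, "rb") as f:
        head = f.read(4)
    if head != LOCAL_HEADER_SIG:
        return "not-a-zip"
    # is_zipfile looks for the end-of-central-directory record near the end of the file.
    if zipfile.is_zipfile(path):
        return "zip"
    return "truncated-or-corrupt"
```

This only checks structural markers, not file contents; zipfile.ZipFile(path).testzip() verifies CRCs, but it reads the entire archive.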

I downloaded the dataset again (~1 hr) but got the same error.

@nbren12 pinging you as we discussed. Please let me know if I can provide any more info.

Minimum reproducible example

No response

Relevant log output

No response

Environment details

No response

mnabian commented 2 months ago

Please use this link: https://catalog.ngc.nvidia.com/orgs/nvidia/teams/modulus/resources/modulus_datasets_cwa

Please note that the dataset can only be downloaded via the NGC CLI; direct/wget downloads won't work.

gideonite commented 2 months ago

Just ran the following command:

ngc registry resource download-version "nvidia/modulus/modulus_datasets_cwa:v1"

This resulted in a 467.8 GiB download, which I am now waiting on. Note that this does not match the documented compressed size of 97.65 GB. Perhaps what is available online is no longer compressed? Specifically, I see the following status bar:

⠼ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ • 0.7/467.8 GiB • Remaining: 1:22:16 • 101.6 MB/s • Elapsed: 0:00:09 • Total: 1 - Completed: 0 - Failed: 0
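
The size comparison mixes binary and decimal units: the progress bar reports GiB (2^30 bytes), while the documented figure is quoted in GB. The following is a small Python sketch of the conversion and the implied download time. It assumes the bar's "MB/s" means decimal megabytes (10^6 bytes per second); that is an illustrative assumption, not documented NGC CLI behavior.

```python
GIB = 1024 ** 3  # bytes per gibibyte (binary unit shown by the progress bar)
GB = 1000 ** 3   # bytes per gigabyte (decimal unit)
MB = 1000 ** 2   # assumption: the "MB/s" rate uses decimal megabytes


def gib_to_gb(gib: float) -> float:
    """Convert gibibytes to decimal gigabytes."""
    return gib * GIB / GB


def eta_seconds(remaining_gib: float, rate_mb_per_s: float) -> float:
    """Seconds needed to transfer remaining_gib at a constant rate."""
    return remaining_gib * GIB / (rate_mb_per_s * MB)


def hms(seconds: float) -> str:
    """Format seconds as H:MM:SS, like the progress bar's Remaining field."""
    h, rem = divmod(round(seconds), 3600)
    m, s = divmod(rem, 60)
    return f"{h}:{m:02d}:{s:02d}"
```

Under these assumptions, gib_to_gb(467.8) is about 502.3 GB, and hms(eta_seconds(467.8 - 0.7, 101.6)) gives 1:22:16, matching the Remaining field above.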

nbren12 commented 2 months ago

Hi Gideon, all seems well. Let us know if the download fails. The 97 GB figure is probably inaccurate.

(Sent by email in reply to Gideon Dresdner's comment of Tuesday, May 7, 2024, 8:05 AM.)

mnabian commented 2 months ago

@gideonite you are on the right track. The compressed size is 467.8 GiB, as shown in the status bar. The NGC catalog UI displays the compressed size inaccurately; that's a known bug.