Kaggle / kaggle-api

Official Kaggle API
Apache License 2.0

"Bad zip file, please report" when downloading dataset using Kaggle API #516

Open hlysine opened 7 months ago

hlysine commented 7 months ago

I was downloading a dataset automatically using the Kaggle API in a Streamlit application. When the download finished, the Kaggle API threw an error and failed to unzip the data.

The dataset is https://www.kaggle.com/datasets/nih-chest-xrays/sample/data. I was using kaggle-1.5.16 in a Linux container on Hugging Face.

The only relevant log I could find is this:

Downloading sample.zip to /home/user/app
Downloading sample.zip to /home/user/app
... resuming from 655360 bytes (4505704260 bytes left) ...
Downloading sample.zip to /home/user/app
... resuming from 2097152 bytes (4504262468 bytes left) ...
  0%|          | 0.00/4.20G [00:00<?, ?B/s]
  0%|          | 2.00M/4.20G [00:00<05:14, 14.3MB/s]
  0%|          | 640k/4.20G [00:00<?, ?B/s]
  0%|          | 2.00M/4.20G [00:00<?, ?B/s]
  0%|          | 4.00M/4.20G [00:00<05:56, 12.6MB/s]
  0%|          | 2.62M/4.20G [00:00<04:09, 18.0MB/s]
  0%|          | 4.00M/4.20G [00:00<06:22, 11.8MB/s]
  0%|          | 4.62M/4.20G [00:00<04:59, 15.0MB/s]
  0%|          | 7.00M/4.20G [00:00<04:42, 15.9MB/s]

...

('Bad zip file, please report on www.github.com/kaggle/kaggle-api', BadZipFile('File is not a zip file'))

100%|██████████| 4.20G/4.20G [00:24<00:00, 181MB/s]
('Bad zip file, please report on www.github.com/kaggle/kaggle-api', BadZipFile('Bad magic number for central directory'))

 98%|█████████▊| 4.13G/4.20G [00:24<00:00, 201MB/s]
 99%|█████████▉| 4.16G/4.20G [00:24<00:00, 87.7MB/s]
100%|█████████▉| 4.19G/4.20G [00:25<00:00, 116MB/s] 
100%|██████████| 4.20G/4.20G [00:25<00:00, 176MB/s]
('Bad zip file, please report on www.github.com/kaggle/kaggle-api', BadZipFile('Bad magic number for file header'))
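
For anyone trying to reproduce this, it may help to check the downloaded archive independently of the Kaggle client before unzipping. The following is only a minimal sketch using Python's standard `zipfile` module; `check_zip` is a hypothetical helper name, not part of the Kaggle API, and it assumes the archive is already on disk (for example `/home/user/app/sample.zip` from the log above).

```python
import zipfile
from pathlib import Path


def check_zip(path):
    """Return (ok, reason) for the archive at `path` without extracting it.

    Hypothetical helper for diagnosis; not part of the Kaggle API.
    """
    path = Path(path)
    if not path.is_file():
        return False, "file does not exist"
    # is_zipfile checks for the zip magic number (end-of-central-directory record).
    if not zipfile.is_zipfile(path):
        return False, "not a zip file"
    try:
        with zipfile.ZipFile(path) as archive:
            # testzip reads every member and returns the name of the first one
            # with a bad CRC or file header, or None if all members are fine.
            bad_member = archive.testzip()
    except zipfile.BadZipFile as exc:
        return False, f"corrupt archive: {exc}"
    if bad_member is not None:
        return False, f"corrupt member: {bad_member}"
    return True, "ok"
```

Calling `check_zip("/home/user/app/sample.zip")` returns a tuple such as `(False, "not a zip file")` for a damaged file. Note that `testzip` reads the whole archive, so on a 4.20G file this check takes a while.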