AI4Bharat / NPTEL2020-Indian-English-Speech-Dataset

NPTEL2020: Speech2Text dataset for Indian-English Accent

Unable to unzip the downloaded data #8

Closed: snaaz21 closed this issue 3 years ago

snaaz21 commented 3 years ago

Hi,

I downloaded some parts of the dataset (50 GB each) one by one. When I try to decompress them after concatenating 2 parts into train.tar.gz, it throws the error "not in gzip format".

I tried multiple alternatives but couldn't decompress it.

Is something wrong with the download?

Could you suggest a way to decompress it?
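A quick diagnostic for this error: gzip (which `tar -xzf` invokes) reports "not in gzip format" when its input does not begin with the gzip magic bytes `1f 8b` defined in RFC 1952. With a split archive, only the first part (`partaa`) should begin with those bytes, so checking the merged file and the first part shows whether the parts were joined in the wrong order or a download is not archive data at all. This is a minimal sketch; the helper name `starts_with_gzip_magic` is hypothetical.

```python
import sys

# Every gzip member begins with these two bytes (RFC 1952, ID1 and ID2).
GZIP_MAGIC = b"\x1f\x8b"


def starts_with_gzip_magic(path: str) -> bool:
    """Return True if the file at `path` begins with the gzip magic bytes."""
    with open(path, "rb") as f:
        return f.read(2) == GZIP_MAGIC


if __name__ == "__main__":
    # Hypothetical usage: python check_gzip.py train.tar.gz nptel-train.tar.gz.partaa
    for p in sys.argv[1:]:
        print(p, "gzip header OK" if starts_with_gzip_magic(p) else "NOT gzip")
```

If `partaa` passes but the merged file fails, the concatenation order is probably wrong; if `partaa` itself fails, that part likely needs to be downloaded again.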

Prem-kumar27 commented 3 years ago

Hi

Download all the parts, concatenate them as described here, and then extract the concatenated tar.gz archive file.
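The exact concatenation command from the linked instructions is not reproduced in this thread. As a hedged, cross-platform sketch of the same idea, the parts can be joined byte-for-byte in suffix order (`partaa`, `partab`, and so on) and the result extracted with Python's `tarfile`. The helper names `merge_parts` and `extract_targz` are hypothetical, and the glob pattern assumes the `nptel-train.tar.gz.part*` naming that appears in the checksum list posted in this thread.

```python
import glob
import shutil
import tarfile


def merge_parts(pattern: str, merged_path: str) -> list:
    """Concatenate split parts byte-for-byte in lexicographic suffix order."""
    parts = sorted(glob.glob(pattern))  # partaa < partab < ... sorts correctly
    if not parts:
        raise FileNotFoundError(f"no files match {pattern!r}")
    with open(merged_path, "wb") as out:
        for part in parts:
            with open(part, "rb") as src:
                # Copy in 16 MiB chunks so 50 GB parts never sit in memory.
                shutil.copyfileobj(src, out, 16 * 1024 * 1024)
    return parts


def extract_targz(archive_path: str, dest_dir: str) -> None:
    """Extract a complete .tar.gz archive into dest_dir."""
    # Use the safer 'data' extraction filter where this Python supports it.
    kwargs = {"filter": "data"} if hasattr(tarfile, "data_filter") else {}
    with tarfile.open(archive_path, "r:gz") as tar:
        tar.extractall(dest_dir, **kwargs)


# Hypothetical usage:
# merge_parts("nptel-train.tar.gz.part*", "train.tar.gz")
# extract_targz("train.tar.gz", "nptel-train")
```

If any part is missing, the merged file is a truncated gzip stream and `extract_targz` will stop with an error partway through, which matches the behaviour described in this issue.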

snaaz21 commented 3 years ago

Yes, I concatenated 2 parts using the same command, but it fails while extracting the tar.gz file.

GokulNC commented 3 years ago

Which 2 parts?

snaaz21 commented 3 years ago

partaa and partab

GokulNC commented 3 years ago

OK. You need to download all the parts first, and only then merge them into a single train.tar.gz and extract it.

But if you cannot afford to download all the parts and want only a subset of the dataset, unfortunately the only option is to try repairing the merged first N parts. Searching for something like "Repair corrupt tar.gz" may help, but we cannot guarantee it will work.
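One option worth trying before a repair tool, offered as an untested assumption rather than anything the maintainers suggested: train.tar.gz is a single gzip stream wrapping a tar stream, so a prefix made of the first N parts (starting with `partaa`, in order) can be read sequentially with Python's `tarfile` in stream mode (`"r|gz"`), extracting members one at a time until the data runs out. The helper name `salvage_truncated_targz` is hypothetical. The member that gets cut off may be left on disk as a partial file and should be discarded.

```python
import tarfile
import zlib


def salvage_truncated_targz(archive_path: str, dest_dir: str) -> list:
    """Extract members sequentially from a possibly truncated .tar.gz.

    Returns the names of members that were extracted completely. A member
    cut off mid-way raises an error and is NOT included (its partial file
    may remain on disk and should be deleted).
    """
    # Use the safer 'data' extraction filter where this Python supports it.
    kwargs = {"filter": "data"} if hasattr(tarfile, "data_filter") else {}
    done = []
    # "r|gz" reads the archive as a forward-only stream, so it never needs
    # the missing end of the file in order to list members.
    with tarfile.open(archive_path, mode="r|gz") as tar:
        try:
            for member in tar:
                tar.extract(member, dest_dir, **kwargs)
                done.append(member.name)
        except (tarfile.TarError, EOFError, zlib.error) as exc:
            print(f"stopped after {len(done)} complete members: {exc}")
    return done


# Hypothetical usage, after merging only partaa and partab:
# salvage_truncated_targz("train_first2.tar.gz", "nptel-train-subset")
```

This only helps if the merged prefix starts with a valid gzip header; it will not get past a "not in gzip format" error.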

snaaz21 commented 3 years ago

OK, thank you for the response.

GokulNC commented 3 years ago

There is now an option to download via torrent. You can use it to download only the files you need. Don't forget to seed :)

tuanphan09 commented 1 year ago

@GokulNC Could you provide checksums for the nptel-train.tar.gz.part* files? Thanks a lot.

tuanphan09 commented 1 year ago

Never mind, I found them on the Zenodo page. For anyone who needs them:

nptel-train.tar.gz.partaa 1190c56eb1a006602546a9dd5a71d393
nptel-train.tar.gz.partab dce6294f7b9c60df16d0459a6a58c634
nptel-train.tar.gz.partac 00399c57ec104449d9157bc41b0645d1
nptel-train.tar.gz.partad 257d8e91365a01c90b67068fc3bd12b5
nptel-train.tar.gz.partae b4e203cb1c49b2bc9993846ac9fb6f9e
nptel-train.tar.gz.partaf 2792bff1d7082b633961270f4071430b
nptel-train.tar.gz.partag f2f939ceb9276c8bf9befb8bfc5e2e59
nptel-train.tar.gz.partah dadbebaf0165e3a2b496f595d024da98
nptel-train.tar.gz.partai f6bf790c6bf632748fd294283c7d7a47
nptel-train.tar.gz.partaj 91f1a92178bff04edc282ce2ed02d3ea
nptel-train.tar.gz.partak f10f3f9854accd5ed4745be4618ebcf4
nptel-train.tar.gz.partal 371c1aabc8a357d44df6dd97c2b580f0
nptel-train.tar.gz.partam 0f05df03a01a737a3eb815f0a4ecf82d
nptel-train.tar.gz.partan d6cec63569a0a762622805c7b21d82ed
nptel-train.tar.gz.partao 6d19fddb304e974524d96773b4cac3e7
nptel-train.tar.gz.partap 3ec62c5d210dea9e7e41000a5af3ec16
nptel-train.tar.gz.partaq 170ee0c6a0ce85179f1e4c1685844261
nptel-train.tar.gz.partar a30ce4de4dbebe7d1fa33d06686014a1
nptel-train.tar.gz.partas ed17a5e3f25c195601901bdc5e109b27
nptel-train.tar.gz.partat de3646f2ab8e27a83232318cf556ab3f
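The digests above are 32 hexadecimal characters, consistent with MD5 (the checksum Zenodo lists for uploaded files); treat the algorithm as an assumption. A minimal sketch for checking downloaded parts against such a list, hashing each file in chunks so 50 GB parts never need to fit in memory. `parse_checksums`, `md5_of`, and `verify_parts` are hypothetical helper names.

```python
import hashlib
import os


def parse_checksums(text: str) -> dict:
    """Parse lines of the form '<filename> <hex digest>' into a dict."""
    expected = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) == 2:
            name, digest = fields
            expected[name] = digest.lower()
    return expected


def md5_of(path: str, chunk_size: int = 8 * 1024 * 1024) -> str:
    """Hash a file incrementally so large parts are not loaded into memory."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def verify_parts(directory: str, expected: dict) -> dict:
    """Return 'ok', 'mismatch', or 'missing' for every expected file."""
    status = {}
    for name, digest in sorted(expected.items()):
        path = os.path.join(directory, name)
        if not os.path.exists(path):
            status[name] = "missing"
        elif md5_of(path) == digest:
            status[name] = "ok"
        else:
            status[name] = "mismatch"
    return status


if __name__ == "__main__":
    # Paste the full checksum list from this thread here (two entries shown).
    listing = """
    nptel-train.tar.gz.partaa 1190c56eb1a006602546a9dd5a71d393
    nptel-train.tar.gz.partab dce6294f7b9c60df16d0459a6a58c634
    """
    for name, result in verify_parts(".", parse_checksums(listing)).items():
        print(f"{name}: {result}")
```

Any part reported as `mismatch` should be downloaded again before merging, since a single corrupted part breaks the gzip stream for everything after it.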