cvdfoundation / kinetics-dataset


part_120.tar.gz is not a tar.gz file but a tar file #21

Closed liyz15 closed 2 years ago

GeeeekExplorer commented 2 years ago

The directory structure is also different from the other archives:

$ tar -tvf part_120.tar.gz
-rwxr-xr-x soldanm/soldanm 2795874 2020-11-30 21:46 home/soldanm/Documents/projects/kinetics/v1-0/train/UYF8sCDON5I_000002_000012.mp4
-rwxr-xr-x soldanm/soldanm  285527 2020-11-30 20:45 home/soldanm/Documents/projects/kinetics/v1-0/train/UYGGtu1ON70_000037_000047.mp4
-rwxr-xr-x soldanm/soldanm 2383685 2020-11-30 21:40 home/soldanm/Documents/projects/kinetics/v1-0/train/UYGZ9WEKycc_000019_000029.mp4
...
$ tar -tvf part_10.tar.gz
drwxrwxr-x soldanm/soldanm   0 2021-04-15 01:30 ./
-rwxr-xr-x soldanm/soldanm 438026 2020-11-30 20:51 ./267SBn5BJqw_000132_000142.mp4
-rwxr-xr-x soldanm/soldanm 597672 2020-11-30 21:04 ./249y2I777K0_000384_000394.mp4
-rwxr-xr-x soldanm/soldanm 342147 2020-11-30 20:47 ./1scesQQaA9w_000000_000010.mp4
...
lnschroeder commented 2 years ago

As a workaround, I extracted part_120.tar.gz manually after extract.sh finished:

tar -xf k400_targz/train/part_120.tar.gz --strip-components 7 -C k400/train
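
A more general form of this workaround, as a minimal sketch rather than a tested fix: it assumes GNU tar, relies on `tar -xf` detecting the compression by itself, and derives the `--strip-components` value by counting the slashes in the first `.mp4` member. The helper name `extract_flat` is hypothetical (it is not part of extract.sh), and it assumes every member of an archive shares the same prefix depth.

```shell
# Hypothetical helper, not part of extract.sh.
# Usage: extract_flat ARCHIVE DEST_DIR
extract_flat() {
    archive=$1
    dest=$2
    mkdir -p "$dest"
    # First .mp4 member, e.g. "./a.mp4" or "home/.../train/a.mp4".
    # awk reads the whole listing, so tar is never cut off mid-stream.
    first=$(tar -tf "$archive" | awk '/\.mp4$/ && !found { print; found = 1 }')
    # Each "/" in the member name is one leading path component to strip.
    depth=$(printf '%s' "$first" | tr -cd '/' | wc -c)
    depth=$((depth))
    # No -z here: tar -xf works for both gzip-compressed and plain tar files.
    tar -xf "$archive" --strip-components "$depth" -C "$dest"
}
```

With this, `extract_flat k400_targz/train/part_120.tar.gz k400/train` should end up stripping 7 components for the mis-packaged archive and 1 for the archives whose members start with `./`.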
pmeier commented 2 years ago

torchvision has also received a first report of this, since it breaks the automatic download. Would it be possible to re-package and re-upload the archive? For example:

mv part_120.tar.gz part_120.tar.gz.old
tar -xvf part_120.tar.gz.old --strip-components 7
tar -czf part_120.tar.gz --remove-files *.mp4
bjuncek commented 2 years ago

cc @kinetics-cvdf

kinetics-cvdf commented 2 years ago

Hi, sorry, I've been swamped. I'll fix this today.

kinetics-cvdf commented 2 years ago

I think I fixed it. If anyone can verify, I'll close this.

kinetics-cvdf commented 2 years ago

(btw, sorry, the "fix today" was wildly optimistic)

pmeier commented 2 years ago

Either the change has not been propagated yet or it didn't work:

$ wget https://s3.amazonaws.com/kinetics/400/train/part_120.tar.gz
[...]
$ file part_120.tar.gz
part_120.tar.gz: POSIX tar archive (GNU)
$ tar -tf part_120.tar.gz | head -n3
home/soldanm/Documents/projects/kinetics/v1-0/train/UYF8sCDON5I_000002_000012.mp4
home/soldanm/Documents/projects/kinetics/v1-0/train/UYGGtu1ON70_000037_000047.mp4
home/soldanm/Documents/projects/kinetics/v1-0/train/UYGZ9WEKycc_000019_000029.mp4
$ md5sum part_120.tar.gz
5ea5fc87bb5dbdacad8b8d0930b22ace  part_120.tar.gz
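
For anyone who wants to script the same check across several parts, a minimal sketch: it assumes `gzip`, `tar` and `awk` are available, and the function name `check_part` and its three output strings are made up for illustration. It reports whether an archive is really gzip-compressed and whether its members are flat, ignoring directory entries and a leading `./`.

```shell
# Hypothetical helper. Prints "ok", "not-gzip" or "nested-paths".
# Usage: check_part ARCHIVE
check_part() {
    archive=$1
    # gzip -t reads the whole stream and fails on a plain tar file.
    if ! gzip -t < "$archive" 2>/dev/null; then
        echo "not-gzip"
        return 1
    fi
    # Count members that still contain a "/" once directory entries
    # and a leading "./" are ignored.
    nested=$(tar -tzf "$archive" | awk '
        !/\/$/ { sub(/^\.\//, ""); if (index($0, "/")) n++ }
        END { print n + 0 }')
    if [ "$nested" -gt 0 ]; then
        echo "nested-paths"
        return 1
    fi
    echo "ok"
}
```

Run against the file downloaded above, this would be expected to print `not-gzip`, since `file` identifies it as a plain POSIX tar archive.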
kinetics-cvdf commented 2 years ago

I tried running the same exact command as you and see the following:

$ file part_120.tar.gz
part_120.tar.gz: gzip compressed data
$ tar tf part_120.tar.gz | head -n3
UYF8sCDON5I_000002_000012.mp4
UYGGtu1ON70_000037_000047.mp4
UYGZ9WEKycc_000019_000029.mp4

I assume it was just the propagation. Thanks! I'm closing this now.

pmeier commented 2 years ago

Change is now also available for me. Thanks @kinetics-cvdf!