activitynet / ActivityNet

This repository is intended to host tools and demos for ActivityNet
MIT License
941 stars 330 forks

Why are the downloaded mp4 file names (Kinetics-400 dataset) so weird? #75

Closed: LiuChaoXD closed this issue 4 years ago

LiuChaoXD commented 4 years ago

I followed the script for downloading the Kinetics-400 dataset, and everything worked. However, download.py has a parameter "--tmp-dir" whose default is "/tmp/kinetics400".

I didn't want it to take up that disk space, so I changed it to "./tmp/kinetics400".

When I went to ./tmp/kinetics400/, I found that the .mp4 file names are weird and do not match the entries in the .csv file. The downloaded files look like this:

[Screenshot 2020-07-29 at 11:19:25 AM: listing of the downloaded files]
LiuChaoXD commented 4 years ago

I think the error is caused by ffmpeg.

The pipeline works like this: the mp4 file is downloaded by youtube-dl, and then ffmpeg trims it and saves the clip to the target directory.

I set num_jobs to 1 and printed some key information.

First, youtube-dl successfully downloaded the video file "./tmp/kinetics/88aa675f-d246-45a8-b417-21806ef6e72c.mp4" (I found it in the ./tmp/kinetics/ directory). Second, download.py runs the following command to trim the 88aa67... file and save the clip as an mp4:

    ffmpeg -i "./tmp/kinetics/88aa675f-d246-45a8-b417-21806ef6e72c.%(ext)s" -ss 136.0 -t 10.0 -c:v libx264 -c:a copy -threads 1 -loglevel panic "./k400_val/playing saxophone/--7VUM9MKg4_000136_000146.mp4"

However, there is nothing in ./k400_val/playing saxophone/.
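For debugging, the trim step can be sketched in Python as below. `build_trim_command` and `trim_clip` are hypothetical helpers, not functions from download.py; they only reproduce the ffmpeg flags in the command above. Note that with `-loglevel panic` ffmpeg prints nothing on failure, so an empty output directory is the only symptom. In the command above, the input path still ends in the literal youtube-dl template `%(ext)s` rather than `.mp4`, so it most likely points at a file that does not exist.

```python
import os
import subprocess


def build_trim_command(input_path, output_path, start_time, end_time):
    """Hypothetical helper: build the ffmpeg argument list used to cut a clip.

    Mirrors the flags in the command above: seek to start_time, keep
    (end_time - start_time) seconds, re-encode video with libx264, copy audio.
    """
    return [
        "ffmpeg",
        "-i", input_path,
        "-ss", str(start_time),
        "-t", str(end_time - start_time),
        "-c:v", "libx264",
        "-c:a", "copy",
        "-threads", "1",
        "-loglevel", "panic",
        output_path,
    ]


def trim_clip(input_path, output_path, start_time, end_time):
    """Hypothetical helper: run ffmpeg and report failure explicitly.

    Because -loglevel panic silences ffmpeg's own errors, check the input
    file before running and the output file afterwards.
    """
    if not os.path.exists(input_path):
        # e.g. a path that still ends in the literal youtube-dl template "%(ext)s"
        return False, "input file not found: %s" % input_path
    cmd = build_trim_command(input_path, output_path, start_time, end_time)
    result = subprocess.run(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    ok = result.returncode == 0 and os.path.exists(output_path)
    return ok, "returncode=%d" % result.returncode
```

Passing an argument list instead of one shell string also avoids quoting problems with class directories that contain spaces, such as `playing saxophone`.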

escorciav commented 4 years ago

Sorry about that.

Regarding the naming convention, @cabaf might know the reason. Perhaps it is meant to support multiple machines writing to a single shared storage?

If you don't like the convention, simply update this part as you prefer. BTW, Fabian wrote very modular code compared to the bunch of hacks and mess that academics usually post on GitHub 😊.

Regarding debugging, refer to the documentation of the command used (youtube-dl) to understand the meaning of %%(ext)s. Note that at this point tmp_filename is updated by using glob (think of it as tab-completion).
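To illustrate the point above: in Python %-formatting, `%%` is an escaped `%`, so a pattern like `'%s.%%(ext)s'` yields a name ending in the literal `%(ext)s`, which youtube-dl's output template replaces with the real extension (for example `mp4`). The script then has to recover the actual file name with glob. This sketch assumes the temporary name is a random UUID, as the file name 88aa675f-d246-45a8-b417-21806ef6e72c suggests; `make_template` and `resolve_downloaded` are hypothetical names, and a fake `.mp4` file stands in for the youtube-dl download.

```python
import glob
import os
import tempfile
import uuid


def make_template(tmp_dir):
    # '%%' escapes '%', so this yields '<tmp_dir>/<uuid>.%(ext)s':
    # a youtube-dl output template whose %(ext)s is filled in after download.
    return os.path.join(tmp_dir, "%s.%%(ext)s" % uuid.uuid4())


def resolve_downloaded(template):
    # Drop the '.%(ext)s' suffix and glob for whatever extension was written,
    # like tab-completing the file name prefix.
    prefix = template[: -len(".%(ext)s")]
    matches = glob.glob(glob.escape(prefix) + ".*")
    return matches[0] if matches else None


with tempfile.TemporaryDirectory() as tmp_dir:
    template = make_template(tmp_dir)
    # Simulate youtube-dl writing the file with its real extension.
    downloaded = template.replace("%(ext)s", "mp4")
    open(downloaded, "w").close()
    print(template)
    print(resolve_downloaded(template) == downloaded)
```

Globbing on the prefix means the script does not need to know in advance which container format youtube-dl picked.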

Good luck! Victor

P.S. We aren't the dataset maintainers. We simply helped out the dataset maintainers as part of the ActivityNet challenge.

LiuChaoXD commented 4 years ago

Thanks a lot. I will follow your suggestions to solve it, and if I manage to fix the error, I will post the solution here. Thank you.

LiuChaoXD commented 4 years ago

Thank you, I have solved this problem. There were two issues:

  1. "--tmp-dir" in download.py (line 218) defaults to /tmp/kinetics/, which uses up storage on the "/" partition. To address this, I changed it to --tmp-dir="./tmp/kinetics/".
  2. The line "tmp_filename = glob.glob('%s*' % tmp_filename.split('.')[0])[0]" works when --tmp-dir is /tmp/kinetics. Once I changed it to ./tmp/kinetics/, tmp_filename no longer points to the downloaded video, because the path starts with "." and split('.')[0] returns an empty string.

To address it, I changed --tmp-dir to an absolute path. My suggestion: the --tmp-dir path should not contain any ".".
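A small sketch of why a "." in the temporary path breaks the glob line quoted above: `split('.')[0]` keeps only the text before the first dot anywhere in the path, not the dot before the extension. With `./tmp/kinetics` the prefix is the empty string, so the glob pattern becomes `*`, which matches entries in the current working directory instead of the downloaded video. `os.path.splitext` is shown as a swapped-in alternative (not what download.py does) that strips only the final extension; the helper names and the `/data/user.name/kinetics` directory are hypothetical.

```python
import os


def prefix_by_split(path):
    # The approach quoted above: keep everything before the FIRST '.'.
    return path.split(".")[0]


def prefix_by_splitext(path):
    # Alternative: drop only the final extension (here '.%(ext)s').
    return os.path.splitext(path)[0]


name = "88aa675f-d246-45a8-b417-21806ef6e72c.%(ext)s"
for tmp_dir in ["/tmp/kinetics", "./tmp/kinetics", "/data/user.name/kinetics"]:
    path = os.path.join(tmp_dir, name)
    # Compare the two prefixes side by side for each temporary directory.
    print(repr(prefix_by_split(path)), "|", repr(prefix_by_splitext(path)))
```

The absolute-path workaround can also be applied automatically with `os.path.abspath(tmp_dir)`, although that still fails if some directory in the absolute path itself contains a ".", which the `splitext` variant handles.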

Thank you very much. I will close this issue.