Open YouJiacheng opened 2 years ago
The CLI already checks the file size for filtering purposes, and it additionally checks the corresponding version. I believe it was done this way because, as you have discovered, getting a hash of a video on S3 is non-trivial.
You can see the corresponding code here: https://github.com/facebookresearch/Ego4d/blob/main/ego4d/cli/download.py#L215-L236
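For reference, the size check in the linked code boils down to comparing the object size S3 reports with the local file size. A minimal sketch (the helper name is mine; in practice the expected size would come from boto3's `head_object`, whose response includes `ContentLength`):

```python
import os

def size_matches(expected_size: int, local_path: str) -> bool:
    # True only if the file exists and its size equals the size
    # reported by S3, e.g. head_object(...)["ContentLength"].
    return os.path.exists(local_path) and os.path.getsize(local_path) == expected_size
```

As discussed below, this catches truncated downloads but not corrupted bytes, since two different files can have the same length.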
I know the CLI checks the file size and have read the corresponding code. But my situation is rather awkward: my compute server is in China and cannot download data from AWS, so I use a data server to download the data from AWS instead. However, I cannot compute hashes on my data server (for various reasons), so I cannot verify the transfer between my data server and compute server. It would help if the Ego4D team could compute hashes of the videos locally. Moreover, integrity could be guaranteed by uploading to S3 with a hash checksum.
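For anyone in a similar setup who *can* hash on both machines: verifying the data-server-to-compute-server hop only requires running the same streaming hash on each end and comparing the digests. A minimal sketch (function name is mine; MD5 is fine here since this is an integrity check, not a security boundary):

```python
import hashlib

def file_md5(path: str, chunk_size: int = 1 << 20) -> str:
    # Stream the file in 1 MiB chunks so large videos
    # are hashed without loading them fully into memory.
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()
```

Run it on the data server after downloading and on the compute server after transferring; equal digests mean the transfer was bit-exact.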
Can you confirm that you can now download directly in China via the CLI?
https://discuss.ego4d-data.org/t/cli-updates-improved-china-access/128
Yes! It is fast ~and smooth~ in general. I downloaded the annotations (2.5 GB). Peak speed was 800 Mb/s (100 MB/s). It took 1m50s to download 99%, but the last 1% took 5m20s. Still satisfactory!
Can you tell me the details about downloading in China? Especially the settings of the two `aws configure` parameters: 'default region name' and 'default output format'.
I want to check the integrity of downloaded data, and I tried:
However, since videos are uploaded in multiple parts, `e_tag` is not the MD5 of the video, and computing it is non-trivial. Thus I can merely use `content_length` to check the integrity, which is not reliable.
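The multipart `e_tag` is not hopeless, though: S3 computes it as the MD5 of the concatenated per-part MD5 digests, suffixed with `-<part count>`. It can be reproduced locally *only* if you know the part size used at upload time, which the Ego4D team would have to confirm. A sketch under that assumption (8 MiB is a common awscli default, not a guarantee; the function name is mine):

```python
import hashlib

def multipart_etag(path: str, part_size: int = 8 * 1024 * 1024) -> str:
    # MD5 each part, then MD5 the concatenation of the raw
    # per-part digests; S3 appends "-<number of parts>".
    digests = []
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(part_size), b""):
            digests.append(hashlib.md5(chunk).digest())
    combined = hashlib.md5(b"".join(digests)).hexdigest()
    return f"{combined}-{len(digests)}"
```

If the computed value matches the `e_tag` from `head_object`, the local copy is almost certainly intact; a mismatch either means corruption or a wrong part-size guess, so a match is strong evidence but a mismatch alone is not proof of corruption.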