SilvioGiancola / TrackingNet-devkit

Development kit for TrackingNet
https://tracking-net.org/

can't download the dataset #2

Open bagusyusuf opened 5 years ago

bagusyusuf commented 5 years ago

hi author,

my name is Bagus. Regarding your disclaimer, "In case an error such as Permission denied: https://drive.google.com/uc?id=, Maybe you need to change permission over 'Anyone with the link'? occurs, please check your internet connection and run again the script."

I have tried to re-download this a few times and it still reports the same error. I checked my internet connection and it seems OK. When I try to access your link in my browser, the Google Drive page reports that too many users have accessed the file and that there is a user limit. Is this normal?

I kindly need your help: I can't download the full dataset; the only files that downloaded are the annotations.

best regards, bagus

Jiangfeng-Xiong commented 5 years ago

Same issue here. I used Chrome to download the URL and got the following message:

[screenshot of the Google Drive error message]

SilvioGiancola commented 5 years ago

Dear @BagusYusuf @Jiangfeng-Xiong ,

Thank you for raising this issue.

The data are currently hosted on Google Drive, and it seems there is a download limit of 10TB per day. Our script is smart enough not to download the same data twice, but this is obviously an issue since the complete dataset weighs >1TB.
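
For reference, the skip logic essentially checks whether a target zip already exists locally before requesting it again; here is a rough sketch of the idea (hypothetical names, not the devkit's exact code):

import os
import requests

def download_chunk(url, output_path):
    # Skip chunks fetched on a previous run, so re-running the script
    # only requests the files that are still missing.
    if os.path.exists(output_path):
        print(output_path, "already downloaded, skipping")
        return
    with requests.get(url, stream=True) as res:
        res.raise_for_status()
        with open(output_path, "wb") as f:
            for block in res.iter_content(chunk_size=1 << 20):
                f.write(block)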

It actually works for now, so you can go ahead and try to download it again. We are currently investigating a better solution to share the data and will update you as soon as possible.

Best,

shenh10 commented 5 years ago

Any update? Google Drive is not accessible in mainland China. Would you consider Baidu Drive? https://login.bce.baidu.com/?lang=en

shuida commented 5 years ago

@shenh10 Have you downloaded the whole dataset, and could you share it with me? I also can't download it in Beijing.

ANULLL commented 5 years ago

I have run into the same problem. Does anyone have the dataset on Baidu Drive? I have tried many times without success.

SilvioGiancola commented 5 years ago

We do not currently support Baidu Drive. If anyone has a copy of the dataset on Baidu Drive, we would be more than happy to credit them for their contribution and update the README with instructions on how to download TrackingNet in mainland China.

shenh10 commented 5 years ago

@shuida Nope... Failed

RogerYu123 commented 4 years ago

Could anybody help by uploading the dataset to Baidu Yunpan?

rambleramble commented 4 years ago

Wondering if people are still able to download data using this devkit? I have been consistently hitting the "Maybe you need to change permission over..." error for the last 10 days. I was only able to download maybe <5 zips per day.

Any solution or perhaps 3rd party download? Thanks in advance.

ghost commented 4 years ago

I guess Google is limiting downloads per client. I was downloading the annotations and got the same error at 41%; I tried again using a VPN to connect to my office network and it worked for another ~40%. Maybe you can add this tip to your README, @SilvioGiancola. I also tried renewing my WAN IP from my router, but that didn't work.

qdLMF commented 4 years ago

I found a solution that works fine on my machine, though I don't really know why it works. It does not seem to be about Google's limits on download requests. Apparently you should send your Google account login information along with your HTTPS requests:

import requests

HEADERS = {"User-Agent": "some string"}
GOOGLE_LOGIN_COOKIE_STR = "some string"

# This function converts a cookie string into a requests.cookies.RequestsCookieJar object.
def cookie_str2jar(cookie_str):
    cookie_dict = {}
    for item in cookie_str.split(';'):
        item = item.strip()
        idx_eq = item.find('=')
        key = item[:idx_eq]
        value = item[idx_eq + 1:]
        cookie_dict[key] = value
    cookie_jar = requests.cookies.merge_cookies(requests.cookies.RequestsCookieJar(), cookie_dict)
    return cookie_jar

GOOGLE_LOGIN_COOKIE_JAR = cookie_str2jar(GOOGLE_LOGIN_COOKIE_STR)

You can find the above two strings in your web browser after you go to the Google main page and log into your Google account. Python's requests package sends its own default "User-Agent" string; I'm not sure whether that default works, so I used the one copied from my web browser, which looks like this:

HEADERS = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/63.0.3239.132 Safari/537.36 QIHU 360SE"}
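
For what it's worth, here is a rough illustration of the expected cookie format with hypothetical values; the real names and values have to be copied from your own logged-in browser session (the cookies stored for google.com):

# Hypothetical example values; copy the real ones from your browser
# (SID/HSID/SSID are typical Google session cookie names).
GOOGLE_LOGIN_COOKIE_STR = "SID=abc123; HSID=def456; SSID=ghi789"
GOOGLE_LOGIN_COOKIE_JAR = cookie_str2jar(GOOGLE_LOGIN_COOKIE_STR)
print(GOOGLE_LOGIN_COOKIE_JAR.get_dict())
# {'SID': 'abc123', 'HSID': 'def456', 'SSID': 'ghi789'}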

Then, you have to change the first few lines of the download() function in downloader.py:

# Add these imports at the top of downloader.py if they are not already there.
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry

def download(url, output, quiet):
    url_origin = url
    sess = requests.session()
    retry = Retry(total=100)    # Better to have a large number of retries.
    adapter = HTTPAdapter(max_retries=retry)
    sess.mount('http://', adapter)
    sess.mount('https://', adapter)
    sess.keep_alive = False    # I'm not sure if this line is necessary, but it works fine for me.

    is_gdrive = is_google_drive_url(url)

    count = 0
    while True:
        count += 1
        # The request must include the headers and the login cookie jar.
        # Apparently you have to send multiple requests for one zip file;
        # on my machine it takes at most 3 requests to get the proper
        # download response. It works fine for me, but I really don't know why.
        if count <= 10:
            res = sess.get(url, headers=HEADERS, cookies=GOOGLE_LOGIN_COOKIE_JAR, stream=True)

        # The rest of this function is the same as the original version.
        if 'Content-Disposition' in res.headers:
            # This is the file
            break
        if not is_gdrive:
            break
        # ... (rest of the original function unchanged)
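
As a quick sanity check before starting the full download, you can verify that the cookie is actually picked up; the file id below is a placeholder. For small files Google answers directly with a Content-Disposition header, while large files first return an HTML confirmation page, which is what the loop above handles:

import requests

TEST_URL = "https://drive.google.com/uc?id=<some_file_id>"  # placeholder id

sess = requests.session()
res = sess.get(TEST_URL, headers=HEADERS, cookies=GOOGLE_LOGIN_COOKIE_JAR, stream=True)
# A Content-Disposition header means Google is serving the file directly;
# an HTML page without it is either the large-file confirmation page or a
# quota/permission error page.
print(res.status_code, 'Content-Disposition' in res.headers)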

Also, I disabled IPv6 on my VPN host, but I'm not sure whether it matters. For those of you in China: in my experience, the VPN location has a big influence on download speed. I used a Vultr host located in New York, equipped with V2Ray and Google BBR, and it downloaded the dataset at nearly the full speed of my internet service.

SilvioGiancola commented 4 years ago

To anyone who still has issues downloading TrackingNet: we are currently trying to find more reliable solutions. For now, we have created backup links to download the full training chunks (and the testing chunk). They are still hosted on Google Drive, but this should make it easier to spread the data around the community using alternative sharing platforms (e.g. Baidu, Dropbox, good old HDDs, ...).

Here are two backup links: [link1] [link2]

Also, it appears that Google Drive limits downloads if you are not signed in with your Gmail account. If you have any issue downloading, please make sure you are signed in to Google Drive with your Gmail account. We will keep tracking the situation over the next days.

1e100 commented 4 years ago

@SilvioGiancola have you considered http://academictorrents.com/?

SilvioGiancola commented 4 years ago

@1e100 academictorrents only provides a tracker for academic-related torrents; it does not host data, nor does it seed torrents. And torrenting has its limitations too.

The real question here is: would you consider seeding TrackingNet? I don't mind creating a torrent (anyone can publish TrackingNet as a torrent), but that can only be a solution if everyone seeds proportionally.

1e100 commented 4 years ago

I hear you @SilvioGiancola, but right now your dataset is basically impossible to download. I tried to download it right after midnight PST, and Google already says “bandwidth exceeded”. There’s got to be a solution of some sort for this.

SilvioGiancola commented 4 years ago

I am pushing a version of TrackingNet to academictorrents. Please seed as much as possible, as my upload rate is a fraction of what Google Drive can provide. Any feedback is appreciated.

1e100 commented 4 years ago

Thank you, @SilvioGiancola. On my end I'll download and seed. I don't know how long I'll be able to keep it up, but hopefully enough for the interested parties to download and propagate further. I'll seed for at least a few days. Hopefully others will join in and carry the torch in a more permanent fashion.

1e100 commented 4 years ago

@SilvioGiancola so far, no download progress. Are you sure you're forwarding the right port range? For aria2c, for example, it's 6881-6999, which is a wider range than some other BitTorrent implementations use.

SilvioGiancola commented 4 years ago

@1e100 I'm using the Transmission Qt GUI; it's still verifying the local data. 180GB/1.14TB done in the last 30 min, so I guess it will still take ~4h.

1e100 commented 4 years ago

OK, I'll report back in 4-5 hours again.

1e100 commented 4 years ago

Still nothing.

SilvioGiancola commented 4 years ago

It will take me more time to figure this out; I'll keep you posted. In the meantime, can you reach out on Slack for further debugging?

SilvioGiancola commented 4 years ago

We are currently experimenting with a BitTorrent solution to share TrackingNet with the tracking community. The torrent is available at https://academictorrents.com/details/1faf1b53cc0099d2206f02be42b5688952c3c6b3.

It may be very slow at the beginning, but it will improve once more people request a copy. Here are some guidelines:

HAoYifei996 commented 4 years ago

Hi there, I am currently also experiencing download problems, as I cannot access Google Drive in mainland China. If I use a backup link (like the ownCloud one updated in May), does that mean I need to manually download the dataset instead of using download_TrackingNet.py? Or do I just need to change the URL in that file? Also, I have tried to manually download TRAIN0.zip from ownCloud, and I am still experiencing a very low download speed.

SilvioGiancola commented 4 years ago

Yes, you should download the dataset manually if you are using any alternative solution (ownCloud, torrent, GDrive backup).

May I ask what speed you are reaching with the ownCloud solution? If it is not fast enough for you, I would recommend the torrent collection on academictorrents: https://academictorrents.com/collection/trackingnet.

HAoYifei996 commented 4 years ago

@SilvioGiancola Thank you for your reply. I was only reaching a speed of around 10k/s, which makes it impossible for me to download the whole dataset :( I don't know if this is caused by my location; my internet connection looks good to me. I will try the torrent solution and see if it works. Thanks again for your help!

SilvioGiancola commented 4 years ago

I guess the problem originates from your location. Collaborators in Europe were able to reach a 15MB/sec download speed. Are you using your university connection? The ownCloud storage is part of a Globus network (https://www.globus.org/) that optimizes data transfer across universities and research institutions around the world.

Alternatively, you should try the academictorrents collection: https://academictorrents.com/collection/trackingnet. Try the 13 torrents for the 12 train chunks and the test chunk; parallelizing the download across multiple torrents should give you more bandwidth.
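
For example, one way to parallelize this (assuming aria2c is installed and the .torrent files have already been fetched from academictorrents; paths are hypothetical) is to launch one aria2c process per torrent:

import glob
import subprocess

# One aria2c process per torrent file, so the chunks download in parallel.
# --seed-time controls how long to keep seeding after the download completes;
# consider a high value to help seed TrackingNet for others.
procs = [
    subprocess.Popen(["aria2c", "--dir=TrackingNet", "--seed-time=60", torrent])
    for torrent in sorted(glob.glob("*.torrent"))
]
for p in procs:
    p.wait()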