floydwch / kaggle-cli

(Deprecated, use https://github.com/Kaggle/kaggle-api instead) An unofficial Kaggle command line tool.
MIT License

why is the downloading/uploading so slow #16

Closed kirk86 closed 7 years ago

kirk86 commented 7 years ago

Compared to downloading and uploading directly through the Kaggle website, kg seems to be quite slow. Why is that?

invokerk commented 7 years ago

I would like to work on this. @kirk86 How did you compare the download/upload speeds, and which file were you downloading during the test?

kirk86 commented 7 years ago

@invokerk Hi, to be honest, I don't recall exactly which file I used, but I did something along the following lines. Just pick a Kaggle competition and download the data directly from their website, then do the same using kaggle-cli. Also do the same for the submission file, i.e. the upload. This will give you a pretty good idea of how long an upload/download takes on average.

floydwch commented 7 years ago

The delay could occur at login, when fetching the download/submit page, or during the download/submit action itself. The delay times depend on Kaggle's server. It could be mitigated by caching the login and the fetched pages, which reduces the number of requests.
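To illustrate the idea of caching page fetches, here is a minimal sketch. It is not how kaggle-cli itself implements caching; `CachedFetcher`, `slow_fetch`, and the URL are hypothetical names made up for the example, and the fetch function is a stand-in that makes no real network request.

```python
import time


class CachedFetcher:
    """Wrap a fetch function and reuse recent results instead of re-requesting."""

    def __init__(self, fetch, ttl_seconds=300.0, clock=time.monotonic):
        self._fetch = fetch          # callable: url -> page body
        self._ttl = ttl_seconds      # how long a cached page stays valid
        self._clock = clock
        self._cache = {}             # url -> (timestamp, body)

    def get(self, url):
        now = self._clock()
        hit = self._cache.get(url)
        if hit is not None and now - hit[0] < self._ttl:
            return hit[1]            # still fresh: skip the request
        body = self._fetch(url)
        self._cache[url] = (now, body)
        return body


calls = []


def slow_fetch(url):
    # Hypothetical stand-in for a real HTTP request to the server.
    calls.append(url)
    return "<html>page for %s</html>" % url


fetcher = CachedFetcher(slow_fetch)
first = fetcher.get("https://example.invalid/competition/data")
second = fetcher.get("https://example.invalid/competition/data")
print(len(calls))  # prints 1: the second call was served from the cache
```

The same pattern would apply to the login step: keep the authenticated session around and reuse it until it expires, rather than logging in on every command.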

kirk86 commented 7 years ago

@floydwch thanks a lot. I just have another question. How can you set up kaggle-cli if you're behind a proxy? Thanks.

floydwch commented 7 years ago

Kaggle-cli employs MechanicalSoup as the browser, and MechanicalSoup employs Requests to handle the HTTP.

According to http://docs.python-requests.org/en/master/user/advanced/#proxies , it should be possible to support a proxy as an argument. However, Requests also supports setting a proxy through environment variables (HTTP_PROXY and HTTPS_PROXY), so in the meantime you can just try this way.
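As a rough sketch of the environment-variable approach: the snippet below sets the proxy variables from inside Python and uses the standard library's `urllib.request.getproxies()` to show what gets picked up. Requests reads the same variables by default (when a session's `trust_env` is true). The proxy address is a placeholder, not a real server, and `proxies_from_env` is a hypothetical helper name.

```python
import os
import urllib.request

# Placeholder proxy address; replace it with your own proxy host and port.
PROXY_URL = "http://proxy.example.com:3128"

# Normally you would export these in your shell before running kg.
# Lowercase names are used here; Requests accepts upper- or lowercase.
os.environ["http_proxy"] = PROXY_URL
os.environ["https_proxy"] = PROXY_URL


def proxies_from_env():
    """Return the scheme -> proxy URL mapping found in the environment."""
    return urllib.request.getproxies()


proxies = proxies_from_env()
print(proxies.get("http"), proxies.get("https"))
```

A mapping of this shape (`{"http": ..., "https": ...}`) is also what Requests accepts through its `proxies` argument, which is the route a future command-line option would presumably take.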

kirk86 commented 7 years ago

@floydwch

in the meanwhile, you can just try this way.

yeah I've noticed that. In the meantime it would be really nice to have it as an option, don't you think?

floydwch commented 7 years ago

@kirk86 I found time to investigate this feature request, and that's my conclusion. FYI.

floydwch commented 7 years ago

Since we have already implemented browser caching (see https://github.com/floydwch/kaggle-cli/commit/1176fd7cd396de9508d9e1a59ea086a291147b19), I'm going to close this issue. If there is still some weird delay, please feel free to reopen the issue.

smcinerney commented 6 years ago

How many MB/s should we see when uploading? There's no progress bar or ETA, so you can't tell if uploading submission crashed.

Also, some tips on making uploading submissions faster: