edwardlib / observations

Tools for loading standard data sets in machine learning

Improved downloader with better progress/pause & resume/hash verification #16

Closed Arvinds-ds closed 7 years ago

Arvinds-ds commented 7 years ago

This PR addresses a few of the issues you raised. It adds: a) better progress updates showing ETA and speed; b) human-readable sizes and times; c) pause/resume of downloads via the resume=True option; d) hash verification when a hash is provided (hash_true option); e) some fixes in other files (missing import six).

I have created a new file, download_utils, which holds the logic for the above. The util maybe_download_and_extract has two new flags (resume=False and hash_true=None). I have enabled resume=True for LSUN, since I ran into problems downloading that dataset, which is what prompted this PR.
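For readers unfamiliar with the technique, resuming a download and then verifying it can be sketched roughly as follows. This is not the code from this PR: the function names `file_sha256` and `download` are hypothetical, SHA-256 is an assumed choice of hash (the thread does not say which algorithm `hash_true` expects), and resuming is assumed to rely on the server honoring an HTTP `Range` request.

```python
import hashlib
import os
import urllib.request


def file_sha256(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def download(url, path, resume=False, hash_true=None, chunk_size=1 << 20):
    """Download url to path; optionally resume a partial file and check a hash.

    Hypothetical helper for illustration, not the PR's implementation.
    """
    # Bytes already on disk from an earlier, interrupted attempt.
    offset = os.path.getsize(path) if resume and os.path.exists(path) else 0
    request = urllib.request.Request(url)
    if offset:
        # Ask the server for the remaining bytes only.
        request.add_header("Range", "bytes=%d-" % offset)
    with urllib.request.urlopen(request) as response:
        # 206 (Partial Content) means the Range request was honored;
        # any other status means the full body is coming, so start over.
        partial = offset > 0 and getattr(response, "status", None) == 206
        with open(path, "ab" if partial else "wb") as f:
            for chunk in iter(lambda: response.read(chunk_size), b""):
                f.write(chunk)
    # Assumed hash: SHA-256, compared as a hex string.
    if hash_true is not None and file_sha256(path) != hash_true:
        raise IOError("Hash mismatch for %s" % path)
    return path
```

A call such as `download(url, "lsun.zip", resume=True, hash_true=expected_hex)` would pick up where a previous attempt stopped, provided the server supports range requests. The sketch leaves out progress reporting and does not handle the case where the file on disk is already complete, in which case a server may reject the range request.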

Arvinds-ds commented 7 years ago

Thanks. I will close this PR and open two new PRs as follows:

PR #1: Compat changes/fix errors in data files

  1. ptb/wiki103/wiki102 - change to io.open to prevent decode-utf8 error
  2. 'wb' change in util
  3. import six in cifar10 and cifar100
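The first two items can be illustrated with a short sketch. `io.open` accepts an `encoding` argument on both Python 2 and Python 3, so text is decoded the same way regardless of interpreter or locale, and opening a file with `'wb'` writes raw bytes without any newline or encoding translation. The function names `read_words` and `write_bytes` are hypothetical and not taken from the repository.

```python
import io


def read_words(path):
    """Read a UTF-8 text file and split it into whitespace-separated words."""
    # io.open behaves the same on Python 2 and 3; the explicit encoding
    # avoids decode errors caused by a non-UTF-8 default.
    with io.open(path, encoding="utf-8") as f:
        return f.read().split()


def write_bytes(path, data):
    """Write raw bytes to path without text-mode translation."""
    # 'wb' keeps the bytes exactly as they were received.
    with open(path, "wb") as f:
        f.write(data)
```

With these helpers, bytes written by `write_bytes` from UTF-8 encoded text read back through `read_words` as Unicode strings on either Python version.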

PR #2: New features

  1. Merge code in download_utils to utils.py
  2. Do not expose download file
  3. Fix celeba.py

Is this ok?

dustinvtran commented 7 years ago

You can keep this PR to do either the compat changes/fix errors or the new features and make only one more PR. We squash and merge pull requests so the commit history is no bother.

Arvinds-ds commented 7 years ago

Kindly review this PR for compat changes.

dustinvtran commented 7 years ago

Can you add a commit which removes the download features? I can't review the PR for compat changes until then.

Arvinds-ds commented 7 years ago

Hi,

You should now see only 2 commits, one for PR1 and one for PR2.

dustinvtran commented 7 years ago

GitHub doesn't let you pick and choose which commits to merge. For example, I can review "PR1" but merge it only after you remove the "PR2" commit. You need to make a new branch and submit a second PR with only the "PR2" commit.

Arvinds-ds commented 7 years ago

OK, got it. Please review PR1. I will submit PR2 shortly.

dustinvtran commented 7 years ago

Looks good to me!