emissions-api / sentinel5dl

Sentinel-5(P) Downloader
https://sentinel5dl.emissions-api.org
MIT License
12 stars 8 forks source link

Download with threads #59

Closed baevpetr closed 2 years ago

baevpetr commented 4 years ago

Sending a bit late PR... 1) Implemented call_with_retries which can be used somewhere else (maybe moved to utils.py in the future). Tried not to bind to existing code. Now you can continue download files because the server allows you to do this with usage of curl.setopt(pycurl.RESUME_FROM, os.path.getsize(filename)) *Using recursion in "try with retries" block is not recommended. 2) Number of workers:

parser.add_argument(
        '--worker',
        type=int,
        default=min(32, os.cpu_count() + 4),
        help='Number of parallel downloads',
    )

3) Added default folder for downloads:

parser.add_argument(
        '--download_dir',
        default='data/',
        metavar='download-dir',
        help='Download directory'
    )

4) Rewrite 1 test with unittest.mock.patch, remove 1 since it check some obvious logic. 5) Downloaded 34 files from example:

result = search(
        polygon='POLYGON((7.8 49.3,13.4 49.3,13.4 52.8,7.8 52.8,7.8 49.3))',
        begin_ts='2019-09-01T00:00:00.000Z',
        end_ts='2019-09-17T23:59:59.999Z',
        product='L2__CO____',
        processing_level='L2',
        processing_mode='Offline')

successfully. 6) * Could not implement handling corrupted file during download because total file size is unknown. Maybe some extra errors may occur...

lkiesow commented 4 years ago

Hi @baevpetr, thanks for the patch. As you might have already noticed, this has merge conflicts with the current state of master, mostly due to the fact that parts of this are already implemented like parallel downloads using workers (#57) or cleaning up half-finished files (#58) so that this patch needs a rebase.

Also a practical note from someone doing a lot of code reviews, it helps a lot if you create one patch/pull request per feature and not put everything into one pull requests, plus, you should avoid unnecessary and unrelated changes (e.g. code formatting). That makes reviewing and understanding what a particular change is for so much easier and makes reviewers happy (especially since that's me in this case ;-)

Finally, on removing the executable tests, they are mostly testing obvious code, but they are actually checking imports and argparse set-up, both of which I have seen break, especially due to hastily resolved merge conflicts. So I would keep them unless there is a good reason for dropping them?

lkiesow commented 4 years ago

I forgot to mention that there is already a default download directory. By default, it is set to the current directory ..