Add the ability to fetch tiles in parallel

tcihak-fqa commented 1 year ago

Fetching the web tiles can take a long time when there are many to download. Would it be possible to do multiple tile requests in parallel using the multiprocessing module?

darribas commented 1 year ago

Thanks very much for the suggestion. I think that is technically possible and would amount to parallelising this for loop:

https://github.com/geopandas/contextily/blob/d821ac782764a0bad436b61eef9f83473fc3df6c/contextily/tile.py#L219-L224

However, I'm not sure parallelisation would speed things up. It would if the bottleneck were in processing the tiles, but my hunch is that most of the time is spent on download/latency issues. If that's the case, would parallelisation really solve it? If anything, it would add more download load for the same bandwidth.

tcihak-fqa commented 1 year ago

Thanks for the response! So I'm using contextily to generate static maps from a web api that has high bandwidth. I agree that most of the time is spent on I/O latency.
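
To illustrate the latency argument, here is a minimal sketch that simulates tile requests with time.sleep instead of real HTTP calls. The fake_fetch function and the 0.05 s delay are made up for illustration and are not part of contextily. The point is only that when each request mostly waits, threads let those waits overlap:

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_fetch(tile_id, latency=0.05):
    """Hypothetical stand-in for a tile request: just waits, then returns."""
    time.sleep(latency)  # simulated network latency, no real download
    return tile_id


def timed(fn):
    """Run fn() and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = fn()
    return result, time.perf_counter() - start


tile_ids = list(range(8))

# One request at a time: total time is roughly 8 * latency.
sequential, t_seq = timed(lambda: [fake_fetch(t) for t in tile_ids])

# Eight threads: the waits overlap, so total time is roughly 1 * latency.
with ThreadPoolExecutor(max_workers=8) as pool:
    threaded, t_par = timed(lambda: list(pool.map(fake_fetch, tile_ids)))

print(f"sequential: {t_seq:.2f}s, threaded: {t_par:.2f}s")
```

Whether a real endpoint behaves this way depends on its rate limits and on the available bandwidth, which this simulation does not model.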

JacobJeppesen commented 1 year ago

I have actually been considering making a pull request for this exact functionality. I've been playing around with it in another project, and for large images it can make a rather big difference. The optimal number of parallel downloads differs quite a bit from endpoint to endpoint, but 8-16 is normally a good range. The code below can be used as a drop-in replacement for the for loop:

def bounds2img(
        w, s, e, n, zoom="auto", source=None, ll=False, wait=0, max_retries=2, num_parallel_tile_downloads=16
):
.
.
.
    # download and merge tiles
    # tiles = []
    # arrays = []
    # for t in mt.tiles(w, s, e, n, [zoom]):
    #     x, y, z = t.x, t.y, t.z
    #     tile_url = provider.build_url(x=x, y=y, z=z)
    #     image = _fetch_tile(tile_url, wait, max_retries)
    #     tiles.append(t)
    #     arrays.append(image)
    from joblib import Parallel, delayed  # This should go to the top of the file
    tiles = list(mt.tiles(w, s, e, n, [zoom]))
    tile_urls = [provider.build_url(x=tile.x, y=tile.y, z=tile.z) for tile in tiles]
    max_num_parallel_tile_downloads = 32
    # Note that num_parallel_tile_downloads has been added as an argument to the function
    if num_parallel_tile_downloads < 1 or num_parallel_tile_downloads > max_num_parallel_tile_downloads:
        raise ValueError(
            f"num_parallel_tile_downloads must be between 1 and {max_num_parallel_tile_downloads}"
        )
    arrays = Parallel(n_jobs=num_parallel_tile_downloads, prefer="threads")(
        delayed(_fetch_tile)(tile_url, wait, max_retries) for tile_url in tile_urls
    )

    merged, extent = _merge_tiles(tiles, arrays)
.
.
.
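
For readers who would rather not depend on joblib, roughly the same pattern can be sketched with the standard library. This is not the code from the snippet above or from contextily: fetch_all, n_parallel and the dummy fetch function are hypothetical names used only for illustration. The detail worth noting is that the results come back in input order, which matters because the tiles and arrays lists are passed to _merge_tiles side by side and presumably need to stay aligned:

```python
from concurrent.futures import ThreadPoolExecutor

MAX_PARALLEL = 32  # same upper bound as in the snippet above


def fetch_all(urls, fetch, n_parallel=16):
    """Fetch every URL with up to n_parallel threads, keeping input order.

    `fetch` is any callable taking a URL; here it stands in for _fetch_tile.
    """
    if not 1 <= n_parallel <= MAX_PARALLEL:
        raise ValueError(f"n_parallel must be between 1 and {MAX_PARALLEL}")
    with ThreadPoolExecutor(max_workers=n_parallel) as pool:
        # map() yields results in the order of `urls`, not completion order.
        return list(pool.map(fetch, urls))


# Dummy fetch function (str.upper) so the sketch runs without network access.
urls = [
    f"https://example.invalid/{z}/{x}/{y}.png"
    for z, x, y in [(1, 0, 0), (1, 0, 1), (1, 1, 0)]
]
results = fetch_all(urls, fetch=str.upper, n_parallel=2)
```

With a real fetch function, an exception raised in any worker would surface when the results are collected, so retry handling would still need to live inside the fetch function itself.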

I just tested it in the intro_guide.ipynb notebook by downloading an extended version of the ghent image in the Coordinate-based searches section, with the following code:

import time

import contextily as cx

west, south, east, north = (
    3.616218566894531,
    50.98912458110244,
    5.8483047485351562,
    54.13994019806845,
)
start_time = time.time()
ghent_img, ghent_ext = cx.bounds2img(
    west,
    south,
    east,
    north,
    ll=True,
    zoom=11,
    source=cx.providers.Stamen.Toner,
    num_parallel_tile_downloads=8,
)
print(f"Download time: {time.time() - start_time}")
ghent_img.shape

Note that I had to comment out the @memory.cache decorator for the _fetch_tile function during the test, so that cached tiles would not skew the timings.

The shape of the image was (7680, 3584, 4), and the download times were:

num_parallel_tile_downloads=1: 69.49s
num_parallel_tile_downloads=2: 36.11s
num_parallel_tile_downloads=4: 18.32s
num_parallel_tile_downloads=8: 9.55s
num_parallel_tile_downloads=16: 5.02s
num_parallel_tile_downloads=32: 2.92s
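
As a quick check on how close to linear these numbers are, the speedup relative to a single download and the per-worker efficiency can be computed directly from the reported times. This is plain arithmetic on the figures above, nothing is re-measured:

```python
# Timings reported above for zoom=11, in seconds,
# keyed by the number of parallel downloads.
timings = {1: 69.49, 2: 36.11, 4: 18.32, 8: 9.55, 16: 5.02, 32: 2.92}

baseline = timings[1]
speedups = {n: baseline / seconds for n, seconds in timings.items()}

for n, speedup in speedups.items():
    efficiency = speedup / n  # 1.0 would be perfectly linear scaling
    print(f"n={n:>2}: speedup {speedup:5.2f}x, efficiency {efficiency:.0%}")
```

By this measure the efficiency stays above 90% up to 8 parallel downloads, is roughly 87% at 16, and drops to about 74% at 32.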

I don't think there should be any downside to always downloading in parallel (the overhead should be minimal), as long as the number of parallel downloads doesn't overwhelm the endpoint. A default value of 16 is a good starting point in my experience with the different endpoints I've tested. That normally gives almost linear improvements. Above that, it varies quite a bit.

That ended up being quite a bit of text. Hopefully it's useful :wink:

Let me know if you would like a pull request with the implementation :+1:

JacobJeppesen commented 1 year ago

Just made a couple more tests with smaller images:

At zoom=9 the shape was (2048, 1024, 4) and download times were:

num_parallel_tile_downloads=1: 5.38s
num_parallel_tile_downloads=16: 0.50s

At zoom=7 the shape was (512, 512, 4) and download times were:

num_parallel_tile_downloads=1: 0.69s
num_parallel_tile_downloads=16: 0.17s (there are only 4 tiles, so in practice we only do 4 parallel downloads)
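
The tile counts behind these shapes can be worked out with a small sketch. It assumes 256-pixel square tiles, which is not stated in the thread but is consistent with the 4 tiles mentioned for the (512, 512, 4) image. The helper names are hypothetical:

```python
TILE_SIZE = 256  # assumed tile edge in pixels; not stated in the thread


def tile_count(shape, tile_size=TILE_SIZE):
    """Number of tiles in a merged image of shape (height, width, bands)."""
    height, width = shape[:2]
    return (height // tile_size) * (width // tile_size)


def effective_workers(n_requested, n_tiles):
    """Parallel downloads that can actually run at once."""
    return min(n_requested, n_tiles)


# Image shapes reported in the thread, keyed by zoom level.
shapes = {11: (7680, 3584, 4), 9: (2048, 1024, 4), 7: (512, 512, 4)}
counts = {zoom: tile_count(shape) for zoom, shape in shapes.items()}

for zoom, n_tiles in counts.items():
    print(f"zoom={zoom}: {n_tiles} tiles, "
          f"{effective_workers(16, n_tiles)} effective workers at 16 requested")
```

Under that assumption the three tests cover 420, 32 and 4 tiles, so only the smallest one is limited by the tile count rather than by the requested number of parallel downloads.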

I also tested the three zoom levels with the current for loop implementation. The download times were pretty much exactly the same as with num_parallel_tile_downloads=1.

tcihak-fqa commented 1 year ago

This solution looks very nice, Jacob! My vote would be to create a PR and get it into a release at some point.

JacobJeppesen commented 1 year ago

Thanks @tcihak-fqa :smiley:

I've just added a pull request with the changes (https://github.com/geopandas/contextily/pull/217).

If you'd like to use it now, you can install it with:

pip uninstall -y contextily
pip install git+https://github.com/JacobJeppesen/contextily@parallel_tile_downloads

tcihak-fqa commented 1 year ago

Thanks Jacob. I monkey-patched your original solution and it seems to be working well. I haven't encountered any memory issues, but the number of API requests has been light so far.

martinfleis commented 1 year ago

Closed by #217

geopandas / contextily

Add the ability to fetch tiles in parallel #215