ajnisbet / opentopodata

Open alternative to the Google Elevation API!
https://www.opentopodata.org
MIT License

Problem with ned10m dataset #8

Closed hugheslavigne closed 4 years ago

hugheslavigne commented 4 years ago

I've tried to set up a server to get data out of the ned10m dataset.

When I try to run it, I get this log:

uWSGI is running in multiple interpreter mode
spawned uWSGI master process (pid: 12)
spawned uWSGI worker 1 (pid: 37, cores: 1)
spawned uWSGI worker 2 (pid: 38, cores: 1)
2020-08-05 15:11:34,862 INFO success: nginx entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2020-08-05 15:11:34,862 INFO success: memcached entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2020-08-05 15:11:34,862 INFO success: warm_cache entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2020-08-05 15:11:34,862 INFO success: uwsgi entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
ERROR:root:Invalid config: Unknown dataset type for 'ned10m'.
ERROR:root:Unable to warm cache. This probably means Open Topo Data isn't working.
2020-08-05 15:11:34,940 INFO exited: warm_cache (exit status 1; not expected)

It seems that my dataset is not properly configured. My config.yaml is exactly the same as the one in the doc for the ned10m dataset. I've only downloaded a part of the ned10m dataset from USGS. Here is my folder structure for the dataset:

data/ned10m:
USGS_13_n05e162.tif USGS_13_n06e152.tif USGS_13_n09e138.tif USGS_13_n17w066.tif USGS_13_n18w156.tif USGS_13_n21w158.tif USGS_13_n24w081.tif USGS_13_n25w098.tif USGS_13_n26w099.tif
USGS_13_n05e163.tif USGS_13_n06e158.tif USGS_13_n13e144.tif USGS_13_n17w067.tif USGS_13_n19w155.tif USGS_13_n21w159.tif USGS_13_n24w082.tif USGS_13_n25w099.tif USGS_13_n26w100.tif
USGS_13_n06e134.tif USGS_13_n07e152.tif USGS_13_n14e145.tif USGS_13_n17w068.tif USGS_13_n19w157.tif USGS_13_n21w160.tif USGS_13_n24w083.tif USGS_13_n26w081.tif USGS_13_n27w081.tif
USGS_13_n06e151.tif USGS_13_n08e134.tif USGS_13_n17w065.tif USGS_13_n18w066.tif USGS_13_n20w156.tif USGS_13_n21w161.tif USGS_13_n25w082.tif USGS_13_n26w083.tif

ajnisbet commented 4 years ago

Hey, thanks for raising this. The issue is that the NED files from USGS have the top-left corner in the filename, whereas most elevation datasets use the lower-left corner. Open Topo Data uses the lower-left corner too, so that it can quickly find the correct file for a given location.
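To make the two naming conventions concrete, here is a minimal sketch. It assumes 1-degree tiles and the n04e162-style naming seen in this thread; the function names are hypothetical and are not part of Open Topo Data or any USGS tool.

```python
import math


def _format_corner(lat_corner, lon_corner):
    """Format integer corner coordinates as e.g. 'n04e162' or 's01w067'."""
    ns = "n" if lat_corner >= 0 else "s"
    ew = "e" if lon_corner >= 0 else "w"
    return "{}{:02d}{}{:03d}".format(ns, abs(lat_corner), ew, abs(lon_corner))


def lower_left_tile_name(lat, lon):
    """Name of the 1-degree tile containing (lat, lon), keyed by its lower-left corner."""
    return _format_corner(math.floor(lat), math.floor(lon))


def top_left_tile_name(lat, lon):
    """Name of the same tile keyed by its top-left corner (the USGS-style naming)."""
    return _format_corner(math.floor(lat) + 1, math.floor(lon))


# A point inside the tile that USGS ships as USGS_13_n05e162.tif.
usgs_name = top_left_tile_name(4.5, 162.5)
lookup_name = lower_left_tile_name(4.5, 162.5)
```

With lower-left naming, the file for a location follows directly from flooring the coordinates. With top-left naming, the same lookup lands on the tile one degree too far south, which is why the files need renaming.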

So USGS_13_n05e162.tif should be renamed USGS_13_n04e162.tif (or just n04e162.tif to make it clear the files have been renamed). There's a Python script in the documentation for NED that does the renaming:

from glob import glob
import os
import re

old_pattern = './data/ned10m/USGS_13_*.tif'
old_paths = list(glob(old_pattern))
print('Found {} files'.format(len(old_paths)))

for old_path in old_paths:
    folder = os.path.dirname(old_path)
    old_filename = os.path.basename(old_path)

    # Extract northing.
    res = re.search(r'([ns]\d\d)', old_filename)
    old_northing = res.groups()[0]

    # Shift the northing one degree south: top-left corner to lower-left corner.
    n_or_s = old_northing[0]
    ns_value = int(old_northing[1:3])
    if old_northing[:3] == 'n00':
        # A tile whose top edge is the equator has its lower edge at 1 degree south.
        new_northing = 's01' + old_northing[3:]
    elif n_or_s == 'n':
        new_northing = 'n' + str(ns_value - 1).zfill(2) + old_northing[3:]
    elif n_or_s == 's':
        new_northing = 's' + str(ns_value + 1).zfill(2) + old_northing[3:]
    new_filename = old_filename.replace(old_northing, new_northing)
    assert new_northing in new_filename

    # Prevent new filename from overwriting old tiles.
    parts = new_filename.split('.')
    parts[0] = parts[0] + '_renamed'
    new_filename = '.'.join(parts)

    # Rename in place.
    new_path = os.path.join(folder, new_filename)
    os.rename(old_path, new_path)

I made a release to improve the clarity of error messages for malformed datasets, and to handle filename formats like the ones you're using. I also discovered the renaming script in the documentation was outdated, so I fixed that too.

I know this renaming is a bit clunky. I'm working on a more intelligent automated system for handling datasets in any format, but it's going to be a while before that gets finished.

hugheslavigne commented 4 years ago

Thank you for your quick reply!

With the new release, everything seems to work great with the server.

However, your Python script to rename files doesn't seem to do what you intended. For example, suppose there are two files named USGS_13_n20w156.tif and USGS_13_n19w156.tif. If the n20w156 file gets renamed first, it will overwrite the other file. The script then picks up the new n19w156 file and renames it to n18w156, so you end up with only one file. I had 14 files in my dataset and ended up with 9 files after I ran the script.

You could fix this problem either by omitting the prefix (i.e. USGS_13_) from the new filename or by sorting the old_paths list. The latter is easy to do by changing the for loop statement to:

for old_path in sorted(old_paths):
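The collision and the effect of sorting can be reproduced without touching any files. The sketch below replays the renames on a dict, assuming in-place renaming with no suffix and assuming that os.rename silently replaces an existing file, as it does on POSIX systems. The helper names shift_northing and simulate_renames are hypothetical.

```python
import re


def shift_northing(filename):
    """Move the northing in a USGS-style name from top-left to lower-left."""
    match = re.search(r"([ns])(\d\d)", filename)
    n_or_s, value = match.group(1), int(match.group(2))
    if n_or_s == "n" and value == 0:
        new_northing = "s01"
    elif n_or_s == "n":
        new_northing = "n{:02d}".format(value - 1)
    else:
        new_northing = "s{:02d}".format(value + 1)
    return filename.replace(match.group(0), new_northing, 1)


def simulate_renames(filenames, order):
    """Replay in-place renames on a dict instead of the filesystem.

    Keys are current filenames; values record which original tile the data
    came from. Assigning to an existing key mimics a silent overwrite.
    """
    files = {name: name for name in filenames}
    for old in order:
        if old in files:  # skip names that no longer exist
            files[shift_northing(old)] = files.pop(old)
    return files


tiles = ["USGS_13_n20w156.tif", "USGS_13_n19w156.tif"]

# Renaming n20 first overwrites n19, which is then renamed again.
unsorted_result = simulate_renames(tiles, tiles)

# Renaming in ascending order frees each target name before it is needed.
sorted_result = simulate_renames(tiles, sorted(tiles))
```

In the unsorted run a single file survives: it is named USGS_13_n18w156.tif but holds the data from the original n20w156 tile. In the sorted run both tiles survive under their new names.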

By the way, your project is awesome!

ajnisbet commented 4 years ago

Oh no, sorry you had to catch that issue! I modified the script to add '_renamed' to the new filenames: there are a few tiles in the full dataset below the equator that would have the same problem even when sorted, and it's nice to rename the files so you know they're not the same as the ones from USGS.
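A small simulation shows why sorting alone is not enough south of the equator, where the northing number increases as tiles move south. The two tile names below are hypothetical examples rather than files confirmed to be in the dataset, and the helpers are illustrative stand-ins for the script's logic, again assuming that a rename silently replaces an existing file.

```python
import re


def shift_northing(filename, suffix=""):
    """Shift the northing to the lower-left corner, optionally tagging the name."""
    match = re.search(r"([ns])(\d\d)", filename)
    n_or_s, value = match.group(1), int(match.group(2))
    if n_or_s == "n" and value == 0:
        new_northing = "s01"
    elif n_or_s == "n":
        new_northing = "n{:02d}".format(value - 1)
    else:
        new_northing = "s{:02d}".format(value + 1)
    renamed = filename.replace(match.group(0), new_northing, 1)
    stem, extension = renamed.rsplit(".", 1)
    return "{}{}.{}".format(stem, suffix, extension)


def simulate_renames(filenames, suffix=""):
    """Replay renames in sorted order on a dict (filename -> original tile)."""
    files = {name: name for name in filenames}
    for old in sorted(filenames):
        if old in files:  # skip names that no longer exist
            files[shift_northing(old, suffix)] = files.pop(old)
    return files


# Hypothetical southern-hemisphere tiles.
tiles = ["USGS_13_s13w171.tif", "USGS_13_s14w171.tif"]

# Sorted, in place: s13 becomes s14 and overwrites the real s14 tile.
in_place = simulate_renames(tiles)

# Sorted, with a suffix: no new name can ever match an old one.
suffixed = simulate_renames(tiles, suffix="_renamed")
```

The in-place run leaves one file, while the suffixed run keeps both, because a name ending in _renamed cannot collide with any of the original USGS filenames.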

Glad it's working for you now!