HydrologicEngineeringCenter / Vortex

data processing utilities
MIT License
26 stars 7 forks source link

GeoTiff raster stack processing #51

Closed steezbert closed 2 years ago

steezbert commented 2 years ago

Hey all,

I am working with high temporal resolution precipitation raster data time series and was wondering if there is a way to process raster stacks with vortex in batch mode. I am currently using ASCII files and the HEC asc2dssGrid.exe and vortex to get my gridded data into the HEC DSS format. I also figured out how to use vortex with GeoTiff files. However both methods are a huge bottleneck time and storage wise, as I have a lot of files (5 min resolution with a period of 10+ years). Being able to use GeoTiff raster stacks instead of single files would be a huge help. So I was wondering if there is a way to process GeoTiff raster stacks with vortex to write the data into a DSS file, using layer names or lists to set the D and E file path parts.

Additionally it would be nice if there would be some more documentation (sorry if I did not see it) about what write options are there in batch mode for vortex. Looking through the code I figured out how the set all parts for the DSS internal file path and how to set the data type and data units. However I could not figure out yet how to set the start and end date for the grid record itself.

I also have troubles using one single python script to process the files from a folder using a list of file names. If I just give vortex the list of file names I can process them. This works but the entries overwrite themself.

`

from mil.army.usace.hec.vortex.io import BatchImporter
from mil.army.usace.hec.vortex.geo import WktFactory

from glob import glob
from os import path

in_files = glob(r'D:/PATH/TEST/*.tif')

#variable name = file name #file names look like this "RADOLAN_20170630_1435.tif"
variables = [path.basename(file) for file in in_files]
variables = [str.split(var, ".")[0] for var in variables]

destination = 'D:/PATH/RADOLAN.dss'

write_option = {
    'partA': 'UTM33N',
    'partB': 'Sachsen_Kemnitz',
    'partC': 'PRECIPITATION',
    'partD': '01JAN2017:1500',
    'partE': '01JAN2017:1505',
    'partF': 'RADOLAN,
    'dataType': 'PER-CUM',
    'units': 'MM'
}

myImport = BatchImporter.builder() \
    .inFiles(in_files) \
    .variables(variables) \
    .destination(destination) \
    .writeOptions(write_options) \
    .build()

myImport.process()

`

I could not find a working way to give vortex different write options for every file listed. Is there a way in batch mode to give vortex a list of files and a list of write options? Could not make this work.

`

from mil.army.usace.hec.vortex.io import BatchImporter
from mil.army.usace.hec.vortex.geo import WktFactory

from glob import glob
from os import path
from datetime import datetime, timedelta

in_files = glob(r'D:/PATH/TEST/*.tif')

#variable name = file name #file names look like this "RADOLAN_20170630_1435.tif"
variables = [path.basename(file) for file in in_files]
variables = [str.split(var, ".")[0] for var in variables]

#get start and stop time from file name
#time stamp as datetime
#time_stamp_time = datetime.strptime(variables, "TS_%Y%m%d_%H%M.tif")
time_stamp_time = [datetime.strptime(variable, "TS_%Y%m%d_%H%M.tif") for variable in variables]
#start time as datetime
start_time_time = [time - timedelta(minutes=5) for time in time_stamp_time]
#time stamp as formatted string
time_stamp = [tst.strftime("%d%b%Y:%H%M").upper() for tst in time_stamp_time]
#start time as formatted string
start_time = [stt.strftime("%d%b%Y:%H%M").upper() for stt in start_time_time]

#adjust time stamp to 2400 the day before if it is 0000
date = [str.split(st, ":")[0] for st in start_time]
time = [str.split(ts, ":")[1] for ts in time_stamp]
time = ["2400" if t=="0000" else t for t in time]
time_stamp_dss = [d + ":" + t for d,t in zip(date, time)]

destination = 'D:/PATH/RADOLAN.dss'

write_options = []

for i in range(0, len(time_stamp_dss)):
    write_option = {
        'partA': 'UTM33N',
        'partB': 'Sachsen_Kemnitz',
        'partC': 'PRECIPITATION',
        'partD': start_time[i],
        'partE': time_stamp_dss[i],
        'partF': 'RADOLAN',
        'dataType': 'PER-CUM',
        'units': 'MM'
    }
    write_options.append(write_option)

myImport = BatchImporter.builder() \
    .inFiles(in_files) \
    .variables(variables) \
    .destination(destination) \
    .writeOptions(write_options) \
    .build()

myImport.process()

`

I am thankful for any help!

tombrauer commented 2 years ago

Hello @steezbert - thanks for reaching out.

Vortex can read ASC and GeoTIFF rasters. For a few commonly used data sources, e.g. PRISM and QPF, we have added parsing logic that extracts start and end times from the file name and applies them to the DSS record that is generated. For non-standardized datasets, as yours appears to be, you will have to implement logic to explicitly set start time and end time for each gridded record, as you have done in your scripts.

I did some testing on your single-import script and everything worked to my expectations. From looking at your batch import script, I am not sure that import is happening inside of the for loop. If you would like to send a few of your *.tif files for me to test, I will take a look.

I like your suggestion for more documentation on the write options that are available. I will try to add documentation that covers the options in more detail.

steezbert commented 2 years ago

Hey @tombrauer - thanks for your quick reply.

Putting the import into the loop might help. I will do some more testing tomorrow, it's been a long day. Attached you find some single layer GeoTiffs and one raster stack GeoTiff for testings.

Thanks for your help!

Test_Data.zip

EDIT: Forgot to mention. I am pre processing the data with R and can name the files what ever I want.

danhamill commented 2 years ago

@steezbert I believe the batch processing jython tools are designed to populate part D and partE for you. If you specify those values from your jython script, it makes sense it will overwrite each record. I use write_options to specify static portions of the dss path names (e.g., partA, partB, partC, partF, units, dataType).

From testing I have found file paths in this structure will be correctly parsed by vortex

precip_<beginTime>_<endTime>.tif

where beginTime and endTime is formatted following ISO format.

I tried other prefixes, but have only had success with precip. You can set partC in write_options for the expected HMS variable name.

Since I saw your issue from pydsstools, I also want to note you need to initialize a version 6 output dss file before you process with vortex. Vortex will create a version 7 file if it does already exist.

steezbert commented 2 years ago

Hey @danhamill,

thank you for the tip with the file names! I will test that later. Found out how to use write options to set every pathname:

`

from mil.army.usace.hec.vortex.io import BatchImporter
from mil.army.usace.hec.vortex.geo import WktFactory
from mil.army.usace.hec.vortex import Options

from glob import glob
from os import path
from datetime import datetime, timedelta

in_files = glob(r'D:/PATH/*.tif')

#variable name = file name #file names look like this "TS_20170630_1435.tif"
variables = [path.basename(file) for file in in_files]
variables = [str.split(var, ".")[0] for var in variables]

destination = 'D:/PATH/RADOLAN.dss'

#time stamp as datetime
time_stamp_time = [datetime.strptime(variable, "TS_%Y%m%d_%H%M") for variable in variables]
#start time as datetime
start_time_time = [time - timedelta(minutes=5) for time in time_stamp_time]
#time stamp as formatted string
time_stamp = [tst.strftime("%d%b%Y:%H%M").upper() for tst in time_stamp_time]
#start time as formatted string
start_time = [stt.strftime("%d%b%Y:%H%M").upper() for stt in start_time_time]

#adjust time stamp to 2400 the day before if it is 0000
date = [str.split(st, ":")[0] for st in start_time]
time = [str.split(ts, ":")[1] for ts in time_stamp]
time = ["2400" if t=="0000" else t for t in time]
time_stamp_dss = [d + ":" + t for d,t in zip(date, time)]

for i in range(0, len(variables)):
    write_options = Options.create()
    write_options.add('partA', 'UTM33N')
    write_options.add('partB', 'Sachsen_Kemnitz')
    write_options.add('partC', 'PRECIPITATION')
    write_options.add('partD', start_time[i])
    write_options.add('partE', time_stamp_dss[i])
    write_options.add('partF', 'RADOLAN')
    write_options.add('dataType', 'PER-CUM')
    write_options.add('units', 'MM')

    in_file = [in_files[i]]
    variable = [variables[i]]

    myImport = BatchImporter.builder() \
        .inFiles(in_file) \
        .variables(variable) \
        .destination(destination) \
        .writeOptions(write_options) \
        .build()

    myImport.process()

`

Batching GeoTiffs works fine now, so thank you for all the help. Only thing even better would be raster stack GeoTiff functionality, maybe one day.

danhamill commented 2 years ago

+1 for the suggestion to allow for processing a list of raster's using a list of write options. I like the idea of being able to specify each grid path name with out having to find a way to have vortex determine the timestamps.

The code you shared imports on a grid-by-grid basis, which requires a JVM instance for grid. Vortex is designed to read all grids into memory, then process them in parallel.

tombrauer commented 2 years ago

@steezbert I added Batch Import Options documentation that enumerates all of the geoprocessing and write options available when using the batch importer. I also added Raster Filename Parsing documentation that describes what to expect from filename parsing, including date parsing.

I created a new release v0.10.25 that should simplify the logic for your *.tif import example. Namely, you don't have to pass the variable names argument and you don't have to provide partD and partE args provided you follow the date conventions in the documentation linked above.

The script I tested looks like this:

from mil.army.usace.hec.vortex.io import BatchImporter
from mil.army.usace.hec.vortex.geo import WktFactory

from glob import glob

in_files = glob(r'C:/Temp/Test_Data/*.tif')

destination = 'C:/Temp/import.dss'

geo_options = {
    'targetCellSize': '2000',
    'targetEpsg': '32633'
}

write_options = {
    'partA': 'UTM33N',
    'partB': 'Sachsen_Kemnitz',
    'partC': 'PRECIPITATION',
    'partF': 'RADOLAN',
    'dataType': 'PER-CUM',
    'units': 'MM'
}

myImport = BatchImporter.builder() \
    .inFiles(in_files) \
    .destination(destination) \
    .geoOptions(geo_options) \
    .writeOptions(write_options) \
    .build()

myImport.process()
tombrauer commented 2 years ago

@danhamill As of v0.10.25 you shouldn't have to specify "precip" in the file name to get the dates to import properly. I tested and verified. If you can test as well, I would appreciate it!

steezbert commented 2 years ago

@tombrauer thank you very much for the update!

I tested the new version with different date formats and prefixes and could not find any issues. Not needing a variable list is nice as well. All this makes data processing so much easier!

Another improvement would be the option to define the version of the DSS file created by vortex. As @danhamill noted vortex creates only version 7 DSS files. But sometimes you need version 6 DSS files like in HEC-HMS, at least that is what I experienced.

The last improvement I can suggest and mentioned earlier is raster stack GeoTiff processing, maybe one day ;)

The updated documentation is really nice as well. This should help others in understanding and using vortex a lot.

Thank you again for your help and great work with vortex!

tombrauer commented 2 years ago

@steezbert sure thing! A note on DSS - if the file does not exist previously, vortex will create it as DSS7. If you would like for the file to be DSS6, you can create the DSS file as version 6 prior to the vortex write, and the grids will be written to the DSS6 file. In theory HMS should run with 6 or 7 files. We plan to do more thorough testing this year before calling the DSS 6 to 7 migration final.