Ashar25 / RainyDay


"Processing file..." step takes a long time with big transposition domain #44

Open lassiterdc opened 9 months ago

lassiterdc commented 9 months ago

I am running RainyDay with 5-minute MRMS data over a much larger transposition domain (159,877 square km vs. the original 2,573 square km). I added some print statements to estimate the required runtime and to find the bottleneck. It appears it would take about 7.3 hours per day of input data to build the storm catalog, so even dividing the job up by year, the full run would take 112 days.

The bottleneck appears to be the following line:

rainmax, ycat, xcat = RainyDay.catalogNumba_irregular(temparray, trimmask, xlen, ylen, maskheight, maskwidth, rainsum, domainmask)
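For reference, the print-statement timing I mentioned looks roughly like this (a minimal sketch; `catalogNumba_irregular` and its arguments come from RainyDay, while the `timed` wrapper is mine):

```python
import time

def timed(label, fn, *args, **kwargs):
    """Call fn, print the elapsed wall-clock time, and return its result."""
    t0 = time.perf_counter()
    result = fn(*args, **kwargs)
    elapsed = time.perf_counter() - t0
    print(f"{label}: {elapsed:.3f} s")
    return result

# Wrapping the suspect call would look like:
# rainmax, ycat, xcat = timed("catalogNumba_irregular",
#                             RainyDay.catalogNumba_irregular,
#                             temparray, trimmask, xlen, ylen,
#                             maskheight, maskwidth, rainsum, domainmask)
```

Multiplying the per-call time by the number of timesteps per day of input gives the per-day estimate quoted above.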

I will see whether I can make this function more efficient, with the goal of getting the runtime down to 5 days, which I hope is possible. If improving efficiency is something of interest to you, or something you're already working on, I'd love to bounce ideas around here!

One thought I had was to perform event selection using MRMS data consolidated to an hourly timestep, and then use the latitude, longitude, and time indices to subset the full-resolution data when building the catalog. This could probably be done within the script itself without complicating it too much.

lassiterdc commented 8 months ago

FYI, I resolved this by doing event selection on the hourly data, then using the resulting storm catalog to subset the original full-resolution rainfall data. This brought runtimes back down to something reasonable for my purposes, around 2-3 days. I was then able to run RainyDay to generate scenarios using the full-resolution storm catalog.
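The two-pass workflow can be sketched as follows (a minimal, self-contained illustration with NumPy, not RainyDay's actual code; the selection rule here, ranking hours by domain-total rainfall, is a hypothetical stand-in for RainyDay's event selection):

```python
import numpy as np

def coarsen_to_hourly(rain_5min):
    """Aggregate a (time, y, x) array of 5-minute rainfall to hourly sums.
    Assumes the time dimension is a multiple of 12 (12 x 5 min = 1 h)."""
    t, ny, nx = rain_5min.shape
    assert t % 12 == 0, "time dimension must cover whole hours"
    return rain_5min.reshape(t // 12, 12, ny, nx).sum(axis=1)

def select_events_and_subset(rain_5min, n_events=2):
    """Pick the n wettest hours from the coarsened data, then pull the
    corresponding 5-minute slices from the original full-resolution array."""
    hourly = coarsen_to_hourly(rain_5min)
    totals = hourly.sum(axis=(1, 2))            # domain-total rainfall per hour
    top_hours = sorted(np.argsort(totals)[-n_events:])
    # Map each selected hourly index back to its 12 five-minute timesteps
    subsets = [rain_5min[h * 12:(h + 1) * 12] for h in top_hours]
    return top_hours, subsets
```

The expensive catalog search then only ever touches the coarse array, and the full-resolution data is read back just for the selected windows.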