USGS-R / rloadest

USGS water science R functions for LOAD ESTimation of constituents in rivers and streams.

Running selBestModel or loadreg by "water year" extremely slow, R session must be killed to exit #9

Open AllisonOliver opened 8 years ago

AllisonOliver commented 8 years ago

This is similar in theme to the other active issue posted right now. I am running rloadest with selBestModel (and I have tried with loadreg) to calculate average daily load by "water year". I have no problems doing this by "month" or by "day" (each takes about 5-8 minutes). Two of my water-year runs completed successfully, each taking about 20 minutes. However, the last three I tried ran for about 3 to 8 hours, and I received the error "Error: Unable to establish connection with R session" before they eventually finished. One run never went to completion (it ran for 15 hours) and I finally killed the R session to exit.

These files are, in principle, all the same (just different watersheds; the file layout is the same for each), and the models being run are also very similar (all model 4 or 6). Any idea why this would be happening, or what I might do to calculate this a different way?

Thanks for your help!

dlorenz-usgs commented 8 years ago

The time required for predicting loads depends on the square of the number of data points that must be processed to cover the entire time frame. For a daily time step, computing a water-year load requires only 365 or 366 individual data points. For an instantaneous time step, it takes 100 times the number of data points and at least 10,000 times longer to compute.
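The scaling described above can be checked with a few lines of arithmetic. This is only a sketch of the stated rule (cost proportional to the square of the number of points); `relative_cost` is a hypothetical helper for illustration, not part of rloadest, and real run times will also depend on the model and data.

```r
# Relative cost, assuming run time grows with the square of the number
# of data points (the rule of thumb stated above, not a measured timing).
relative_cost <- function(n_points, n_baseline = 365) {
  (n_points / n_baseline)^2
}

daily_points <- 365                          # one water year at a daily time step
instantaneous_points <- 100 * daily_points   # 100 times as many points

relative_cost(daily_points)           # baseline: 1
relative_cost(instantaneous_points)   # 100^2 = 10000 times the baseline
```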

The time required can also depend heavily on extrapolations, where convergence of the variance estimate is extremely slow. This often happens when users substitute a very small value, say 0.01 or 0.001, for zero flows. Zero flows are handled correctly in recent versions of rloadest.

(Emailed reply to AllisonOliver's comment of Fri, Feb 26, 2016, 10:43 AM.)

AllisonOliver commented 8 years ago

Ah... ok, that clarifies things quite a bit. So I should just expect very slow times, and the run will eventually complete? Great to know, thank you.

dlorenz-usgs commented 8 years ago

There is an argument to predLoad called seopt. The default is "exact," which forces the exact computation of all of the variances and covariances and takes a long time in this case. The alternative is "approximate," which does a linear approximation to the total variance and is much faster.
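As a rough illustration of switching between the two settings, the sketch below fits a model and predicts water-year loads with each value of seopt, timing both calls. It assumes the app1.calib example data bundled with rloadest and uses model(1) and the station label purely as placeholders; substitute your own calibration data, estimation data, and model.

```r
library(rloadest)

# Example data assumed to ship with rloadest; used here for both
# calibration and prediction only to keep the sketch short.
data(app1.calib)

# Fit a load regression. model(1) is a placeholder choice.
fit <- loadReg(Phosphorus ~ model(1), data = app1.calib,
               flow = "FLOW", dates = "DATES", conc.units = "mg/L",
               station = "example station")

# Default: exact variances and covariances (can be very slow).
exact_time <- system.time(
  exact <- predLoad(fit, newdata = app1.calib, by = "water year",
                    seopt = "exact")
)

# Linear approximation to the total variance (much faster).
approx_time <- system.time(
  approx <- predLoad(fit, newdata = app1.calib, by = "water year",
                     seopt = "approximate")
)

# Compare elapsed times and the resulting load estimates.
print(rbind(exact = exact_time, approximate = approx_time))
print(exact)
print(approx)
```

Comparing the two result tables on a short record first gives a sense of how far the approximate standard errors drift from the exact ones before committing to a long run.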

(Emailed reply to AllisonOliver's comment of Fri, Feb 26, 2016, 12:00 PM.)

AllisonOliver commented 8 years ago

Great to know, and yes, it is much faster! I am looking for the exact computation, so I will just let it take its time, but this is very useful for future endeavours. Thanks again.