USGS-WiM / StreamStatsServices

StreamStats REST Services
https://streamstats.usgs.gov/streamstatsservices

Extend timeout period for services #65

Closed. amedenblik closed this issue 2 years ago

amedenblik commented 2 years ago

Users are reporting new issues where basin characteristics are sometimes not calculating for the New York StreamStats application. This typically occurs when many basin characteristics, or basin characteristics of a certain type that take a long time to calculate, are selected. It does not appear to be an underlying data issue, and the StreamStatsServices have not been updated recently.

We suspect the services are taking too long to calculate the basin characteristics and are timing out and returning error 500. If possible, can we try extending the timeout period to see if that fixes the issue?

Example service call that fails sometimes: https://streamstats.usgs.gov/streamstatsservices/parameters.json?rcode=NY&workspaceID=NY20220107154429698000&includeparameters=BSLOPCM,CENTROIDX,CENTROIDY,CONTOUR,CSL1085LO,CSL1085UP,CSL10_85,DRNAREA,EL1200,FOREST,JULAVPRE,JUNAVPRE,JUNMAXTMP,LAGFACTOR,LC11DEV,LC11IMP,LENGTH,MAR,MAYAVPRE,MXSNO,OUTLETX,OUTLETY,PRECIP,PRJUNAUG00,SLOPERATIO,SSURGOA,SSURGOB,STORAGE
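For anyone trying to reproduce this, here is a minimal sketch that rebuilds the call above from its parts, so the parameter list is easy to trim or reorder while testing. The function name `build_parameters_url` and the constant names are made up for illustration; only the endpoint, the query keys, and the parameter codes come from the URL above.

```python
from urllib.parse import urlencode

# Endpoint taken from the example call above.
BASE_URL = "https://streamstats.usgs.gov/streamstatsservices/parameters.json"

# Basin characteristic codes from the example call above.
NY_PARAMETERS = [
    "BSLOPCM", "CENTROIDX", "CENTROIDY", "CONTOUR", "CSL1085LO", "CSL1085UP",
    "CSL10_85", "DRNAREA", "EL1200", "FOREST", "JULAVPRE", "JUNAVPRE",
    "JUNMAXTMP", "LAGFACTOR", "LC11DEV", "LC11IMP", "LENGTH", "MAR",
    "MAYAVPRE", "MXSNO", "OUTLETX", "OUTLETY", "PRECIP", "PRJUNAUG00",
    "SLOPERATIO", "SSURGOA", "SSURGOB", "STORAGE",
]


def build_parameters_url(rcode, workspace_id, parameters, base_url=BASE_URL):
    """Return a parameters.json URL for the given region, workspace, and codes."""
    query = urlencode(
        {
            "rcode": rcode,
            "workspaceID": workspace_id,
            "includeparameters": ",".join(parameters),
        },
        safe=",",  # keep the commas between codes unescaped, as in the original call
    )
    return f"{base_url}?{query}"


url = build_parameters_url("NY", "NY20220107154429698000", NY_PARAMETERS)
print(url)
```

Passing a shorter list (for example only the slope-related codes) gives a smaller request to compare against the full one.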

aaronstephenson commented 2 years ago

I increased the timeout from 120 to 500 seconds in IIS on both Prod Web servers. I still get the 500 error immediately when I click that link, though.

When I do a simpler query like https://streamstats.usgs.gov/streamstatsservices/parameters.json?rcode=NY it responds immediately with JSON output. So maybe either the web server or the code doesn't like the long list of params?

aaronstephenson commented 2 years ago

No, I guess it's not the length of the params per se, since IIS allows a query string of 2048 chars by default and that query only has 297. So I'm guessing the code doesn't like something in that param list... maybe one of the param names is wrong or unhandled in the code, so it errors out? Wild guess.
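Both guesses above can be checked with a short script. This is only a sketch: `query_length` measures the query string of the original failing URL against the 2048 default mentioned above, and `failing_parameters` requests each code on its own to see whether a single name triggers the error. The `request_fails` callable is a hypothetical hook that the caller has to supply (for example a function that issues the request and returns True on a 500); nothing here is taken from the StreamStatsServices code.

```python
from urllib.parse import urlsplit

# Default query string limit cited in the comment above.
IIS_DEFAULT_MAX_QUERY_STRING = 2048

# The failing call from the original report, split across lines for readability.
FAILING_URL = (
    "https://streamstats.usgs.gov/streamstatsservices/parameters.json"
    "?rcode=NY"
    "&workspaceID=NY20220107154429698000"
    "&includeparameters="
    "BSLOPCM,CENTROIDX,CENTROIDY,CONTOUR,CSL1085LO,CSL1085UP,CSL10_85,"
    "DRNAREA,EL1200,FOREST,JULAVPRE,JUNAVPRE,JUNMAXTMP,LAGFACTOR,"
    "LC11DEV,LC11IMP,LENGTH,MAR,MAYAVPRE,MXSNO,OUTLETX,OUTLETY,"
    "PRECIP,PRJUNAUG00,SLOPERATIO,SSURGOA,SSURGOB,STORAGE"
)


def query_length(url):
    """Return the number of characters after the '?' in the URL."""
    return len(urlsplit(url).query)


def failing_parameters(parameters, request_fails):
    """Return the codes that fail when requested one at a time.

    request_fails is a hypothetical callable: it takes a list of codes,
    issues the request, and returns True if the service errored.
    """
    return [name for name in parameters if request_fails([name])]


length = query_length(FAILING_URL)
print(length, length <= IIS_DEFAULT_MAX_QUERY_STRING)
```

If `failing_parameters` comes back empty while the full list still fails, the problem is more likely load or server state than any one parameter name.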

amedenblik commented 2 years ago

Updated testing link: https://prodwebb.streamstats.usgs.gov/streamstatsservices/parameters.json?rcode=NY&workspaceID=NY20220107221806002000&includeparameters=BSLOPCM,CENTROIDX,CENTROIDY,CONTOUR,CSL1085LO,CSL1085UP,CSL10_85,DRNAREA,EL1200,FOREST,JULAVPRE,JUNAVPRE,JUNMAXTMP,LAGFACTOR,LC11DEV,LC11IMP,LENGTH,MAR,MAYAVPRE,MXSNO,OUTLETX,OUTLETY,PRECIP,PRJUNAUG00,SLOPERATIO,SSURGOA,SSURGOB,STORAGE

harper-wavra commented 2 years ago

Thanks for increasing the timeout. I think we were seeing the 500 error immediately because that first URL does not specify prodweba or prodwebb. If the request lands on the wrong server, the data needed to complete it is not present there, so it errors out immediately.

We are still seeing 500 errors after about 30 seconds when the request is configured correctly, though. There are a couple of reasons we believe it is not an issue with the code (but that doesn't mean it isn't). First, we haven't updated the StreamStatsServices code or the NY data at all, and this issue seems to have popped up only in the last week. Second, the same request will sometimes work if you try again. I have had this happen both in the client and with the URL request directly. It also always seems to work on dev and works more frequently on test, which is what led us to believe it was a timeout error and that there were maybe too many people making requests.

If you can think of any reason these requests would work sometimes and not others, we would love to hear it, as we are pretty stumped.
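To put numbers on "sometimes works, sometimes fails after about 30 seconds", a small probe can repeat the same request and record the status and elapsed time of each attempt. This is a rough sketch using only the Python standard library; the names `Attempt`, `http_status`, and `probe` are made up for illustration, and the 600 second client timeout is an arbitrary value chosen to sit above the 500 second IIS setting mentioned earlier.

```python
import time
import urllib.error
import urllib.request
from typing import Callable, List, NamedTuple, Optional


class Attempt(NamedTuple):
    status: Optional[int]  # HTTP status code, or None if no response arrived
    seconds: float         # wall-clock time the attempt took


def http_status(url: str, timeout: float = 600.0) -> Optional[int]:
    """Fetch the URL and return its HTTP status, or None on a network error."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        return err.code  # e.g. 500 from the service
    except OSError:
        return None      # connection failure or client-side timeout


def probe(
    url: str,
    attempts: int = 5,
    fetch: Callable[[str], Optional[int]] = http_status,
    pause: float = 0.0,
) -> List[Attempt]:
    """Request the same URL several times, timing each attempt."""
    results = []
    for _ in range(attempts):
        start = time.monotonic()
        status = fetch(url)
        results.append(Attempt(status, time.monotonic() - start))
        time.sleep(pause)
    return results
```

Calling `probe(url, attempts=10, pause=5)` against prod, test, and dev would show whether failures cluster near a fixed duration, which would point to a timeout, or return almost instantly, which would point to missing data on the server.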

aaronstephenson commented 2 years ago

Re: prodweba vs prodwebb, the load balancer should have a "sticky" setting enabled so that a user's requests always go to the same server within a specified time period (I think we set 24 hours).

The only change that might have happened on the server side in the last couple weeks is the encryption of the hard drives, but that shouldn't affect anything.
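To take the load balancer out of the picture while testing, the host in a request URL can be swapped for a specific server, as in the updated testing link earlier in the thread. A minimal sketch follows; the function name `pin_to_server` is made up, and the prodweba hostname in the usage note is an assumption by analogy with `prodwebb.streamstats.usgs.gov`.

```python
from urllib.parse import urlsplit, urlunsplit


def pin_to_server(url, host):
    """Return the same URL with its host replaced by a specific server."""
    parts = urlsplit(url)
    return urlunsplit(parts._replace(netloc=host))


pinned = pin_to_server(
    "https://streamstats.usgs.gov/streamstatsservices/parameters.json?rcode=NY",
    "prodwebb.streamstats.usgs.gov",
)
print(pinned)
```

Running the same workspace request against `prodwebb.streamstats.usgs.gov` and (assuming it exists under that name) `prodweba.streamstats.usgs.gov` would show whether the workspace data is only present on one of them.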

amedenblik commented 2 years ago

Closing this issue. The 500 errors users were seeing in NY seem to have been caused by tasks in the task scheduler that stopped working properly after we attempted to move them off KJ's user account. These tasks clean out temporary data files. Please keep KJ's user account active on the StreamStats machines until we get the tasks correctly moved to the ScriptRunner account.
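For reference, a cleanup task of the kind described above can be as small as the sketch below. This is not the actual scheduled task, whose script and paths are not shown in this thread; the function name `remove_stale_files`, the directory argument, and the age threshold are all assumptions for illustration.

```python
import os
import time
from pathlib import Path


def remove_stale_files(root, max_age_hours, now=None):
    """Delete files under root not modified within max_age_hours.

    Returns the list of paths that were removed. The directory and the
    threshold are hypothetical; the real task's settings are not known here.
    """
    now = time.time() if now is None else now
    cutoff = now - max_age_hours * 3600
    removed = []
    for path in Path(root).rglob("*"):
        # Only plain files are removed; directories are left in place.
        if path.is_file() and path.stat().st_mtime < cutoff:
            path.unlink()
            removed.append(path)
    return removed
```

If a task like this silently stops running, temporary files accumulate until requests start to fail, which would fit errors that appear without any code or data change.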