USGS-WiM / STNServices2

Web services for the Short-Term Network (STN) database

STN S3 hosted files appear 'gone' intermittently #154

Closed lprivette closed 3 years ago

lprivette commented 3 years ago

This is the same thing we encountered in November: https://wim-usgs.slack.com/archives/CTX3JNE90/p1604703009122900

We've had multiple users reach out about this over the past two weeks, so we need to get to the bottom of it.

The files are missing intermittently, sometimes gone for a few hours before they come back. If I log into S3 to view them, they aren't even listed in their folders.

I'm going to run the services locally and try to verify some things.

aaronstephenson commented 3 years ago

I'm not aware of anything in S3 that would do this seemingly "automatic" disappearance-reappearance (policies, etc), it's very puzzling.

There are no lifecycle rules for this bucket, so there are no "automatic" deletions or modifications or transfers.

Versioning is enabled, but I can't think of how that would do anything like this (you would have to manually or programmatically tell S3 to delete or restore objects, which I would guess is not something that is in the STN code, and it's certainly not something you or the users are doing manually).

Replication is enabled, but that's set up to be one-way: from stn2storage to stn2storage-backup, which is in the US-West-2 region.

You can reach out to Z for help, but first take a close look at the STN code; maybe use breakpoints at all the S3 transactions in the code (POST/PUT/GET/DELETE) and then monitor the S3 folder where those files are supposed to be.
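One way to do the "monitor the S3 folder" part without sitting in the console is a small polling script. This is only a rough sketch in Python, not anything from the STN code: it assumes boto3 is installed and credentials are configured, the prefix in the usage comment is a placeholder, and the function names are made up for illustration. It also logs any exception from the listing call itself, since a failed request can look the same as "the files are gone".

```python
import time
from typing import Callable, List, Set, Tuple


def list_keys(bucket: str, prefix: str) -> Set[str]:
    """Return every object key under `prefix` (needs boto3 and AWS credentials)."""
    import boto3  # imported lazily so the helpers below work without it

    paginator = boto3.client("s3").get_paginator("list_objects_v2")
    keys: Set[str] = set()
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        for obj in page.get("Contents", []):
            keys.add(obj["Key"])
    return keys


def diff_listings(previous: Set[str], current: Set[str]) -> Tuple[List[str], List[str]]:
    """Return (keys that disappeared, keys that appeared) between two listings."""
    return sorted(previous - current), sorted(current - previous)


def watch(fetch: Callable[[], Set[str]], interval_s: float = 60.0, rounds: int = 10) -> None:
    """Poll `fetch` and print a timestamped line whenever the listing changes or fails."""
    previous = fetch()
    for _ in range(rounds):
        time.sleep(interval_s)
        stamp = time.strftime("%Y-%m-%d %H:%M:%S")
        try:
            current = fetch()
        except Exception as exc:  # a failed call is itself worth recording
            print(f"{stamp} listing failed: {exc!r}")
            continue
        missing, added = diff_listings(previous, current)
        if missing or added:
            print(f"{stamp} missing={missing} added={added}")
        previous = current


# Hypothetical usage (the prefix is a placeholder, not a real STN path):
# watch(lambda: list_keys("stn2storage", "SOME/PREFIX/"), interval_s=300, rounds=48)
```

Running this alongside the breakpoints should show whether keys really leave the bucket or whether the listing call is what fails.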

lprivette commented 3 years ago

Thanks Aaron. It's so frustrating without steps to reproduce this consistently.

This may be a dumb question, but when I'm testing the STN code, do you think it's important for me to test locally from TOAD, or is testing on my local machine enough?

ChadFanguy-usgs commented 3 years ago

500 error logs from IIS on 11/6:
2020-11-06 16:06:50 10.0.5.221 GET /STNServices/Files/81745/Item 1604678822371 443 - IP Mozilla/5.0+(Windows+NT+10.0;+Win64;+x64)+AppleWebKit/537.36+(KHTML,+like+Gecko)+Chrome/86.0.4240.111+Safari/537.36 https://stn.wim.usgs.gov/STNWeb/ 500 0 0 202

And today (3/15):
2021-03-15 18:56:23 10.0.5.221 GET /STNServices/Files/fdsf.json - 443 - IP Mozilla/5.0+(Windows+NT+10.0;+Win64;+x64)+AppleWebKit/537.36+(KHTML,+like+Gecko)+Chrome/88.0.4324.182+Safari/537.36+Edg/88.0.705.81 https://stn.wim.usgs.gov/STNServices/Documentation/File/AFile 500 0 0 78

(Client IPs replaced with "IP")
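For pulling more of these out of the IIS logs, a short Python filter along these lines might help. It is a sketch under one big assumption: that the log uses the 15-field W3C layout the two lines above appear to follow (date, time, server IP, method, URI stem, URI query, port, username, client IP, user agent, referer, status, substatus, win32 status, time taken). A real log file states its layout in a `#Fields:` header, which should be trusted over the hard-coded list here. The function names are made up for illustration.

```python
from typing import Dict, Iterable, Iterator, Optional

# Assumed field order, inferred from the two log lines quoted in this thread.
FIELDS = [
    "date", "time", "s-ip", "cs-method", "cs-uri-stem", "cs-uri-query",
    "s-port", "cs-username", "c-ip", "cs(User-Agent)", "cs(Referer)",
    "sc-status", "sc-substatus", "sc-win32-status", "time-taken",
]


def parse_line(line: str) -> Optional[Dict[str, str]]:
    """Split one log line into named fields; return None for headers or odd lines."""
    if line.startswith("#"):
        return None  # directive lines such as "#Fields:" or "#Date:"
    parts = line.split()
    if len(parts) != len(FIELDS):
        return None  # layout does not match the assumed field list
    return dict(zip(FIELDS, parts))


def server_errors(lines: Iterable[str]) -> Iterator[Dict[str, str]]:
    """Yield parsed entries whose HTTP status is in the 5xx range."""
    for line in lines:
        entry = parse_line(line)
        if entry is not None and entry["sc-status"].startswith("5"):
            yield entry


# Hypothetical usage (the file name is a placeholder):
# with open("u_ex210315.log", encoding="utf-8", errors="replace") as fh:
#     for e in server_errors(fh):
#         print(e["date"], e["time"], e["cs-uri-stem"], e["sc-status"], e["time-taken"])
```

Printing the timestamps this way makes it easier to line the 500s up against CloudWatch or user reports.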

ChadFanguy-usgs commented 3 years ago

"z_unauthorized_api_calls_metric" shows up in CloudWatch on 3/15 at a similar time to one set of the 500s.

Other services are reporting 200 OK statuses.

lprivette commented 3 years ago

Any news from Z on this?

aaronstephenson commented 3 years ago

We haven't reached out to Z yet. @HansVraga @fanguyc-usgs let's talk on Monday when Chad is back.

aaronstephenson commented 3 years ago

I emailed Zivaro yesterday and CC'd Chad, so he can continue the email thread with them.

ChadFanguy-usgs commented 3 years ago

This was reported as an error: "Failed to include item: EVENTS/EVENT_305/SITE_23291 exception: A WebException with status ReceiveFailure was thrown." I'm going to look into an AWS SDK update and the IIS configuration.

ChadFanguy-usgs commented 3 years ago

The issue was the SSL/TLS version used between AWS and the application on IIS. It was using out-of-date algorithms because older .NET Framework applications do not use newer versions of TLS unless that is specified in Web.config or in code. The specific error was: "System.ComponentModel.Win32Exception: The client and server cannot communicate, because they do not possess a common algorithm"

One fix is adding targetFramework to httpRuntime in Web.config, like this:
<httpRuntime targetFramework="4.7.2" />
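To see from the outside which TLS versions an endpoint will still negotiate, a quick probe like the following can help. It is a rough Python sketch, not part of the STN services: the function names are made up, and the host in the usage comment is only an example. One caveat: the local OpenSSL build may itself refuse TLS 1.0/1.1, so a `None` for an old version can come from the client side rather than the server.

```python
import socket
import ssl
from typing import Optional


def pinned_context(version: ssl.TLSVersion) -> ssl.SSLContext:
    """Build a client context that will only negotiate exactly `version`."""
    ctx = ssl.create_default_context()
    # Set the minimum first so the allowed range is never empty in between.
    ctx.minimum_version = version
    ctx.maximum_version = version
    return ctx


def probe(host: str, version: ssl.TLSVersion, port: int = 443,
          timeout: float = 5.0) -> Optional[str]:
    """Return the negotiated protocol name, or None if the handshake fails."""
    try:
        with socket.create_connection((host, port), timeout=timeout) as raw:
            ctx = pinned_context(version)
            with ctx.wrap_socket(raw, server_hostname=host) as tls:
                return tls.version()
    except OSError:  # ssl.SSLError is a subclass of OSError
        return None


# Hypothetical usage (example host; substitute the endpoint the services call):
# for v in (ssl.TLSVersion.TLSv1_2, ssl.TLSVersion.TLSv1_3):
#     print(v.name, probe("s3.amazonaws.com", v))
```

If only the newer versions succeed, that is consistent with the "do not possess a common algorithm" error from a client that offers only older protocols.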

I'll monitor the application for a bit before closing the issue to make sure this works.

lprivette commented 3 years ago

I checked in with the STN User Group this morning, asking if they've seen this issue since Friday. No one responded, so I'm considering this closed!