Would it be possible to get a timeframe for one of these error messages? For example, the first one: when was this message received?
requests.exceptions.HTTPError: 404 Client Error: Not Found for url: https://upload.integration.data.humancellatlas.org/v1/area/5b822e19-1858-4e50-a0a6-7d38bb7ad03a/links_76758683-91bd-44ca-93d1-36cd2c789a54.json
And to confirm - this was generated from a PUT /v1/area/<uuid>/<filename>?
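For reference, a minimal sketch of how such a request could be reproduced with requests; the client code and payload shape are assumptions for illustration, not taken from Ingest:

```python
import requests

# Hypothetical reproduction of the failing call. The payload shape and
# auth handling are assumptions; only the URL comes from the report below.
area = "5b822e19-1858-4e50-a0a6-7d38bb7ad03a"
filename = "links_76758683-91bd-44ca-93d1-36cd2c789a54.json"
url = f"https://upload.integration.data.humancellatlas.org/v1/area/{area}/{filename}"

resp = requests.put(url, json={"links": []})  # illustrative body
resp.raise_for_status()  # raises requests.exceptions.HTTPError on a 404
```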
Ingest reported:
requests.exceptions.HTTPError: 404 Client Error: Not Found for url: https://upload.integration.data.humancellatlas.org/v1/area/5b822e19-1858-4e50-a0a6-7d38bb7ad03a/links_76758683-91bd-44ca-93d1-36cd2c789a54.json
scripts/dcpdig -d integration @upload area=5b822e19-1858-4e50-a0a6-7d38bb7ad03a --show all
5b822e19-1858-4e50-a0a6-7d38bb7ad03a exists (current status UNLOCKED). 2019-01-28 13:27:04.475221+00:00
acd72804-294d-4985-a6a2-dbd53282df08
    2019-01-28 13:41:31.085603+00:00
    2019-01-28 13:42:20.734730+00:00
analysis_
    2019-01-28 13:59:06.899805+00:00
    2019-01-28 13:59:24.794433+00:00
The file was uploaded via put_file, which does not generate notifications. The file links_76758683-91bd-44ca-93d1-36cd2c789a54.json that generated the error does not have a file record.
Kibana says: searching from 2019-01-28 13:00:00 to 2019-01-28 15:59:59 for 5b822e19-1858-4e50-a0a6-7d38bb7ad03a/links_76758683-91bd-44ca-93d1-36cd2c789a54.json:
4 hits total describing 2 requests:
@log_group: API-Gateway-Execution-Logs_iz05wpatc9/integration
@message: (eb91e82c-2304-11e9-af96-81bdf247f3fe) HTTP Method: GET, Resource Path: /v1/area/5b822e19-1858-4e50-a0a6-7d38bb7ad03a/links_76758683-91bd-44ca-93d1-36cd2c789a54.json
@timestamp: January 28th 2019, 13:59:25.060

@log_group: API-Gateway-Execution-Logs_iz05wpatc9/integration
@message: (eb9a4ccf-2304-11e9-ba66-4d36ef5d7711) HTTP Method: PUT, Resource Path: /v1/area/5b822e19-1858-4e50-a0a6-7d38bb7ad03a/links_76758683-91bd-44ca-93d1-36cd2c789a54.json
@timestamp: January 28th 2019, 13:59:25.113
Interestingly, the GET happened before the PUT.
Searching for just links_76758683-91bd-44ca-93d1-36cd2c789a54.json gets 10 hits, including stuff from the Lambda log:
@message:[INFO] 2019-01-28T13:59:25.128Z 559d00ca-b8df-48af-87b9-a65af3fc4633 Running <function put_file at 0x7f0bb7d81d90> with args=() kwargs={'upload_area_uuid': '5b822e19-1858-4e50-a0a6-7d38bb7ad03a', 'filename': 'links_76758683-91bd-44ca-93d1-36cd2c789a54.json', 'body':
Searching for that Lambda execution ID, we see:
@message:[ERROR] 2019-01-28T13:59:25.177Z 559d00ca-b8df-48af-87b9-a65af3fc4633 Returning rfc7807 error response: status=404, title=No such file, detail=No such file in that upload area
@log_group: /aws/lambda/upload-api-integration
@timestamp: January 28th 2019, 13:59:25.177
This error can be generated from two places:
UploadedFile._s3_load
DssChecksums.Tagger._read_tags
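For context, a rough sketch of the kind of read both call sites perform; the body below is an assumption based on the log output above, not the actual upload-service code:

```python
import boto3
from botocore.exceptions import ClientError

s3 = boto3.resource("s3")

def s3_load(bucket: str, key: str) -> bytes:
    # Read the object straight back from S3. If the key is not yet
    # visible to the handler serving this GET, S3 answers NoSuchKey,
    # which the API surfaces as the rfc7807 404 "No such file" above.
    try:
        return s3.Object(bucket, key).get()["Body"].read()
    except ClientError as e:
        if e.response["Error"]["Code"] == "NoSuchKey":
            raise FileNotFoundError("No such file in that upload area") from e
        raise
```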
I think… eventual consistency. put_file
calls UploadedFile.create(),
which does an s3object.put()
to create the actual S3 object, then immediately calls UploadedFile.init(),
which attempts to read back from that object. If the write and read calls, which happen within milliseconds of each other, hit different S3 handlers, I could see a case where the object wasn’t yet consistent across all the S3 handlers.
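In code terms, the suspected sequence looks roughly like this; the real UploadedFile.create()/init() bodies are assumptions based on the description above:

```python
import boto3

s3 = boto3.resource("s3")

def put_file(bucket: str, area_uuid: str, filename: str, body: bytes) -> bytes:
    obj = s3.Object(bucket, f"{area_uuid}/{filename}")
    # UploadedFile.create(): write the object.
    obj.put(Body=body)
    # UploadedFile.init(): read it straight back. If this GET lands on
    # an S3 handler that has not yet seen the PUT, it raises NoSuchKey
    # even though the write succeeded milliseconds earlier.
    return obj.get()["Body"].read()
```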
Recommendation: we add an @retry around UploadedFile._s3_load
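A minimal sketch of that fix using the tenacity library; the decorator, its parameters, and the retry-on-ClientError condition are illustrative choices, not the project's existing conventions:

```python
from botocore.exceptions import ClientError
from tenacity import retry, retry_if_exception_type, stop_after_attempt, wait_exponential

@retry(
    retry=retry_if_exception_type(ClientError),    # covers NoSuchKey read-backs
    wait=wait_exponential(multiplier=0.1, max=2),  # back off up to ~2 s
    stop=stop_after_attempt(5),
    reraise=True,                                  # surface the final error as-is
)
def _s3_load(s3_object):
    # Retrying the read-back gives S3 time to become read-after-write
    # consistent before we give up and return a 404 to the caller.
    return s3_object.get()["Body"].read()
```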
I second this recommendation. I think this is the cleanest solution to account for the eventual consistency.