GlobalPathogenAnalysisService / gpas-cli

The CLI client for GPAS SC2

gpas-cli v0.5.1 - it is no longer possible to perform bulk upload of identical samples (batch upload testing) #82

Closed KuzminaAnna closed 1 year ago

KuzminaAnna commented 1 year ago

While trying to upload 200 regular ONT samples / 84 BAM ONT samples, the upload failed in EC v1.1.3 with the message "issues found on the PII stripout & quality check process". Performing the same operation in gpas-cli throws the following exception:

{
  "exception": "ReadTimeout('The read operation timed out')",
  "traceback": [
    " File \"gpas/misc.py\", line 115, in jsonify_exceptions\n",
    " File \"gpas/cli-upload.py\", line 40, in upload\n",
    " File \"gpas/lib.py\", line 868, in upload\n",
    " File \"gpas/lib.py\", line 779, in _submit\n",
    " File \"gpas/lib.py\", line 751, in _finalise_submission\n",
    " File \"httpx/_api.py\", line 304, in post\n",
    " File \"httpx/_api.py\", line 100, in request\n",
    " File \"httpx/_client.py\", line 815, in request\n",
    " File \"httpx/_client.py\", line 902, in send\n",
    " File \"httpx/_client.py\", line 930, in _send_handling_auth\n",
    " File \"httpx/_client.py\", line 967, in _send_handling_redirects\n",
    " File \"httpx/_client.py\", line 1003, in _send_single_request\n",
    " File \"httpx/_transports/default.py\", line 217, in handle_request\n",
    " File \"contextlib.py\", line 153, in __exit__\n",
    " File \"httpx/_transports/default.py\", line 77, in map_httpcore_exceptions\n"
  ]
}

[Screenshots: Screen Shot 2022-10-05 at 6 13 57 PM-1, Screen Shot 2022-10-05 at 6 14 41 PM]

bede commented 1 year ago

Thanks Anna. Looks like this may be related to the transition from python-requests to httpx. However, I just tried and cannot replicate the issue with OUH069.

KuzminaAnna commented 1 year ago

The comment from John Cantalupo on the root cause of the issue: [Screenshot: Screen Shot 2022-10-13 at 11 33 56 AM]

Hi guys. First, one change introduced last sprint is that sample names must be unique, and that is what is causing the problem seen here. The fact that the samples all have the same hash is not a problem on the apex side. When the uploader calls the /ords/grep/electron/createSampleGuids API, it generates a unique sample guid for each sample, regardless of whether the hashes are duplicated. I see that in our EXPECTED_SAMPLES table for this example (see screenshot), so nine distinct sample guids should have been passed back to the uploader.

The problem seems to be that when the batch JSON was pushed up to GPAS (/ords/grep/electron/batches), all nine samples in the JSON had the same guid: db7d8904-9636-8504-3aa2-ea76cd5dd708. That is where the new unique key gets violated and the error is thrown.

So I suspect the problem is in the uploader: it must be getting the response back from /ords/grep/electron/createSampleGuids and assigning the sample guids by hash, but because of the duplicate hashes, each sample gets the same guid (as seen in the mapping CSV).
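For anyone following along, here is a minimal sketch of the failure mode John describes. It is not the actual gpas-cli code: the function names, the response shape, and the assumption that the server returns guids in request order are all hypothetical and only illustrate why keying the mapping by hash collapses identical samples onto one guid.

```python
from uuid import uuid4


def create_sample_guids(hashes):
    """Hypothetical stand-in for the createSampleGuids call.

    Returns one fresh guid per submitted sample, even for duplicate hashes.
    The response shape here is an assumption, not the real API schema.
    """
    return [{"hash": h, "guid": str(uuid4())} for h in hashes]


def assign_by_hash(samples, response):
    """Suspected buggy pattern: a dict keyed by hash keeps only the last
    guid seen for each hash, so identical samples all share one guid."""
    guid_for_hash = {entry["hash"]: entry["guid"] for entry in response}
    return {s["name"]: guid_for_hash[s["hash"]] for s in samples}


def assign_by_position(samples, response):
    """Alternative pattern: pair each sample with its own response entry.

    Assumes the response preserves request order (an assumption)."""
    if len(samples) != len(response):
        raise ValueError("expected exactly one guid per sample")
    return {s["name"]: entry["guid"] for s, entry in zip(samples, response)}


# Nine samples with unique names but identical content, hence identical hashes.
samples = [{"name": f"sample{i}", "hash": "abc123"} for i in range(9)]
response = create_sample_guids([s["hash"] for s in samples])

by_hash = assign_by_hash(samples, response)
by_position = assign_by_position(samples, response)

print(len(set(by_hash.values())))      # 1 distinct guid: unique key violated
print(len(set(by_position.values())))  # 9 distinct guids
```

Pairing by position only works if the response order can be relied on; if it cannot, keying on the (now unique) sample name would be another option.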

bede commented 1 year ago

I was finally able to reproduce this bug 🙌 Thanks @KuzminaAnna (and John)

bede commented 1 year ago

Fixed in https://github.com/GlobalPathogenAnalysisService/gpas-cli/commit/748d44e3ecfa7445e72ba9520c1113247ae63807