Sage-Bionetworks / synapsePythonClient

Programmatic interface to Synapse services for Python
https://www.synapse.org
Apache License 2.0

[SYNPY-1420] Re-write uploads to mix async, multi-threading, and multi-processing #1078

Closed BryanFauble closed 3 months ago

BryanFauble commented 3 months ago

Problem:

  1. The current upload process was slow because steps that could run concurrently were run sequentially.
  2. Work such as MD5 calculation holds the Python GIL and blocks everything else from running.
  3. The v4.File class's .store_async() and .store() methods needed to be rewritten to use a new storage algorithm.
  4. The upload progress bar did not handle concurrent file downloads.
  5. In some cases I ran out of memory when running on low-resource EC2 instances (1 GB).

Solution:

  1. File part uploads are pushed to other threads.
  2. MD5 checksum calculations are pushed to other processes (to avoid blocking the main AsyncIO thread).
  3. All HTTP calls use AsyncIO async/await syntax so nothing blocks while we wait on network I/O.
  4. Switched the progress bar to a new library, TQDM.
  5. Updated the File model to use all async methods and the new upload methods. I also needed to extract several methods from the client to accomplish this.
  6. Garbage collection is run manually after a psutil check of whether a chunk of the file can be read into memory.
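
A minimal sketch of how items 1 to 3 can fit together, using only the standard library. This is not the code from this PR: `upload_part`, `upload_file`, `make_demo_file`, and `PART_SIZE` are hypothetical names, and the "upload" only reads bytes from disk rather than calling Synapse. It assumes the checksum runs in one executor, the blocking part uploads in another, and the event loop awaits both.

```python
import asyncio
import hashlib
import os
import tempfile
from concurrent.futures import Executor, ProcessPoolExecutor, ThreadPoolExecutor

PART_SIZE = 8  # bytes; tiny on purpose so the demo file splits into several parts


def md5_of_file(path: str, chunk_size: int = 1024 * 1024) -> str:
    """CPU-bound step: hash in chunks so the file is never fully loaded."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def upload_part(path: str, part_number: int, part_size: int) -> int:
    """Hypothetical stand-in for a blocking part upload; returns bytes 'sent'."""
    with open(path, "rb") as handle:
        handle.seek(part_number * part_size)
        return len(handle.read(part_size))


async def upload_file(
    path: str, md5_executor: Executor, part_executor: Executor
) -> dict:
    loop = asyncio.get_running_loop()
    size = os.path.getsize(path)
    part_count = max(1, -(-size // PART_SIZE))  # ceiling division

    # Start the checksum in its own executor so the event loop stays free.
    md5_future = loop.run_in_executor(md5_executor, md5_of_file, path)
    # Fan the parts out to the part executor and wait for all of them.
    part_futures = [
        loop.run_in_executor(part_executor, upload_part, path, n, PART_SIZE)
        for n in range(part_count)
    ]
    sent = await asyncio.gather(*part_futures)
    return {"md5": await md5_future, "bytes_sent": sum(sent), "parts": part_count}


def make_demo_file(content: bytes) -> str:
    """Write content to a temporary file and return its path."""
    fd, path = tempfile.mkstemp()
    with os.fdopen(fd, "wb") as handle:
        handle.write(content)
    return path


if __name__ == "__main__":
    demo_path = make_demo_file(b"synapse upload demo bytes")
    try:
        # Checksum in a separate process, part uploads in worker threads.
        with ProcessPoolExecutor(max_workers=1) as procs:
            with ThreadPoolExecutor(max_workers=4) as threads:
                print(asyncio.run(upload_file(demo_path, procs, threads)))
    finally:
        os.remove(demo_path)
```

Passing the executors in as parameters keeps the coroutine independent of the pool type, so the same function can be exercised with thread pools in tests and a process pool for the checksum in real runs.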

Benchmarking: (image)

Mermaid diagram showing the various parts of the file upload process (diagram not reproduced here).

File upload testing:

Integration tests:

Manual Testing: The scripts I used are attached to https://sagebionetworks.jira.com/browse/SYNPY-1441 and show how I set up the storage location for all of these tests, along with the script I ran to create and then upload a file. The files were all uploaded to the Synapse Project at https://www.synapse.org/#!Synapse:syn54126908/files/ and the backing storage location (external to Synapse) will be deleted, so you are unlikely to be able to re-create this unless you substitute your own external storage locations.


pep8speaks commented 3 months ago

Hello @BryanFauble! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

Line 272:89: E501 line too long (98 > 88 characters) Line 273:89: E501 line too long (100 > 88 characters)

Line 2:89: E501 line too long (128 > 88 characters) Line 25:89: E501 line too long (99 > 88 characters) Line 27:89: E501 line too long (107 > 88 characters) Line 41:89: E501 line too long (118 > 88 characters) Line 50:89: E501 line too long (95 > 88 characters) Line 217:89: E501 line too long (108 > 88 characters) Line 555:89: E501 line too long (101 > 88 characters) Line 710:89: E501 line too long (90 > 88 characters) Line 711:89: E501 line too long (96 > 88 characters) Line 713:89: E501 line too long (102 > 88 characters) Line 840:89: E501 line too long (140 > 88 characters) Line 841:89: E501 line too long (98 > 88 characters) Line 845:89: E501 line too long (101 > 88 characters) Line 848:89: E501 line too long (104 > 88 characters) Line 851:89: E501 line too long (106 > 88 characters)

Line 38:89: E501 line too long (116 > 88 characters) Line 39:89: E501 line too long (116 > 88 characters) Line 86:89: E501 line too long (118 > 88 characters) Line 312:89: E501 line too long (89 > 88 characters) Line 314:89: E501 line too long (102 > 88 characters)

Line 88:89: E501 line too long (110 > 88 characters) Line 111:89: E501 line too long (110 > 88 characters)

Line 492:89: E501 line too long (91 > 88 characters) Line 684:89: E501 line too long (121 > 88 characters) Line 686:89: E501 line too long (103 > 88 characters) Line 708:89: E501 line too long (91 > 88 characters) Line 1142:89: E501 line too long (95 > 88 characters)

Line 69:89: E501 line too long (101 > 88 characters)

Line 751:89: E501 line too long (120 > 88 characters) Line 830:89: E501 line too long (94 > 88 characters)

Line 490:89: E501 line too long (120 > 88 characters) Line 564:89: E501 line too long (94 > 88 characters)

Comment last updated at 2024-03-15 21:38:31 UTC

sonarcloud[bot] commented 3 months ago

Quality Gate passed

Issues
5 New issues
71 Accepted issues

Measures
0 Security Hotspots
90.7% Coverage on New Code
18.4% Duplication on New Code

See analysis details on SonarCloud