eastgenomics / Genetics_Ark

Django source code for Genetics Ark and associated apps

Speed up dx #78

Closed: corbin-chris closed this 4 months ago

corbin-chris commented 4 months ago

Changes: Parallelises the dxpy searches and removes redundant API calls. Closes #73, #62.
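For illustration, parallelising independent searches can be sketched roughly as below. This is a minimal sketch and not the code in this PR: `search_project` is a hypothetical stand-in for whatever dxpy call find_dx_data.py makes, and the project IDs and worker count are made up. Threads are assumed to be a reasonable fit because each search mostly waits on the network.

```python
from concurrent.futures import ThreadPoolExecutor


def search_project(project_id):
    """Hypothetical stand-in for one dxpy search; returns fake file records."""
    return [{"project": project_id, "name": f"{project_id}_sample.bam"}]


def search_all(project_ids, max_workers=8):
    """Run one search per project concurrently and flatten the results."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order, so results line up with project_ids
        per_project = list(pool.map(search_project, project_ids))
    return [record for records in per_project for record in records]


# Made-up project IDs, for illustration only
results = search_all(["project-A", "project-B", "project-C"])
```

Because `pool.map` keeps the input order, the flattened output is deterministic, which makes before/after comparisons of the generated JSON easier.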

Test set-up: Ran the latest-version genetics-ark-cron code (with jq manually installed) and inspected the outputs of find_dx_data.py. Compared these with the find_dx_data.py outputs from running genetics-ark-cron:speed_up_dx, keeping the rest of the containers the same.

Test results: The updated code runs more than 10x faster (65.9 seconds versus 1100.8 seconds on the comparison run, roughly 16.7x). Nothing is written to the cron error logs. The updated and current code find the same number of file entries (cat dx_002_bams.json | jq '.BAM[][].idx_name' | wc), and jq '.BAM[][].idx_name' shows no empty indices. dx_002_bams.json is the same size for the updated and current code. dx_missing_bam.json differs trivially in size (584623 for new, 584149 for old).
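The jq check above could also be scripted, for example as in the sketch below. It assumes only what the jq path `.BAM[][].idx_name` implies about the layout; the sample data is a hypothetical miniature, not real content from dx_002_bams.json, and the helper names are made up. Unlike jq, it records a missing index instead of raising an error when an entry is not an object.

```python
def children(node):
    """Mimic jq's `.[]`: return the values of a dict or the items of a list."""
    if isinstance(node, dict):
        return list(node.values())
    if isinstance(node, list):
        return list(node)
    return []


def idx_names(data):
    """Collect every idx_name reachable via the jq path .BAM[][].idx_name."""
    names = []
    for group in children(data.get("BAM", {})):
        for entry in children(group):
            names.append(entry.get("idx_name") if isinstance(entry, dict) else None)
    return names


def summarise(data):
    """Count the entries and how many of them have a missing or empty index."""
    names = idx_names(data)
    return {"entries": len(names), "empty": sum(1 for n in names if not n)}


# Hypothetical miniature of the file; the real layout may differ.
sample = {
    "BAM": {
        "project-A": [
            {"name": "s1.bam", "idx_name": "s1.bam.bai"},
            {"name": "s2.bam", "idx_name": "s2.bam.bai"},
        ],
        "project-B": [{"name": "s3.bam", "idx_name": None}],
    }
}
summary = summarise(sample)
```

Running `summarise` on the JSON loaded from the old and new output files and comparing the two dictionaries would reproduce the entry-count and empty-index checks in one step.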



pep8speaks commented 4 months ago

Hello @corbin-chris! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

Line 147:80: E501 line too long (82 > 79 characters)
Line 232:46: W605 invalid escape sequence '\.'
Line 232:56: W605 invalid escape sequence '\.'
Line 371:45: W605 invalid escape sequence '\Z'
Line 371:78: W605 invalid escape sequence '\Z'
Line 371:80: E501 line too long (81 > 79 characters)
Line 408:80: E501 line too long (84 > 79 characters)
Line 410:80: E501 line too long (87 > 79 characters)
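W605 warnings of this kind are usually resolved by writing regex patterns as raw strings. A minimal sketch follows; the pattern is illustrative only, since the actual expressions at lines 232 and 371 are not shown in this thread.

```python
import re

# A raw string passes the backslashes through to the regex engine untouched,
# so Python never tries to interpret "\." or "\Z" as string escapes.
# Illustrative pattern only, not the one used in the PR.
BAM_SUFFIX = re.compile(r"\.bam\Z")

matches = [bool(BAM_SUFFIX.search(name)) for name in ["sample.bam", "sample.bam.bai"]]
```

Here the pattern matches only names that end in `.bam`, so the index file name is rejected.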

Comment last updated at 2024-06-20 14:08:01 UTC