eastgenomics / Genetics_Ark

Django source code for Genetics Ark and associated apps

Fix script-stopping error that occurs when a project has BAMs but not… #89

Closed corbin-chris closed 2 months ago

corbin-chris commented 2 months ago

… BAIs, or vice versa.

Background

find_dx_data.py makes a JSON which tells the Genetics Ark website which samples have available data in DNAnexus. It runs on a cron job.

I was getting this error when running find_dx_data.py:

Traceback (most recent call last):
  File "/home/find_dx_data.py", line 475, in <module>
    find_dx_bams(get_002_projects())
  File "/home/find_dx_data.py", line 256, in find_dx_bams
    if project_bams and project_idxs:
UnboundLocalError: local variable 'project_bams' referenced before assignment

The error occurs because it's possible to find BAMs in a project but not BAIs (or vice versa). I presume this happens when a run is in progress, or occasionally for very old projects. I suspect I typo'd the indent at some point during development.
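To illustrate how a mis-indented assignment can produce this kind of error, here is a minimal sketch. It is not the code from find_dx_data.py: the function name `count_pairs_buggy`, the file names, and the exact shape of the bug are assumptions made for illustration only.

```python
def count_pairs_buggy(files):
    """Hypothetical stand-in for the failing pattern, not the real find_dx_bams."""
    project_idxs = [f for f in files if f.endswith(".bai")]
    if project_idxs:
        # Mis-indented: project_bams is only bound when at least one BAI exists.
        project_bams = [f for f in files if f.endswith(".bam")]
    # If the branch above was skipped, project_bams was never assigned,
    # so referencing it here raises UnboundLocalError.
    if project_bams and project_idxs:
        return min(len(project_bams), len(project_idxs))
    return 0


try:
    count_pairs_buggy(["sample1.bam"])  # a BAM but no BAI (made-up file name)
    error_name = None
except UnboundLocalError as exc:
    error_name = type(exc).__name__

print(error_name)
```

When both file types are present the function runs normally, which is presumably why a bug like this can go unnoticed until an unusual project turns up.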

Remedial steps

Attached to a running genetics-ark-cron container, which was made from main 2.1.0 (a533e06).

Made the same code changes to find_dx_data.py as were made in this PR.

Exited and restarted: sudo podman restart genetics-ark-cron

The script runs on a quarter-hourly cron job, so I waited until 5pm, then looked at the output logs.

Results

We see that the script completes running instead of erroring out early:

tail -n 4 ga-cron.log

Searching for CNVs End JSON saved to: /home/jsons Execution time: 77.70393633842468 seconds 20240828-16:01:20 sample file updated

We see that the case which produced errors before now just prints an info message and moves on, without adding any info for that project to the JSON (GA v2.0.3 discarded them silently):

tail -n 5000 ga-cron.log | grep "Either BAM"

Either BAMs or BAIs not available for project: project-a
Either BAMs or BAIs not available for project: project-b
Either BAMs or BAIs not available for project: project-c

I looked at these projects in DNAnexus:

project-a isn't a normal analysis run: it only contains 3 files, is from 2021, and contains 1 BAM and no BAI.
project-b contains a single BAM from 2021, but no BAI.
project-c is a reanalysis of an old sample, from 2021, and contains a BAI only.
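The behaviour described in these results (log a message and skip the project when either file type is missing) could be sketched roughly as follows. This is an assumption-laden illustration, not the actual change in this PR: the function `pair_project_files`, the dict-of-lists input shape, the file names, and `project-d` are all hypothetical. Only the message text is taken from the log output above.

```python
def pair_project_files(project_files):
    """Keep only projects that have both BAMs and BAIs.

    project_files: dict mapping project name -> list of file names
    (a hypothetical shape chosen for this sketch).
    """
    available = {}
    messages = []
    for project, files in project_files.items():
        # Bind both lists unconditionally so neither name can be unbound later.
        project_bams = [f for f in files if f.endswith(".bam")]
        project_idxs = [f for f in files if f.endswith(".bai")]
        if project_bams and project_idxs:
            available[project] = {"bams": project_bams, "idxs": project_idxs}
        else:
            # Report the incomplete project and move on without adding it.
            msg = f"Either BAMs or BAIs not available for project: {project}"
            print(msg)
            messages.append(msg)
    return available, messages


available, messages = pair_project_files({
    "project-a": ["x.bam"],                # BAM only
    "project-c": ["y.bam.bai"],            # BAI only
    "project-d": ["z.bam", "z.bam.bai"],   # hypothetical complete project
})
```

Assigning both lists before the check is what removes the possibility of an unbound name, regardless of which file type is missing.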



pep8speaks commented 2 months ago

Hello @corbin-chris! Thanks for opening this PR. We checked the lines you've touched for PEP 8 issues, and found:

Line 275:80: E501 line too long (85 > 79 characters)