Background
find_dx_data.py makes a JSON which tells the Genetics Ark website which samples have available data in DNAnexus. It runs on a cron job.
I was getting this error when running find_dx_data.py:
Traceback (most recent call last):
  File "/home/find_dx_data.py", line 475, in <module>
    find_dx_bams(get_002_projects())
  File "/home/find_dx_data.py", line 256, in find_dx_bams
    if project_bams and project_idxs:
UnboundLocalError: local variable 'project_bams' referenced before assignment
The error occurs because it's possible to find BAMs in a project but not BAIs, or vice versa.
I presume this happens when a run is in progress, or occasionally for very old projects.
I suspect I mistyped the indent at some point during development, leaving the check on line 256 outside the block where the variables are assigned.
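For illustration, here is a minimal sketch of how a mis-indented check can raise this error. It is not the real find_dx_bams: the function name check_project_buggy and the files dict are hypothetical, and it assumes the two variables are only assigned inside a conditional branch.

```python
def check_project_buggy(files):
    """Hypothetical stand-in for the failing pattern (not the real find_dx_bams)."""
    if files.get("bam") and files.get("bai"):
        # The names are only bound when both file types are present.
        project_bams = files["bam"]
        project_idxs = files["bai"]
    # This check sits one indent level too far left, so it also runs
    # when the branch above was skipped and the names were never bound.
    if project_bams and project_idxs:
        return len(project_bams)
    return 0


try:
    # A project with a BAM but no BAI skips the assignment branch.
    check_project_buggy({"bam": ["sample1.bam"]})
except UnboundLocalError as err:
    print(f"Reproduced: {type(err).__name__}")
```

The exact wording of the exception message varies between Python versions, but the exception type is UnboundLocalError in each case.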
Remedial steps
Attached to a running genetics-ark-cron container, which was built from main 2.1.0 (a533e06)
Made the same code changes to find_dx_data.py as were made in this PR:
Indented the block on line 256, which starts with 'if project_bams and project_idxs:'
Added an 'else' branch which calls print(f"Either BAMs or BAIs not available for project: {project_id}")
Exited and restarted: sudo podman restart genetics-ark-cron
The script runs on a quarter-hourly cron job, so I waited until 5pm, then looked at the output logs.
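The code change described in the steps above can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the code in find_dx_data.py: the function summarise_projects and the shape of its input (project ID mapped to a list of file names) are hypothetical, and the real script queries DNAnexus rather than filtering names.

```python
def summarise_projects(projects):
    """Return BAM/BAI info for projects that have both file types.

    `projects` maps a project ID to a list of file names; this input
    shape is an assumption made for the sketch.
    """
    found = {}
    for project_id, file_names in projects.items():
        # Bind both names on every iteration so the check below is always safe.
        project_bams = [n for n in file_names if n.endswith(".bam")]
        project_idxs = [n for n in file_names if n.endswith(".bai")]
        if project_bams and project_idxs:
            found[project_id] = {"bams": project_bams, "idxs": project_idxs}
        else:
            # Log the incomplete project and skip it instead of crashing.
            print(f"Either BAMs or BAIs not available for project: {project_id}")
    return found


example = {
    "project-a": ["sample.bam"],                     # BAM only
    "project-x": ["sample.bam", "sample.bam.bai"],   # both present
    "project-c": ["sample.bai"],                     # BAI only
}
result = summarise_projects(example)
```

With this shape, a project missing either file type produces one log line and is left out of the result, which matches the behaviour seen in the cron log.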
Results
We see that the script completes running instead of erroring out early:
tail -n 4 ga-cron.log
Searching for CNVs End
JSON saved to: /home/jsons
Execution time: 77.70393633842468 seconds
20240828-16:01:20 sample file updated
The case which previously produced errors now just prints an info message and moves on, without adding anything for that project to the JSON (GA v2.0.3 discarded these projects silently):
tail -n 5000 ga-cron.log | grep "Either BAM"
Either BAMs or BAIs not available for project: project-a
Either BAMs or BAIs not available for project: project-b
Either BAMs or BAIs not available for project: project-c
I looked at these projects in DNAnexus:
project-a isn't a normal analysis run: it is from 2021 and contains only 3 files, including 1 BAM and no BAI
project-b contains a single BAM from 2021, but no BAI
project-c is a reanalysis of an old sample, from 2021, and contains a BAI only