broadinstitute / gatk-sv

A structural variation pipeline for short-read sequencing
BSD 3-Clause "New" or "Revised" License

UnboundLocalError in src/sv-pipeline/scripts/make_evidence_qc_table.py #673

Closed by CuriousTim 4 months ago

CuriousTim commented 5 months ago

Bug Report

Affected module(s) or script(s)

src/sv-pipeline/scripts/make_evidence_qc_table.py, which is run as part of the EvidenceQC workflow.

Affected version(s)

Description

A variable is referenced before it is assigned, resulting in an UnboundLocalError with the following stack trace:

Traceback (most recent call last):
  File "/opt/sv-pipeline/scripts/make_evidence_qc_table.py", line 274, in <module>
    main()
  File "/opt/sv-pipeline/scripts/make_evidence_qc_table.py", line 258, in main
    merge_evidence_qc_table(
  File "/opt/sv-pipeline/scripts/make_evidence_qc_table.py", line 181, in merge_evidence_qc_table
    df_total_high_outliers = read_all_outlier(df_manta_high_outlier, df_melt_high_outlier, df_wham_high_outlier, "high")
  File "/opt/sv-pipeline/scripts/make_evidence_qc_table.py", line 152, in read_all_outlier
    all_outliers_df.columns = [ID_COL, outlier_type + "_overall_outliers"]
UnboundLocalError: local variable 'all_outliers_df' referenced before assignment

The offending code attempts to set the column names of a pandas DataFrame before creating it.

if len(all_outliers) == 0:
    all_outliers_df = pd.DataFrame(columns=[ID_COL, outlier_type + "_overall_outliers"])
else:
    all_outliers_df.columns = [ID_COL, outlier_type + "_overall_outliers"]
    all_outliers_df = pd.DataFrame.from_dict(all_outliers, orient="index").reset_index()
return all_outliers_df
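For illustration, here is a minimal sketch of what a corrected ordering could look like: create the DataFrame first, then assign its column names. This is not necessarily the change made in the actual fix. The function name build_outlier_frame, the value of ID_COL, and the assumption that all_outliers maps each sample ID to a single scalar count are all hypothetical.

```python
import pandas as pd

ID_COL = "ID"  # assumption: the real script defines its own ID_COL constant


def build_outlier_frame(all_outliers, outlier_type):
    """Hypothetical helper: build a two-column frame from a dict of outlier counts."""
    columns = [ID_COL, outlier_type + "_overall_outliers"]
    if len(all_outliers) == 0:
        # No outliers: return an empty frame that still has the expected columns.
        return pd.DataFrame(columns=columns)
    # Create the DataFrame first, and only then rename its columns.
    all_outliers_df = pd.DataFrame.from_dict(all_outliers, orient="index").reset_index()
    all_outliers_df.columns = columns
    return all_outliers_df


# Assumed input shape: sample ID -> scalar outlier count.
example = build_outlier_frame({"sample_a": 3, "sample_b": 1}, "high")
empty_example = build_outlier_frame({}, "high")
```

With scalar values, from_dict(..., orient="index") followed by reset_index() yields exactly two columns, so the two-name assignment succeeds. If the real dictionary holds more than one value per sample, the column list would need to match.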

Steps to reproduce

I don't know exactly what this script does, but judging from the code above, running it on any input that contains outliers should reach the else branch and raise the error.
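The trigger condition can be illustrated without pandas or the pipeline itself. The sketch below uses a hypothetical stand-in function, rename_before_create, that mirrors the ordering of the offending branch (use the variable, then assign it). It is not the script's code, and the sample dictionary is made up.

```python
def rename_before_create(all_outliers):
    """Hypothetical stand-in that mirrors the ordering in the offending branch."""
    if len(all_outliers) == 0:
        result = []
    else:
        # 'result' is assigned somewhere in this function, so Python treats it
        # as a local name. Reading it before the assignment below fails.
        result.append("header")
        result = list(all_outliers.items())
    return result


def raises_unbound(all_outliers):
    """Return True if the stand-in raises UnboundLocalError for this input."""
    try:
        rename_before_create(all_outliers)
    except UnboundLocalError:
        return True
    return False


empty_fails = raises_unbound({})                  # empty input takes the safe branch
nonempty_fails = raises_unbound({"sample_a": 3})  # non-empty input hits the bug
```

Only the non-empty input reaches the branch that reads the variable before assigning it. That matches the guess that the error appears only when outliers are present.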

Expected behavior

The script should run without an UnboundLocalError.

Actual behavior

The script exits with an UnboundLocalError.


CuriousTim commented 4 months ago

Fixed by #674