AdamaJava / adamajava

Other
14 stars 5 forks source link

Qpileup.view.bug #309

Closed ChristinaXu2017 closed 2 years ago

ChristinaXu2017 commented 2 years ago

Description

qpileup view mode will create an CSV output, the first two lines looks like:

## Reference    Position        Ref_base        A_for   C_for   G_for   T_for   N_for   Aqual_for       Cqual_for       Gqual_for       Tqual_for       Nqual_for
       MapQual_for     ReferenceNo_for NonreferenceNo_for      HighNonreference_for    LowReadCount_for        A_rev   C_rev   G_rev   T_rev   N_rev   Aqual_rev       Cqual_rev       Gqual_rev       Tqual_rev       Nqual_rev       MapQual_rev     ReferenceNo_rev NonreferenceNo_rev      HighNonreference_rev    LowReadCount_rev
#Mesothelioma.HiSeqXTen.GRCh37.normal.DNA.h5

So the bug is that the "#Mesothelioma.HiSeqXTen.GRCh37.normal.DNA.h5” keeps getting written to the view file every one million rows - this breaks the file as a CSV. Code is updated and this line will only appear once and also moved to the new CSV header.

it would be good to have a header on these view mode output files, code is updated, the new header looks like:

# H5=Mesothelioma.HiSeqXTen.GRCh37.normal.DNA.h5
# REFERENCE=/mnt/lustre/reference/genomes/GRCh37_ICGC_standard_v2/primary_files/GRCh37_ICGC_standard_v2.fa
# BAMS_ADDED=64
# LOW_READ_COUNT=10
# MIN_NONREF_PERCENT=30

Type of change

Please delete options that are not relevant.

How Has This Been Tested?

unit test is updated, tested on a real data set

Are WDL Updates Required?

no

Checklist:

holmeso commented 2 years ago

Could you please add in a test for PR #302 - thanks

holmeso commented 2 years ago

Could you please add in a test for PR #302 - thanks

Have you been able to do this?

holmeso commented 2 years ago

Would it be a good idea to create a test that writes out a million and one records and checks that only one header line is written to the output?

ChristinaXu2017 commented 2 years ago

I believe it is not needed. First I already tested on real data, this problem is solved. Second, a unit test is added for a new function. Thirdly, the current unit test is limited, it is too expensive for our small team to add unit tests for all existing code.

ChristinaXu2017 commented 2 years ago

If you consist on unit test, we may be able to create a regression test for qpileup, then we can use a small exon bam test on GRCh38 with multi-threads.