dnanexus-archive / parliament2

Runs a combination of tools to generate structural variant calls on whole-genome sequencing data
Apache License 2.0

Breakseq2 for hg38-based bam files #45

Open MaestSi opened 5 years ago

MaestSi commented 5 years ago

Dear Samantha,

I would like to share a few hints about running breakseq2 on reference genomes other than hs37d5 (I am interested in hg38 in particular). They may help fix this bug in a future release, if you are planning to and haven't already worked it out.

For the breakpoint library specified with --bplib_gff, I used a .gff file, breakseq2_bplib_20150129_hg38.gff, obtained by lifting over the original hg19-based library (which I downloaded from the SVE GitHub page).

At first, with pysam==0.7.7 as suggested, I got errors because it did not expect the header of the BAM file to contain chromosomes labelled 'AH:*', namely alternative haplotypes. With a newer version of pysam (0.9.0) that error disappeared; however, a few minutes into the breakseq2 run I got the error "OSError: [Errno 7] Argument list too long". The only way I found to obtain a VCF output was to restrict the analysis to the canonical chromosomes with the parameter --chromosomes chr1 chr2 chr3 etc...

I don't know whether this is useful, but I thought I would share it.
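In case it is useful, here is a minimal sketch of how the canonical chromosome list for --chromosomes could be built from the contig names in a BAM header instead of being typed by hand. It is only an illustration under some assumptions: the function names and the sample contig names are hypothetical, the reference is assumed to use chr-prefixed names (chr1..chr22, chrX, chrY), and leaving out chrM is my own choice, not something breakseq2 requires.

```python
import re

# Assumption: canonical hg38 chromosomes are chr1..chr22, chrX and chrY.
# chrM is deliberately excluded here; add it to the pattern if you need it.
CANONICAL = re.compile(r"chr(?:[1-9]|1[0-9]|2[0-2]|X|Y)")


def canonical_chromosomes(contig_names):
    """Keep only the primary chromosomes, preserving the input order."""
    return [name for name in contig_names if CANONICAL.fullmatch(name)]


def chromosomes_argument(contig_names):
    """Build the `--chromosomes ...` fragment as a list of argv tokens."""
    return ["--chromosomes"] + canonical_chromosomes(contig_names)


# Hypothetical contig names of the kind found in an hg38 BAM header.
header_names = [
    "chr1",
    "chr2",
    "chr1_KI270706v1_random",
    "chrUn_KI270302v1",
    "chr22_KB663609v1_alt",
    "chrX",
    "chrY",
    "chrM",
    "chrEBV",
]

print(" ".join(chromosomes_argument(header_names)))
```

The resulting tokens can be appended to the rest of the breakseq2 command line. Alternate, random, unplaced and decoy contigs are dropped, which matches the workaround described above.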

slzarate commented 5 years ago

Thank you for the suggestion! I have been thinking about this and plan on integrating it into the next release.