The region consists of an Alu, followed by a poly-A, followed by a piece of a LINE (L1P3). It seems quite hopeless, so I'll try to figure out a way to exclude it.
The worst offender right now is interval 13913 in the current data set:
13913 Y:16691826-16692366 hdfs://svdev-1-m:8020/user/cwhelan/outs_tws_kill_promiscuous_kmers/NA12878_PCR-_30X/fastq/assembly13913.fastq
This is the one that takes a long time in SGA correction and filtering, only to have SGA filter out all of the reads and therefore blow up.
It's suspicious in that it's on the Y for NA12878, which is a female sample (there are quite a few Y intervals, actually).
@tedsharpe if you want to take a look at where these reads are coming from, or whether there's any way we could filter intervals of this type out of FindBreakpointEvidence, go ahead and assign yourself.