iskandr opened this issue 2 years ago
@iskandr I am not sure what the problem is now. If you have a smaller dataset containing the gene fusion you described, I would like to test it to see why.
Hello,
Here is a quick example. The following link contains two files: https://www.dropbox.com/sh/77ui9a3m5yrdte6/AACQFRukk_-9fBUIw1dRimpya?dl=0. Both come from a (rather absurd) 5000x PacBio coverage simulation of a single fusion transcript (no background transcripts). The simulated fusion transcript we are trying to find is HPS1:WHAMMP3, and it appears to have the inverted property described in the original post.
The first file is a fastq.gz file containing the simulated reads of this single fusion transcript. The second file is the BAM file produced by aligning those reads against the hg38 genome with minimap2 in splice mode, followed by sorting with samtools sort -n. Running this file through LongGF with the following command fails to find the HPS1:WHAMMP3 fusion:
LongGF \
test10_sorted.bam \
Homo_sapiens.GRCh38.105.gtf \
40 50 100 > test10_sorted.log
Let us know whether you can replicate the issue, and whether LongGF is designed to detect fusions like this.
Hi,
We're benchmarking LongGF on simulated long-read data generated with Badread. A minority of fusions appear not to be detected, specifically those where the fusion partners originate on opposite strands of DNA. I guess the mutation in these cases would be a fusion involving an inverted sequence. Can LongGF correctly call these kinds of compound mutations?
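If it helps narrow this down, one rough way to look for the "opposite strands" property in an aligned BAM is to list reads whose primary and supplementary alignments fall on different genomic strands. This is a minimal sketch under stated assumptions: `mixed_strand_reads` is a hypothetical helper (not part of LongGF or samtools), it expects SAM text on stdin (e.g. from `samtools view test10_sorted.bam`), and it relies only on the standard SAM FLAG bits 0x4 (unmapped), 0x10 (reverse strand) and 0x100 (secondary).

```shell
#!/bin/sh
# Hypothetical helper, not part of LongGF: print reads whose primary and
# supplementary alignments map to both the + and - genomic strands.
# Input: SAM text on stdin. Output: read name followed by chrom:pos+strand per segment.
mixed_strand_reads() {
  awk '
    BEGIN { FS = "\t" }                      # SAM fields are tab-separated
    /^@/ { next }                            # skip header lines
    {
      flag = $2
      if (int(flag / 4) % 2 == 1) next       # 0x4: read unmapped
      if (int(flag / 256) % 2 == 1) next     # 0x100: secondary alignment
      strand = (int(flag / 16) % 2 == 1) ? "-" : "+"   # 0x10: reverse strand
      seen[$1, strand] = 1
      loci[$1] = loci[$1] " " $3 ":" $4 strand
      if (!($1 in order)) { order[$1] = ++n; names[n] = $1 }  # keep input order
    }
    END {
      for (i = 1; i <= n; i++) {
        r = names[i]
        if (((r, "+") in seen) && ((r, "-") in seen)) print r loci[r]
      }
    }
  '
}
```

Usage would be something like `samtools view test10_sorted.bam | mixed_strand_reads | head`. A read reported here is only a candidate: segments on different genomic strands are also expected for an ordinary fusion between two genes that sit on opposite strands, so this does not by itself show an inversion.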