Open wigge206 opened 4 years ago
Your commands look good and that is what I would expect to happen -- shorter reads can support a longer isoform. The idea is that if the truncated read still covers a part of the isoform that is unique to that isoform, then it can be considered a supporting read for that isoform.
The way we have implemented this is only a simple check of mapping quality -- if the mapq from minimap2 is 0, then read mapped ambiguously to multiple isoforms and is not counted. If a read aligns to an isoform with mapq of 1 or greater, then that read is assigned to that isoform. 1 is not a conservative threshold -- I would recommend that you either try a higher --quality
threshold (you could try 10, or 30) or try running with collapse with --stringent
. Stringent enforces that the supporting reads need to cover 80% of an isoform's nucleotides, including some bases of the first and last exon.
Let me know if you have further questions and thanks for your patience.
-Alison
I had tried --stringent
with little change, increasing the quality threshold did changes the number of reads mapped etc but not as expected. The FL known isoform (dark blue) lost the majority of its reads, whilst the novel isoform (orange track in IGV, green in FLAIR isoform.png), which, contains all the same exon boundaries as the FL with exception of the inclusion of the entire intron 4, retained its reads.
The concern is that when assessing the alignments the vast majority of reads do not contain bases that overlap the intron inclusion (the only unique region compared to FL). The vast majority of reads mapped to the novel isoform would map equally well to the FL isoform.
Hi Alison, I thought I would add more detail as I have attempted a few more things but still are having issues. Firstly, the sequencing was preformed on two PCR amplicons. One amplicon is generated by primers in the first and last exon and represents the FL (and FL splice isoforms). The second amplicon is generated by a primer in a known internal start site and the same reverse primer. These are well described splicing changes and the delta isoform is represented gencode and the gtf file I have been using.
The issue is the reads coming from the delta isoform are being used as evidence for a FL isoform with an intron inclusion event rather then being mapped to the known delta isoforms. I have tried to split these reads out by extracting any read that was entirely contained between the two delta primers (with a small buffer). I was able to split the reads (not perfectly), however, after running flair on these reads they were now mapped to a different FL isoform despite very few reads being mapped to the first three exon. There was no evidence of the delta isoforms.
Is this an issue of flair? Whereby, internal start codons are miscalled/mapped to longer isoforms?
I am wanting to visualise reads that map to each isoform by using the --generate_map option from collapse. The reads the supposedly support this novel isoform (225532d7-ec74-4839-9708-8c26168366cb_ENSG00000141510.17, bottom panel in image) only have partly mapped to the isoform, is this to be expected? This isoform was highly expressed and so we are wanting to design assay to validate this isoform, hence the need to visualse reads.
In case I have made an error in my code:
python flair align -g hg38.fa -r sample1.fastq -v1.3 -o sample1
Repeat for all samplespython flair correct -q sample1.bed -g hg38.fa -f gencode.v29.annotation.gtf -o sample1
Repeat for all samplescat *all_corrected.bed > all_samples_corrected.bed
python flair collapse -q all_samples_corrected.bed -r sample1.fastq sample2.fastq sample3.fastq -f gencode.v29.annotation.gtf --generate_map
To pull out reads and visualise in IGV I matched the reads IDs to the original alignments files for each sample.
grep 225532d7-ec74-4839-9708-8c26168366cb_ENSG00000141510.17 allsamples_aligned.isoform.read.map.txt |sed 's/,/\n/g' | sed '1d' > 225532d7-ec74-4839-9708-8c26168366cb_ENSG00000141510.17.reads.map.txt
samtools view sample1.bam | grep -f 225532d7-ec74-4839-9708-8c26168366cb_ENSG00000141510.17.reads.map.txt > subset.sam
samtools view -H sample1.bam > header
cat header subset.sam > subset_sample1.sam
samtools view -bS subset_sample1.sam > subset_sample1.bam
samtools sort subset_sample1.bam -o subsetSort_sample1.bam
samtools index subsetSort_sample1.bam