davidaknowles / leafcutter

Annotation-free quantification of RNA splicing. Yang I. Li, David A. Knowles, Jack Humphrey, Alvaro N. Barbeira, Scott P. Dickinson, Hae Kyung Im, Jonathan K. Pritchard
http://davidaknowles.github.io/leafcutter/
Apache License 2.0

PSI phenotype from STAR output #107

Closed m-waqas closed 5 years ago

m-waqas commented 5 years ago

I mapped reads to the genome with STAR, using 2-pass mapping so that more splice-junction reads can map to novel junctions. The mapping parameters were:

--sjdbOverhang 100
--outSAMprimaryFlag AllBestScore : output all alignments with the best score as primary alignments
--outFilterMismatchNmax 2/0 (first/second pass) : an alignment is output only if it has no more mismatches than this value
--outSJfilterCountTotalMin 10 5 5 5 (non-canonical SJ and 3 canonical SJs)
--outSAMstrandField intronMotif
--outFilterIntronMotifs RemoveNoncanonical : filter out alignments with non-canonical junctions
--alignIntronMin 20 : minimum intron size
--alignIntronMax 6000 : maximum intron size
--outSAMtype BAM SortedByCoordinate : output a sorted BAM file

This gave me sorted BAM output along with junction files. Can I use these junction files to generate a PSI phenotype matrix for sQTL analysis?

sample junction file is attached here: 108.zip

davidaknowles commented 5 years ago

These are in a different format from the junction files we use. Our junc files are like BED files, with columns (chrom, start, end, name [unused], count, strand). The STAR manual (http://labshare.cshl.edu/shares/gingeraslab/www-data/dobin/STAR/STAR.posix/doc/STARmanual.pdf) documents the STAR format. You could write a script to convert from one to the other, since all the information is there, but extracting the junction counts from the BAM is pretty fast anyway.
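
For anyone who wants to try the conversion route, here is a rough sketch of what such a script might look like. It is not part of leafcutter; the function names (star_sj_to_junc, format_junc, convert_file) are hypothetical. It assumes the SJ.out.tab column layout described in the STAR manual (chromosome, intron start, intron end, strand code 0/1/2, motif, annotated flag, uniquely mapped read count, multi-mapped read count, max overhang) and uses only the uniquely mapped count as the junction count. The coordinate offsets are an explicit assumption: they default to 0 (coordinates copied through unchanged), so compare the output against a junc file extracted from the same BAM before relying on it, and adjust the offsets if the two disagree.

```python
# Sketch: convert STAR SJ.out.tab rows to junc-style rows
# (chrom, start, end, name, count, strand). Not an official leafcutter tool.

# STAR encodes strand in column 4 as 0 (undefined), 1 (+), 2 (-).
STAR_STRAND = {"0": ".", "1": "+", "2": "-"}


def star_sj_to_junc(lines, start_offset=0, end_offset=0, min_count=1):
    """Return junc-style tuples from an iterable of SJ.out.tab lines.

    start_offset / end_offset are ASSUMPTIONS to be checked against a
    junc file produced from the BAM; 0 means "copy coordinates as-is".
    Junctions with fewer than min_count uniquely mapped reads are dropped.
    """
    rows = []
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:
            continue  # skip blank or malformed lines
        chrom = fields[0]
        start = int(fields[1]) + start_offset
        end = int(fields[2]) + end_offset
        strand = STAR_STRAND.get(fields[3], ".")
        unique_reads = int(fields[6])  # column 7: uniquely mapped reads
        if unique_reads < min_count:
            continue
        rows.append((chrom, start, end, ".", unique_reads, strand))
    return rows


def format_junc(row):
    """Format one tuple as a tab-separated junc line (name column unused)."""
    return "\t".join(str(value) for value in row)


def convert_file(sj_path, junc_path, **kwargs):
    """Read an SJ.out.tab file and write the converted junc file."""
    with open(sj_path) as sj_file:
        rows = star_sj_to_junc(sj_file, **kwargs)
    with open(junc_path, "w") as junc_file:
        for row in rows:
            junc_file.write(format_junc(row) + "\n")
```

A call such as convert_file("SJ.out.tab", "sample.junc") would write one junc line per retained junction (both file names are placeholders). Counting only uniquely mapped reads is a design choice in this sketch; whether multi-mapped reads should contribute depends on your analysis.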