Closed m-waqas closed 5 years ago
These have a different format from the junction files we use. Our junc files are like BED files, with columns (chrom, start, end, name [unused], count, strand). The STAR manual (http://labshare.cshl.edu/shares/gingeraslab/www-data/dobin/STAR/STAR.posix/doc/STARmanual.pdf) documents their format. You could write a script to convert from one to the other, since all the information is there, but extracting the junction counts from the BAM is pretty fast anyway.
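For anyone who does want to try the conversion, here is a minimal Python sketch. It is not part of leafcutter and has not been validated against it. It relies on the SJ.out.tab layout described in the STAR manual: chrom, 1-based first and last intron base, strand code (0 = undefined, 1 = +, 2 = -), intron motif, annotation flag, uniquely mapping read count, multi-mapping read count, and maximum overhang. The function names and the `start_offset`/`end_offset` parameters are hypothetical; the offsets default to 0 because the exact coordinate convention expected downstream is an assumption you should check against a junc file extracted from the same BAM.

```python
# STAR strand codes in SJ.out.tab column 4: 0 = undefined, 1 = +, 2 = -
STAR_STRAND = {"0": ".", "1": "+", "2": "-"}


def star_sj_to_junc(sj_lines, start_offset=0, end_offset=0,
                    include_multimappers=False):
    """Yield BED-like rows (chrom, start, end, name, count, strand).

    start_offset and end_offset are hypothetical knobs for adjusting
    STAR's 1-based intron coordinates; 0 leaves them unchanged.
    """
    for line in sj_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:
            continue  # skip blank or malformed lines
        chrom = fields[0]
        start = int(fields[1]) + start_offset
        end = int(fields[2]) + end_offset
        count = int(fields[6])  # uniquely mapping reads
        if include_multimappers:
            count += int(fields[7])  # optionally add multi-mapping reads
        strand = STAR_STRAND.get(fields[3], ".")
        # "." fills the unused name column
        yield (chrom, start, end, ".", count, strand)


def convert_file(sj_path, junc_path, **kwargs):
    """Read a STAR SJ.out.tab file and write a tab-separated junc file."""
    with open(sj_path) as src, open(junc_path, "w") as dst:
        for row in star_sj_to_junc(src, **kwargs):
            dst.write("\t".join(str(value) for value in row) + "\n")
```

A call such as `convert_file("SJ.out.tab", "sample.junc")` would write one junc line per STAR junction. Junctions with an undefined strand come out as ".", so you may need to filter them before clustering.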
I have performed read mapping to the genome using STAR with 2-pass mapping to allow more splice junction reads to map to novel junctions. The mapping parameters were set as follows:
--sjdbOverhang 100
--outSAMprimaryFlag AllBestScore : output all alignments with the best score as primary alignments
--outFilterMismatchNmax 2/0 (first/second pass) : alignment will be output only if it has no more mismatches than this value
--outSJfilterCountTotalMin 10 5 5 5 (non-canonical SJ and 3 canonical SJs)
--outSAMstrandField intronMotif
--outFilterIntronMotifs RemoveNoncanonical : filter out alignments with non-canonical junctions
--alignIntronMin 20 : minimum intron size
--alignIntronMax 6000 : maximum intron size
--outSAMtype BAM SortedByCoordinate : output sorted BAM file
It gave me sorted BAM output along with junction files. Can I use these junction files to generate the phenotype PSI matrix for sQTL analysis?
A sample junction file is attached here: 108.zip