erhard-lab / price

Improved Ribo-seq enables identification of cryptic translation events

Resolving multi-mapping reads #3

Closed TamaraO closed 6 years ago

TamaraO commented 6 years ago

When are the multi-mapping reads resolved with the RESCUE algorithm? If I run PRICE on a BAM file generated with STAR, will it resolve the multi-mapping reads?

Here are my STAR parameters:

STAR --readFilesCommand zcat \
--genomeDir /references/b37/STAR \
--runThreadN 2 --alignEndsType EndToEnd \
--outFilterType BySJout --outSAMtype BAM SortedByCoordinate \
--alignSJDBoverhangMin 1 --outFilterMismatchNmax 999 \
--outFilterMismatchNoverReadLmax 0.04 --alignIntronMin 20 --alignIntronMax 1000000 \
--alignMatesGapMax 1000000 \
--readFilesIn $FASTA_IN \
--twopassMode Basic \
--limitBAMsortRAM 30000000000

florianerhard commented 6 years ago

Dear Tamara,

I am afraid it is not that simple. If you want to use PRICE without our pipeline, you have to make sure that the read names that go into the BAM file are integers:

gedi -e FastqFilter -i raw.fastq > reads.fastq

The raw.fastq may also be gzipped or bzipped. Then map reads.fastq with STAR. After that:

gedi -e Bam2CIT -id mapped.cit mapped.bam
gedi -e ResolveAmbiguities -r mapped.cit -g <genomic-index> -s rescued.out -o rescued.cit

The first line creates mapped.cit, the second creates rescued.out (statistics) and rescued.cit (the mappings to be used in PRICE). <genomic-index> is the name of the indexed genome and is used to define contexts for each read (based on the transcripts).
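Put together, the steps above could be scripted roughly as follows. This is only a sketch: the function names `names_are_integers` and `rescue_multimappers` and the intermediate file handling are illustrative and not part of gedi. The check assumes that the read name is the first token after `@` on each FASTQ header line. The STAR call is reduced to a few of the options from the question, and `Aligned.sortedByCoord.out.bam` is STAR's default name for coordinate-sorted BAM output.

```shell
#!/usr/bin/env bash

# Return 0 if every FASTQ header line (lines 1, 5, 9, ...) carries an integer read name.
# Assumption: the read name is the first whitespace-delimited token after "@".
names_are_integers() {
    awk 'NR % 4 == 1 { name = substr($1, 2); if (name !~ /^[0-9]+$/) bad = 1 }
         END { exit bad + 0 }' "$1"
}

# Hypothetical wrapper around the steps described above.
#   $1: raw FASTQ (may be gzipped or bzipped)
#   $2: STAR genome directory
#   $3: name of the gedi genomic index
rescue_multimappers() {
    local raw=$1 star_index=$2 gedi_index=$3

    # 1. Rewrite the reads so that their names are integers.
    gedi -e FastqFilter -i "$raw" > reads.fastq || return 1
    names_are_integers reads.fastq || { echo "non-integer read names" >&2; return 1; }

    # 2. Map with STAR (only a subset of the options from the question is shown).
    STAR --genomeDir "$star_index" --readFilesIn reads.fastq \
         --alignEndsType EndToEnd --outSAMtype BAM SortedByCoordinate || return 1
    mv Aligned.sortedByCoord.out.bam mapped.bam || return 1

    # 3. Convert to CIT and resolve the multi-mapping reads.
    gedi -e Bam2CIT -id mapped.cit mapped.bam || return 1
    gedi -e ResolveAmbiguities -r mapped.cit -g "$gedi_index" \
         -s rescued.out -o rescued.cit
}
```

It would be called as `rescue_multimappers raw.fastq /references/b37/STAR <genomic-index>`, with rescued.cit then being the input for PRICE.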

Best, Florian