FlorianPichot / RiboMethSeqPipeline

You will find here the different RiboMethSeq pipelines used by the NGS Core Facility "EpiRNA-Seq" from Nancy, France (https://umsibslor.univ-lorraine.fr/en/facility/epitranscriptomics-sequencing-epirna-seq).

Sam from the Fastq files during bowtie alignment #2

Open ishaaq34 opened 1 year ago

ishaaq34 commented 1 year ago

Hi, I have several different datasets. I am a bit confused about why you have not given the BAM and SAM files specific names. Instead, you have given them temporary file names.

[Image: Screen Shot 2023-01-30 at 6 44 47 PM]

Could you please help me understand the pipeline?

FlorianPichot commented 1 year ago

Hi,

The SAM file is specified in bowtie2 with "-S tmp.sam". These BAM and SAM files are removed shortly afterwards, at lines 24 and 25 of the script ("rm tmp.sam" and "rm temp.bam"), so we don't give them meaningful names since they only exist for a short time.

We only use them to sort the reads and remove the unaligned ones. The same result can be obtained on a single line by using the bowtie2 "--no-unal" option and piping the output into "samtools sort", with something like this (not tested):

(bowtie2 --no-unal --no-1mm-upfront -D 15 -R 2 -N 0 -L 10 -i S,1,1.15 -p 16 -x $BTPath -U $FN --un-gz Non_rRNA_$folder.fastq.gz | samtools sort -o mapped_RNA.bam - ) 2>nohup_bowtieL10EtoE_rRNA_$folder.out

In fact, the "mapped_RNA" SAM and BAM files are considered the true output of the bowtie2 alignment, while the temporary files are only there to help us with debugging.
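The temporary-file flow described above can be sketched roughly as follows. This is an illustration under assumptions, not the pipeline's actual code: the "samtools view -b -F 4" filtering step, the placeholder values for BTPath, FN and folder, and the DRY_RUN switch are hypothetical, and only the file names tmp.sam, temp.bam and mapped_RNA.bam come from this thread. By default the script only prints the commands it would run; setting DRY_RUN=0 executes them, which requires bowtie2 and samtools to be installed.

```shell
#!/bin/sh
# Sketch of the temporary-file flow: align, drop unaligned reads, sort, clean up.
set -e

# Hypothetical placeholders: replace with your own index, reads and sample name.
BTPath="${BTPath:-rRNA_index}"
FN="${FN:-sample.fastq.gz}"
folder="${folder:-sample}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = only print the commands, 0 = also run them

# Print a command, and execute it only when DRY_RUN=0.
run() {
    printf '+ %s\n' "$*"
    if [ "$DRY_RUN" = "0" ]; then
        "$@"
    fi
}

pipeline() {
    # 1. Align; the SAM output goes to a short-lived file.
    run bowtie2 --no-1mm-upfront -D 15 -R 2 -N 0 -L 10 -i S,1,1.15 -p 16 \
        -x "$BTPath" -U "$FN" --un-gz "Non_rRNA_$folder.fastq.gz" -S tmp.sam
    # 2. Keep only aligned reads (-F 4 excludes reads flagged as unmapped).
    run samtools view -b -F 4 -o temp.bam tmp.sam
    # 3. Sort by coordinate into the file that is actually kept.
    run samtools sort -o mapped_RNA.bam temp.bam
    # 4. Remove the intermediates, which is why their names do not matter.
    run rm tmp.sam temp.bam
}

pipeline
```

Only mapped_RNA.bam survives the run, so the intermediate names never need to be unique per sample as long as samples are processed one at a time in separate folders.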

Does this answer your question?

ishaaq34 commented 1 year ago

Thank you so much. That was helpful. I am almost done with the analysis. I was able to compute scores A, B and C. My command-line experience is limited and I am struggling to write a script for ScoreMean and ScoreMax. Could you please guide me on how to get ScoreMean and ScoreMax? I would appreciate it very much.

FlorianPichot commented 1 year ago

Both scores can be found in the R scripts. Here is the code for scoreMean (results$mean) in the R script [Standard_RiboMethSeq_script_5prime3prime_Score2.R]: [image]. You can find scoreMax (results$max) at the same location in [Standard_RiboMethSeq_script_5primeOnly_score6.R].

FlorianPichot commented 1 year ago

Actually, there is a slight mistake here in the scoreMean calculation. ScoreMean should theoretically be equal to the "cumul" variable. This has been changed in our most recent pipelines, but not in this old one. The differences between the two variables seem negligible, but I would rather point this out so that you can optimise your pipeline.

ishaaq34 commented 1 year ago

Thank you for your response. I hope you had a great weekend.

Actually, I am working on worms. Not many sites are known so far in worms. Moreover, I am unable to understand the structure of Hsapiens_rRNA.csv. I tried to skip Hsapiens_rRNA.csv in the script (by commenting it out with #), but that did not work. Could you suggest anything that would work for me?

Thanks and best wishes, Raja

FlorianPichot commented 1 year ago

Hsapiens_rRNA.csv is a CSV file with three columns containing all referenced 2'-O-Met sites, such as: [image]. For C. elegans, however, I don't have a position list of known 2'-O-Met sites. In that case, I would suggest keeping all positions as putative 2'-O-methylated sites if they meet all of the following requirements:

Once you have the list of putative 2'-O-Met sites, you can produce graphs and tables from it to interpret your data. Sadly, these R scripts can't help you further here, since they are designed to assess prediction performance and not to analyse putative sites.

ishaaq34 commented 1 year ago

Thank you so much, that was helpful. When I completed my analysis, I found that some of the samples have fewer methylation sites than others. When I looked back at their sequencing counts, I found that the samples with lower sequencing counts have fewer methylation sites. It seems that the lower sequencing depth in these samples is responsible for the lower number of methylation sites. I was wondering what the ideal sequencing count would be for methylation site analysis. What are your thoughts on this?