JunyueC / sci-RNA-seq3_pipeline

Processing pipeline scripts for sci-RNA-seq3 (bash, R, python)

What is the difference between rmdup_sam and rmdup_sam2 files? #3

Open chatterjee89 opened 3 years ago

chatterjee89 commented 3 years ago

Hi Dr. Cao,

I have some questions about the duplicate removal steps. As I understand it, the following code removes duplicates, i.e. reads with an identical UMI, tagmentation site, reverse transcription index, and ligation adaptor index:

```bash
echo "Start removing duplicates..."
echo input folder: $filtered_sam_folder
echo output folder: $rmdup_sam_folder
mkdir -p $rmdup_sam_folder
module unload python

bash_script=$script_path/sci3_rmdup_nomismatch.sh
Rscript $R_script $bash_script $filtered_sam_folder $sample_ID $rmdup_sam_folder $core $mismatch

echo
echo "Start removing duplicates..."
echo input folder: $all_output_folder/rmdup_sam
echo output folder: $all_output_folder/rmdup_sam_2
mkdir -p $all_output_folder/rmdup_sam_2
module unload python

bash_script=$script_path/sci3_rmdup_nomismatch.sh
filtered_sam_folder=$all_output_folder/rmdup_sam
rmdup_sam_folder=$all_output_folder/rmdup_sam_2
Rscript $R_script $bash_script $filtered_sam_folder $sample_ID $rmdup_sam_folder $core $mismatch
```

My questions are:

  1. What exactly is the difference between the SAM files in the rmdup_sam and rmdup_sam_2 folders?
  2. Why are two rounds of the same script run after barcode attachment?
  3. The previous step is described as barcode + UMI attachment, but the way I read it, only the barcode is being attached. Does that sound right? I don't understand where the UMI is being defined. Am I missing something?
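To make my reading of the duplicate removal concrete, here is a minimal Python sketch of how I picture it. This is not the pipeline's code. The read-name layout (`<cell_barcode>,<UMI>,...` at the start of the SAM read name), the choice of chromosome plus start position as the "site", and the function names `dedup_key` and `remove_duplicates` are all my assumptions, for illustration only.

```python
from typing import Iterable, List, Tuple


def dedup_key(sam_line: str) -> Tuple[str, str, str, str]:
    """Build the key that decides whether two reads are duplicates.

    ASSUMPTION (not confirmed by the pipeline): the read name starts with
    "<cell_barcode>,<UMI>,", and the mapping site is chromosome + position.
    """
    fields = sam_line.rstrip("\n").split("\t")
    name_parts = fields[0].split(",")
    barcode, umi = name_parts[0], name_parts[1]
    chrom, pos = fields[2], fields[3]
    return (barcode, umi, chrom, pos)


def remove_duplicates(sam_lines: Iterable[str]) -> List[str]:
    """Keep header lines and the first read seen for each key."""
    seen = set()
    kept = []
    for line in sam_lines:
        if line.startswith("@"):  # SAM header lines pass through unchanged
            kept.append(line)
            continue
        key = dedup_key(line)
        if key not in seen:
            seen.add(key)
            kept.append(line)
    return kept


# Toy input: r1 and r2 share barcode, UMI, and site, so r2 is a duplicate.
reads = [
    "@HD\tVN:1.6",
    "BC1,UMI1,r1\t0\tchr1\t100\t255\t4M\t*\t0\t0\tACGT\tIIII",
    "BC1,UMI1,r2\t0\tchr1\t100\t255\t4M\t*\t0\t0\tACGT\tIIII",
    "BC1,UMI2,r3\t0\tchr1\t100\t255\t4M\t*\t0\t0\tACGT\tIIII",
    "BC2,UMI1,r4\t0\tchr1\t100\t255\t4M\t*\t0\t0\tACGT\tIIII",
]
first_pass = remove_duplicates(reads)
second_pass = remove_duplicates(first_pass)
```

Under this exact-match reading, `second_pass` is identical to `first_pass`, which is why I am unsure what the second round is meant to achieve.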

I am trying to understand exactly what each step in sci3_main.sh does, hence the questions. Thank you!

Regards, Deep