epigen-UCSD / atac_seq_pipeline

TSS computation: potential overinflation #16

Open opoirion opened 3 years ago

opoirion commented 3 years ago

test dataset:

This is a dummy test dataset made from a small subsample of reads from one of our projects.

BAM and BED files from:

http://ns104190.ip-147-135-44.us:8088/dataset_report?dataset_name=LOCAL_TEST&output_folder_name=output_TEST&token=8c35c77aeece00af32a5b5b96fee4db6

(name sorted bam file)

(bed file with duplicates removed)

(bam file transformed into bed file)

Script:

TSS script from Frank, refactored: https://gitlab.com/Grouumf/snATAC/-/blob/master/bin/compute_TSS_epigenomic_center, run with python3. metaseq library used: https://github.com/opoirion/metaseq

TSS script using DeepTools from BAM file: https://gitlab.com/Grouumf/snATAC/-/blob/master/bin/bam_to_TSS.bash

TSS script from ATACtools pipeline: https://gitlab.com/Grouumf/ATACdemultiplex/-/blob/master/ATACCellTSS/ATACCellTSS.go from the pipeline https://gitlab.com/Grouumf/ATACdemultiplex/-/tree/master/ATACCellTSS

TSS computation with Frank's pipeline gives erroneous results, with peaks up to 27-fold (TSS = 27).

Both bedtools and the custom pipeline give:

image

deeptools pipeline:

image

custom pipeline (from unshifted bed file):

image

I suspect that the metaseq library or a boundary effect is causing the overinflation.

biomystery commented 3 years ago

Hi Olivier,

I think the plots you are comparing have different scales:

  1. The TSS plot in my code is scaled to the boundary bins, so that the boundary has a value close to 1.
  2. The TSS plots from your code and from deepTools are on other scales, without this rescaling.

This explains the difference in scales: mine (0-25) versus yours (0-0.3).
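
To make the rescaling concrete, here is a minimal sketch of dividing an aggregate TSS profile by the mean of its boundary bins. It is not the code from either pipeline: the function name `scale_to_boundary`, the `n_edge_bins` parameter, and the synthetic profile are all assumptions chosen for illustration.

```python
import numpy as np


def scale_to_boundary(profile, n_edge_bins=10):
    """Divide a TSS aggregate profile by the mean signal of its outermost bins.

    Hypothetical helper: the name and the default of 10 edge bins per side
    are assumptions, not taken from either pipeline.
    """
    profile = np.asarray(profile, dtype=float)
    # Background estimate: the first and last n_edge_bins of the window.
    edges = np.concatenate([profile[:n_edge_bins], profile[-n_edge_bins:]])
    background = edges.mean()
    if background <= 0:
        raise ValueError("boundary bins have no signal; cannot rescale")
    return profile / background


# Synthetic profile (made-up numbers): flat background plus a peak at the TSS.
positions = np.linspace(-2000, 2000, 401)
raw = 0.01 + 0.25 * np.exp(-(positions / 150.0) ** 2)

scaled = scale_to_boundary(raw)
print(f"raw peak: {raw.max():.2f}, scaled peak: {scaled.max():.1f}")
```

With a rescaling of this kind, a profile that peaks below 0.3 in raw units can peak above 20 once the boundary is set to 1, so the two plots can only be compared after putting them on the same scale.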