Consultation on snATAC-seq Data Analysis for differential peak analysis

RuiyuRayWang commented 7 months ago

TL;DR In doing differential peak analysis,

what's the difference between union_bed_file and rnd_bed_file
which to use for pp.make_peak_matrix

Original post

Hi, @beyondpie. Thanks for making this remarkable pipeline publicly available.

I'd like to work with processed files from catlas.org, subset out my cell types of interest, create a peak matrix and perform differential peak analysis.

I downloaded files with the _rm_dlt.h5ad suffix under snapatac2_anndata_per_sample_with_fragment_dir, and used the following code to get peak matrix:

peak_mat = sa2.pp.make_peak_matrix( adata = snap2, inplace = False, file = outdir / "sa2pmat.union_pmat.h5ad", backend = "hdf5", peak_file = union_bed_file, chunk_size = 10000, use_x = False ) (ref. sa2pmay.py lines 66-72)

where

union_bed_file = 'mba.whole.sa2.final.peak.srt.bed'

Then do differential peak analysis with tl.marker_regions or tl.diff_test.

however I'm unsure if this is correct as I noticed two peak files: union_bed_file and rnd_bed_file, where

rnd_bed_file = 'mba.whole.shuffle.removeovlp.bed'

What's the purpose of having both bed files and which one should I use for diff-peak analysis?

Thanks! Ray

beyondpie commented 7 months ago

Hi Ray,

rnd_bed_file is used for our analysis, such as Cicero, as a random control. So you can ignore it.
union_bed_file should have all the peaks we identified (merged from all the cell types).
I also shared the pmat, gmat, bmat information here:https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE246791
pp.make_peak_matrix is used to create pmat, i.e, single-cell level counts on peak regions. Since you are working on differential peak analysis, please use union_bed_file as they are the real peaks.

Thanks! Songpeng

RuiyuRayWang commented 7 months ago

Thank you Songpeng! I'll work with union_bed_file and see how it goes.

RuiyuRayWang commented 7 months ago

Hi @beyondpie ,

Following your advice, I used union_bed_file to obtain pmat and successfully performed differential peak analysis. But here's a follow-up question:

I noticed that the peaks in the union_bed_file were given by a fixed width of 500bp. Consequently, peaks in the differential analysis were also 500bp in width. Is this 500bp peak width defined within the peak calling (macs2) parameters? If I want to obtain a more refined peak (cCRE) boundaries, how can I go about doing this?

I'm considering plotting reads coverage grouped by cell types to visualize the peaks more intuitively, but it doesn't seem like snapatac2 currently provide that utility. Do you happen to know of any other way to accomplish this?

Thanks!

Best regards, Ray

beyondpie commented 7 months ago

@RuiyuRayWang

Is this 500bp peak width defined within the peak calling (macs2) parameters?

No. After peak calling, we extended the peak summit to 500 (or 499) bps around the peak summit. (https://github.com/beyondpie/CEMBA_wmb_snATAC/blob/b85d01f20ee0c252fd2b1785f581d24618e961bd/03.peakcalling/src/main/R/iterativeMergePeak.R#L76)

I'm considering plotting reads coverage grouped by cell types to visualize the peaks more intuitively, but it doesn't seem like snapatac2 currently provide that utility. Do you happen to know of any other way to accomplish this?

Do you mean you want to use IGV to visualize the tracks (as bigwig format? If so, you can try this function: https://kzhang.org/SnapATAC2/api/_autosummary/snapatac2.ex.export_coverage.html)

Or in our pipeline, during peak calling, we save the bed graph files generated by macs2, and then convert them into bigwig files, and then use IGV to visualize the tracks. But this needs you to call peaks by yourself, so maybe not a good way.

BTW, in CATLas, we have uploaded bigwig tracks on subclass level: http://catlas.org/wholemousebrain/#!/igv, hope this works on your side. (Feel free to download the bigwig files we generated on subclass level.)

Thanks! Songpeng

RuiyuRayWang commented 6 months ago

Do you mean you want to use IGV to visualize the tracks (as bigwig format? If so, you can try this function: https://kzhang.org/SnapATAC2/api/_autosummary/snapatac2.ex.export_coverage.html)

BTW, in CATLas, we have uploaded bigwig tracks on subclass level: http://catlas.org/wholemousebrain/#!/igv, hope this works on your side. (Feel free to download the bigwig files we generated on subclass level.)

Thank you Songpeng! I think I'll try out these suggestions and see how it goes.

Best, Ray

beyondpie commented 6 months ago

@RuiyuRayWang Should I close this issue? Songpeng

RuiyuRayWang commented 5 months ago

@RuiyuRayWang Should I close this issue? Songpeng

Yes please

beyondpie / CEMBA_wmb_snATAC

Consultation on snATAC-seq Data Analysis for differential peak analysis #12