Closed RuiyuRayWang closed 5 months ago
Hi Ray,
rnd_bed_file
is used for our analysis, such as Cicero, as a random control. So you can ignore it. union_bed_file
should have all the peaks we identified (merged from all the cell types).pp.make_peak_matrix
is used to create pmat, i.e, single-cell level counts on peak regions. Since you are working on differential peak analysis, please use union_bed_file
as they are the real peaks.Thanks! Songpeng
Thank you Songpeng! I'll work with union_bed_file
and see how it goes.
Hi @beyondpie ,
Following your advice, I used union_bed_file
to obtain pmat and successfully performed differential peak analysis. But here's a follow-up question:
I noticed that the peaks in the union_bed_file
were given by a fixed width of 500bp. Consequently, peaks in the differential analysis were also 500bp in width. Is this 500bp peak width defined within the peak calling (macs2) parameters?
If I want to obtain a more refined peak (cCRE) boundaries, how can I go about doing this?
I'm considering plotting reads coverage grouped by cell types to visualize the peaks more intuitively, but it doesn't seem like snapatac2 currently provide that utility. Do you happen to know of any other way to accomplish this?
Thanks!
Best regards, Ray
@RuiyuRayWang
Is this 500bp peak width defined within the peak calling (macs2) parameters?
No. After peak calling, we extended the peak summit to 500 (or 499) bps around the peak summit. (https://github.com/beyondpie/CEMBA_wmb_snATAC/blob/b85d01f20ee0c252fd2b1785f581d24618e961bd/03.peakcalling/src/main/R/iterativeMergePeak.R#L76)
I'm considering plotting reads coverage grouped by cell types to visualize the peaks more intuitively, but it doesn't seem like snapatac2 currently provide that utility. Do you happen to know of any other way to accomplish this?
Do you mean you want to use IGV to visualize the tracks (as bigwig format? If so, you can try this function: https://kzhang.org/SnapATAC2/api/_autosummary/snapatac2.ex.export_coverage.html)
Or in our pipeline, during peak calling, we save the bed graph files generated by macs2, and then convert them into bigwig files, and then use IGV to visualize the tracks. But this needs you to call peaks by yourself, so maybe not a good way.
BTW, in CATLas, we have uploaded bigwig tracks on subclass level: http://catlas.org/wholemousebrain/#!/igv, hope this works on your side. (Feel free to download the bigwig files we generated on subclass level.)
Thanks! Songpeng
Do you mean you want to use IGV to visualize the tracks (as bigwig format? If so, you can try this function: https://kzhang.org/SnapATAC2/api/_autosummary/snapatac2.ex.export_coverage.html)
BTW, in CATLas, we have uploaded bigwig tracks on subclass level: http://catlas.org/wholemousebrain/#!/igv, hope this works on your side. (Feel free to download the bigwig files we generated on subclass level.)
Thank you Songpeng! I think I'll try out these suggestions and see how it goes.
Best, Ray
@RuiyuRayWang Should I close this issue? Songpeng
@RuiyuRayWang Should I close this issue? Songpeng
Yes please
TL;DR In doing differential peak analysis,
union_bed_file
andrnd_bed_file
pp.make_peak_matrix
Original post
Hi, @beyondpie. Thanks for making this remarkable pipeline publicly available.
I'd like to work with processed files from catlas.org, subset out my cell types of interest, create a peak matrix and perform differential peak analysis.
I downloaded files with the
_rm_dlt.h5ad
suffix undersnapatac2_anndata_per_sample_with_fragment_dir
, and used the following code to get peak matrix:where
Then do differential peak analysis with
tl.marker_regions
ortl.diff_test
.however I'm unsure if this is correct as I noticed two peak files:
union_bed_file
andrnd_bed_file
, whereWhat's the purpose of having both bed files and which one should I use for diff-peak analysis?
Thanks! Ray