chung-lab / SCAFE

Single Cell Analysis of Five-prime Ends
MIT License
44 stars, 11 forks

About scafe.workflow.cm.aggregate #18

Closed jiawei-zhong closed 2 years ago

jiawei-zhong commented 2 years ago

Hi,

  1. When I use scafe.workflow.cm.aggregate in the new version (v1.0.0), I don't know the expected format of lib_list_path. Is it like this?

    SRR12443300<\t>SRR12443300.collapse.ctss.bed.gz<\t>SRR12443300.unencoded_G.collapse.ctss.bed.gz
    SRR12443301<\t>SRR12443301.collapse.ctss.bed.gz<\t>SRR12443301.unencoded_G.collapse.ctss.bed.gz
    SRR12443302<\t>SRR12443302.collapse.ctss.bed.gz<\t>SRR12443302.unencoded_G.collapse.ctss.bed.gz
  2. Which file should I use from the output of scafe.workflow.cm.aggregate? I can't find any file that links the cCRE IDs of the individual libraries together.

  3. Another problem concerns scafe.tool.cm.prep_genome. The downloaded reference contains a glm directory, but a self-generated reference doesn't have one. Could you please tell me how to generate the glm model?

Thanks!

chung-lab commented 2 years ago

Thanks for using SCAFE.

  1. Yes, but it is better to use the full paths of the ctss files so that the list does not depend on the current working directory. You can check the demo data input if in doubt.
  2. There will NOT be a file linking cCRE IDs to libraries. The output of scafe.workflow.cm.aggregate is library-agnostic: it only outputs the CRE definitions, and you have to run scafe.tool.sc.count with the aggregated CREs to generate a count matrix per library.
  3. Yes, a self-generated reference will not have the glm directory. You can either copy the glm directory from the downloaded reference or run scafe.tool.cm.filter with an epigenome track (e.g. ATAC) matched to your input 5' data.
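
For point 1, a minimal Python sketch of writing such a list with absolute paths. It assumes the three tab-separated columns and the file-name pattern from the example in the question (library ID, ctss file, unencoded-G ctss file). The function name `build_lib_list` and the directory and output paths are hypothetical, so check the result against the demo data input before using it.

```python
from pathlib import Path


def build_lib_list(ctss_dir, lib_ids, out_path):
    """Write one tab-separated row per library and return the rows.

    Assumed columns (taken from the example in this thread):
    library ID, ctss file, unencoded-G ctss file.
    """
    # resolve() turns the directory into an absolute path, so the list
    # no longer depends on the current working directory.
    ctss_dir = Path(ctss_dir).resolve()
    rows = []
    for lib_id in lib_ids:
        ctss = ctss_dir / f"{lib_id}.collapse.ctss.bed.gz"
        unencoded_g = ctss_dir / f"{lib_id}.unencoded_G.collapse.ctss.bed.gz"
        rows.append("\t".join([lib_id, str(ctss), str(unencoded_g)]))
    Path(out_path).write_text("\n".join(rows) + "\n")
    return rows


if __name__ == "__main__":
    # Hypothetical directory and output file name, for illustration only.
    libs = ["SRR12443300", "SRR12443301", "SRR12443302"]
    for row in build_lib_list("ctss", libs, "lib_list.txt"):
        print(row)
```

The sketch does not check that the ctss files exist; adding a `Path.exists()` check per file would catch typos in library IDs early.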

Feel free to drop us a message if you need help or have suggestions.

jiawei-zhong commented 2 years ago

Hi,

It is clearer after downloading the demo data. So it seems that you use scafe.workflow.cm.aggregate to generate consensus/high-confidence regions shared across all libraries. Did you consider the case where the libraries come from two groups (like control vs. disease)? I think using consensus/high-confidence regions would sacrifice some of the differences between groups.

Another thing: the scafe.tool.sc.count demo code lacks --reference.

Thanks!

chung-lab commented 2 years ago

Thanks, Jiawei, for the suggestions! Yes, scafe.workflow.cm.aggregate generates CREs by aggregating all libraries. To find differences between groups of libraries, we recommend defining a common set of CREs and using differential expression to identify the differences, similar to gene expression analysis, where you use only one reference set across all samples. Also, we have corrected the scafe.tool.sc.count demo code!
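
To illustrate the idea of comparing groups on a common CRE set, here is a rough Python sketch. It is not part of SCAFE: it assumes you have already summed the per-library count matrices into per-CRE totals (one dict per library, all keyed by the same aggregated CRE IDs), and it only computes a log2 fold change of mean counts-per-million. The CRE IDs, counts, and function names are made up for illustration, and a real analysis would use a dedicated differential expression tool with a proper statistical test.

```python
import math


def cpm(counts):
    """Scale one library's per-CRE counts to counts per million."""
    total = sum(counts.values())
    return {cre: 1e6 * n / total for cre, n in counts.items()}


def log2_fold_change(group_a, group_b, pseudocount=1.0):
    """Per-CRE log2(mean CPM in group_b / mean CPM in group_a).

    Each group is a list of {cre_id: count} dicts, one per library,
    and every library must use the same common CRE IDs.
    """
    cres = sorted(group_a[0])

    def mean_cpm(group):
        norm = [cpm(lib) for lib in group]
        return {c: sum(lib[c] for lib in norm) / len(norm) for c in cres}

    a, b = mean_cpm(group_a), mean_cpm(group_b)
    # The pseudocount avoids division by zero for CREs absent in one group.
    return {c: math.log2((b[c] + pseudocount) / (a[c] + pseudocount)) for c in cres}


if __name__ == "__main__":
    # Toy counts on a shared CRE set (hypothetical IDs and values).
    control = [{"CRE_1": 50, "CRE_2": 50}, {"CRE_1": 40, "CRE_2": 60}]
    disease = [{"CRE_1": 75, "CRE_2": 25}, {"CRE_1": 80, "CRE_2": 20}]
    for cre, lfc in log2_fold_change(control, disease).items():
        print(cre, round(lfc, 2))
```

Because every library is counted against the same CRE definitions, a CRE that is active in only one group still appears in all matrices, with low or zero counts in the other group, which is what makes the comparison possible.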

Thanks!