adoebley / Griffin

A flexible framework for nucleosome profiling of cell-free DNA

Nucleosome profiling output files and directories are inconsistent with the README's expected results #13

Closed alanasweinstein closed 1 year ago

alanasweinstein commented 1 year ago

Hi @adoebley,

This is an excellent tool; thank you for such an important contribution to the community! I have been running the pipeline and producing results, but I have a few questions about the expected output of the nucleosome profiling step and would appreciate your advice.

The README.md file located at https://github.com/adoebley/Griffin#readme gives the following in the "Description" -> 3. griffin_nucleosome_profiling -> "Outputs" section:

  • Outputs:
    1. results/coverage/all_sites/.all_sites.coverage.txt.
      • nucleosome profiles and metadata for each site list.
      • Both GC corrected and non-GC corrected profiles are in this file and must be separated for downstream analysis (GC_correction column). Coverage profile data is labeled with the start coordinate of the bin. For instance, the column labeled -15 contains the coverage information for -15bp to 0bp relative to the site location.
    2. results/coverage//..coverage.txt
      • These folders contain intermediate files with the coverage profiles for individual site lists. These have been concatenated into results/coverage/all_site/.all_sites.coverage.txt.
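For reference, the separation step that the README excerpt describes (splitting on the GC_correction column and reading the bin labels) could look roughly like the sketch below. It is only an illustration under stated assumptions, not the Griffin authors' code: the file is assumed to be tab-separated, the GC_correction values ("none", "GC_corrected"), the site_name column, and the toy numbers are hypothetical, and the 15 bp bin width is inferred from the README's "-15bp to 0bp" example.

```python
import csv
import io
from collections import defaultdict


def split_by_gc_correction(handle, column="GC_correction"):
    """Group rows of a tab-separated coverage table by the value in `column`."""
    groups = defaultdict(list)
    for row in csv.DictReader(handle, delimiter="\t"):
        groups[row[column]].append(row)
    return dict(groups)


def bin_columns(fieldnames):
    """Return the column labels that parse as integers (bin start coordinates), sorted."""
    bins = []
    for name in fieldnames:
        try:
            bins.append(int(name))
        except ValueError:
            continue  # metadata column, not a coverage bin
    return sorted(bins)


def bin_interval(start, width=15):
    """A bin is labeled by its start coordinate; the width is an assumption (15 bp)."""
    return (start, start + width)


# Toy table standing in for an all_sites coverage file (all values are made up).
EXAMPLE = (
    "site_name\tGC_correction\t-30\t-15\t0\t15\n"
    "toy_sites\tnone\t0.9\t0.8\t0.7\t0.9\n"
    "toy_sites\tGC_corrected\t1.0\t0.9\t0.8\t1.0\n"
)

groups = split_by_gc_correction(io.StringIO(EXAMPLE))
header = EXAMPLE.splitlines()[0].split("\t")

print(sorted(groups))        # the distinct GC_correction values found
print(bin_columns(header))   # bin start coordinates
print(bin_interval(-15))     # (-15, 0): the column labeled -15 covers -15bp to 0bp
```

With a real output file, the same functions would be called on `open(path, newline="")` instead of the in-memory toy table, after checking the actual column names and GC_correction values in that file.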

However, when I run this step on both the tutorial data and my own data, I get a different output directory structure and different/missing files, as follows:

I looked over the nucleosome profiling .snakefile and it seemed consistent with my output, not the output described in the README, as far as I could tell (but I'm admittedly new to snakemake).

Given all this:

  1. What is the expected behavior of the tool regarding these nucleosome profiling outputs?
  2. Is the output I get essentially equivalent to what's described in the README, just grouped by samples instead of by site lists? Do I actually already have versions of all the files described in the README, just under different names?
  3. "Concatenation" is mentioned in the README -- is there supposed to be a file generated that concatenates the core features (central coverage, etc.) of all site lists for all samples into one file? Or another similar top-level concatenation of all results?
  4. Depending on the answers to the above, what is the preferred solution for generating the missing files, if there are any?

Apologies for the long question. Thanks again for this great tool and for your help.

Best, Alana Weinstein

adoebley commented 1 year ago

Hi Alana,

Thanks for pointing this out! I've updated the README to contain the correct output descriptions for the current version. It sounds like you are getting the expected outputs. Let me know if you have any other questions!

Thanks, Anna-Lisa