Closed bzip2 closed 1 year ago
Hi,
Thank you for reaching out with the comments and suggestions and sorry for the delay in responding! I'd recommend you try running the demo (in the wiki) since I think it might answer a lot of your questions.
I've updated the readme to reflect the steps in the current version of the pipeline (we're in the process of doing revisions and some things have changed; hopefully the final version of the paper will be available in the near future). The current order of steps is:
griffin_filter_sites is no longer part of the pipeline and I removed it from the readme.
The mappable regions file used for "--mappable_regions_path" in the griffin_GC_and_mappability_correction step is specified in the config:
mappable_regions: ../../Ref/k100_minus_exclusion_lists.mappable_regions.hg38.bed
This file contains each position with a mappability score of 1 that doesn't overlap centromeres, gaps, fix patches, alternative haplotypes, and excluded regions. We took out the repeat masker filter because we found that the filters above already get rid of the regions that were causing problems, so there is no need to exclude all repeats. I've removed it from the readme.
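To make the description of that file a bit more concrete, here is a minimal Python sketch of the idea: start from intervals with a mappability score of 1 and subtract anything on the exclusion lists. This is not the script that was used to build the actual BED file; the helper name subtract_intervals and the toy coordinates are made up for illustration, and it only handles a single chromosome.

```python
def subtract_intervals(mappable, excluded):
    """Return the parts of `mappable` not covered by `excluded`.

    Both arguments are lists of half-open (start, end) tuples on one
    chromosome, BED-style. Hypothetical helper, for illustration only.
    """
    excluded = sorted(excluded)
    result = []
    for start, end in sorted(mappable):
        cursor = start  # left edge of the part not yet emitted or excluded
        for ex_start, ex_end in excluded:
            if ex_end <= cursor:
                continue  # exclusion ends before the remaining piece
            if ex_start >= end:
                break  # sorted by start, so no later exclusion can overlap
            if ex_start > cursor:
                result.append((cursor, ex_start))  # keep the gap before it
            cursor = max(cursor, ex_end)
            if cursor >= end:
                break
        if cursor < end:
            result.append((cursor, end))
    return result


# Toy data (assumed values, not taken from the real files)
mappable = [(100, 500), (800, 1200)]              # score == 1 intervals
excluded = [(50, 150), (300, 350), (1100, 1300)]  # e.g. gaps, centromeres

kept = subtract_intervals(mappable, excluded)
total_bp = sum(end - start for start, end in kept)
print(kept, total_bp)
```

Summing end minus start over the kept intervals, as in the last lines, is also a quick way to check how many bases a BED file of this kind spans.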
We are also working on a WDL pipeline as an alternative to snakemake, but I think it's going to be a while before that is available.
Let me know if you have any further questions.
Hi.
It would be great to apply your methods and code to new data, but I'm finding it very difficult. I've looked through all the scripts (yaml, Python and snakefile, though not the notebooks) in this repo and in Griffin_analyses.
[...] griffin_GC_correction task and its corresponding griffin_GC_correction.snakemake, but that doesn't seem to exist. There's griffin_GC_and_mappability_correction, and by looking through the individual files, you can eventually learn (from Griffin/snakemakes/griffin_GC_and_mappability_correction/config/config.yaml) that mappability correction is disabled by default because it's not recommended.
griffin_GC_counts.py explains "--mappable_regions_path" as "highly mappable regions to be used in GC correction". Do I want all mappable regions, regions with score >= 0.95, or score = 1?
[...] k100_minus_exclusion_lists.mappable_regions.hg38.bed), so I can't tell if this is all mappable regions or a subset. The regions span 2.5 Gb, which sounds too high for highly mappable regions.
[...] griffin_filter_sites task/pipeline, and the readme mentions griffin_filter_sites.snakefile. I can't find anything resembling that in either repo. Did I miss it, or is it not available?
[...] griffin_filter_sites include files for sites with high and low map(p)ability. I can't find any reference to these files in file names or scripts in either repo (with or without the double p), so I can't follow how this step is connected to anything else.
[...] repeat_masker..., but I don't see any mention of that in file names or scripts, even if I try "repeat" and "mask" individually. I even searched for hgTable.
If you'd like others to use your code (and cite your paper), and it makes things easier, I'd suggest dropping snakemake.