daisybio / TF-Prioritizer

Bioinformatics pipeline to identify differentially active transcription factors between conditions using expression and epigenetic data
GNU General Public License v3.0

collect_tfs.nf bug fixes #70

Closed LeonHafner closed 6 months ago

LeonHafner commented 6 months ago

Current bugs:

LeonHafner commented 6 months ago

I'm still wondering why we suddenly only produce a single output file as input for collect_tfs.nf and not one for every pair of L1, L10, P6 and P13. I'm curious if you have any insights on that @nictru? If not, I’ll start looking into it for debugging.

nictru commented 6 months ago

The purpose of collect_tfs.nf is to create a single, non-pairing-specific list of all transcription factors that need to be included in the final report. Based on this list, data about the transcription factors can then be collected and prepared for the report. We might not even need it anymore, since the report will mainly fetch additional data dynamically.

In earlier versions, we would fetch ChIP-Atlas BED files, binding motif logos, etc. based on this list.

nictru commented 6 months ago

Changes look good so far

LeonHafner commented 6 months ago

The first line of the input files for collect_tfs.nf looks like this: sum mean q95 q99 median p-value rank dcg. This line was treated like a regular TF line. By skipping it, we prevent "sum" from being included in the TF list.
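A minimal Python sketch of the header-skipping idea, not the actual code in collect_tfs.nf. It assumes tab-separated input files with the TF name in the first column of every data line; the function name `collect_tfs`, the TF names `TF_A`/`TF_B`/`TF_C` and all numeric values are hypothetical placeholders.

```python
import io


def collect_tfs(handles):
    """Return the sorted union of TF names from several ranking files.

    Assumption: each file starts with a header line (sum, mean, q95, ...)
    and every following line begins with the TF name.
    """
    tfs = set()
    for handle in handles:
        next(handle, None)  # skip the header so "sum" is not read as a TF
        for line in handle:
            fields = line.rstrip("\n").split("\t")
            if fields and fields[0]:
                tfs.add(fields[0])
    return sorted(tfs)


# Header as quoted in the comment above; the data rows are made up.
HEADER = "sum\tmean\tq95\tq99\tmedian\tp-value\trank\tdcg\n"
ROW_VALUES = "\t1.0\t0.5\t0.9\t1.0\t0.4\t0.01\t1\t1.0\n"

pairing_1 = io.StringIO(HEADER + "TF_A" + ROW_VALUES + "TF_B" + ROW_VALUES)
pairing_2 = io.StringIO(HEADER + "TF_B" + ROW_VALUES + "TF_C" + ROW_VALUES)

example_tfs = collect_tfs([pairing_1, pairing_2])
print(example_tfs)  # ['TF_A', 'TF_B', 'TF_C']
```

Without the `next(handle, None)` call, the first field of the header ("sum") would end up in the result like any other TF name.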

LeonHafner commented 6 months ago

> I'm still wondering why we suddenly only produce a single output file as input for collect_tfs.nf and not one for every pair of L1, L10, P6 and P13. I'm curious if you have any insights on that @nictru? If not, I’ll start looking into it for debugging.

This was due to a mismatch in condition labels ("P6" vs. "p6"), which caused the matching to fail. Solved by adjusting the pipeline input file "bam_design2.tsv" for ChromHMM. No further changes necessary.
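A check along the following lines could catch this kind of mismatch before the pipeline runs. It is only a sketch: the helper `find_case_mismatches` is hypothetical and not part of TF-Prioritizer, and which input carried which spelling is not stated here, so the two label lists below are illustrative.

```python
from collections import defaultdict


def find_case_mismatches(*label_sets):
    """Group labels that differ only by letter case across all inputs."""
    variants = defaultdict(set)
    for labels in label_sets:
        for label in labels:
            variants[label.casefold()].add(label)
    # Keep only the labels that appear in more than one spelling.
    return {key: sorted(found) for key, found in variants.items() if len(found) > 1}


# Illustrative label lists; the real ones would be read from the design files.
labels_a = ["L1", "L10", "P6", "P13"]
labels_b = ["L1", "L10", "p6", "P13"]

mismatches = find_case_mismatches(labels_a, labels_b)
print(mismatches)  # {'p6': ['P6', 'p6']}
```

An empty result means the labels agree in spelling, at least with respect to letter case.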