Closed LeonHafner closed 6 months ago
I'm still wondering why we suddenly only produce a single output file as input for collect_tfs.nf and not one for every pair of L1, L10, P6 and P13.
I'm curious if you have any insights on that @nictru?
If not, I’ll start looking into it for debugging.
The purpose of collect_tfs.nf is to create a single, non-pairing-specific list of all transcription factors that need to be included in the final report. Based on this list, data about the transcription factors can then be collected and prepared for the report. We might not even need this anymore, since fetching additional data will mainly be handled dynamically by the report.
In earlier versions, we would fetch ChIP-Atlas BED files, binding motif logos, etc. based on this list.
Changes look good so far
The first line of the input files for collect_tfs.nf looks like this:
sum mean q95 q99 median p-value rank dcg
This header line was treated like a regular TF line. By skipping it, we prevent "sum" from being included in the TF list.
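To illustrate the header fix, here is a minimal Python sketch, not the actual collect_tfs.nf code. It assumes the ranking files are tab-separated, that the TF name is the first field of every data row, and that each file starts with exactly one header line. The function name `collect_tfs` and the TF names in the example are hypothetical.

```python
from typing import Iterable, List


def collect_tfs(ranking_files: Iterable[Iterable[str]]) -> List[str]:
    """Return the sorted, de-duplicated TF names found in all ranking files.

    Each element of `ranking_files` is an iterable of lines, for example an
    open file handle or a list of strings.
    """
    tfs = set()
    for lines in ranking_files:
        rows = iter(lines)
        # Skip the header line ("sum mean q95 ..."); without this step,
        # "sum" would end up in the TF list.
        next(rows, None)
        for line in rows:
            line = line.strip()
            if not line:
                continue  # ignore blank lines
            # Assumption: the TF name is the first tab-separated field.
            tfs.add(line.split("\t")[0])
    return sorted(tfs)


# Hypothetical example data for two pairings.
file_a = ["sum\tmean\tq95\n", "GATA1\t1.0\t0.5\t0.9\n", "SPI1\t2.0\t1.0\t1.8\n"]
file_b = ["sum\tmean\tq95\n", "SPI1\t3.0\t1.5\t2.7\n"]
print(collect_tfs([file_a, file_b]))
```

With real files, the same function could be called on open file handles instead of in-memory lists.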
I'm still wondering why we suddenly only produce a single output file as input for collect_tfs.nf and not one for every pair of L1, L10, P6 and P13. I'm curious if you have any insights on that @nictru? If not, I’ll start looking into it for debugging.
This was due to a mismatch of condition labels ("P6" vs. "p6") and a subsequent failed matching. Solved by adjusting the pipeline input file "bam_design2.tsv" for ChromHMM. No further changes necessary.
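A mismatch like this is easy to miss by eye. The following Python sketch shows one way such a problem could be caught early; it is not part of the pipeline. The function name `find_case_mismatches` is hypothetical, and which of the two inputs carried the lowercase "p6" is an assumption made only for the example.

```python
from typing import Iterable, List, Tuple


def find_case_mismatches(
    design_labels: Iterable[str], pipeline_labels: Iterable[str]
) -> List[Tuple[str, str]]:
    """Return (design_label, pipeline_label) pairs that differ only by case."""
    # Index the pipeline labels by their lowercase form.
    by_lower = {label.lower(): label for label in pipeline_labels}
    mismatches = []
    for label in design_labels:
        match = by_lower.get(label.lower())
        # Same label when case is ignored, but not an exact match.
        if match is not None and match != label:
            mismatches.append((label, match))
    return mismatches


# Assumed example: the design file uses "p6", the rest of the pipeline "P6".
design = ["L1", "L10", "p6", "P13"]
pipeline = ["L1", "L10", "P6", "P13"]
print(find_case_mismatches(design, pipeline))
```

A check like this only reports the mismatch. The fix applied here was still to correct the labels in the input file rather than to match case-insensitively.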
Current bugs:
[…] L1:L10_enhancers.ranking.tsv. Since the single file is not stored as a list in Nextflow, the rankings.name.join() operation does not work and produces an empty string.
[…] enhancers_P6_TF_Gene_Affinities.txt), which append three columns NumPeaks, AvgPeakDistance and AvgPeakSize to the matrix of gene-TF affinities.
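The first bug, where a single file is not stored as a list, follows a common pattern: code written for a list of files breaks when it receives just one. The sketch below shows the usual workaround in Python, as an analogy only and not as Nextflow code. The helper names `as_path_list` and `join_names` are hypothetical. Note that the Python failure mode differs from the one described above: joining over a single path raises an error here instead of producing an empty string.

```python
from pathlib import PurePath, PurePosixPath
from typing import Iterable, List, Union

PathInput = Union[str, PurePath, Iterable[Union[str, PurePath]]]


def as_path_list(value: PathInput) -> List[PurePosixPath]:
    """Wrap a single path in a list; convert an iterable of paths to a list."""
    if isinstance(value, (str, PurePath)):
        return [PurePosixPath(value)]
    return [PurePosixPath(v) for v in value]


def join_names(rankings: PathInput, sep: str = " ") -> str:
    """Join the file names of one or many ranking files."""
    return sep.join(p.name for p in as_path_list(rankings))


# Works for a single file and for a list of files.
print(join_names("results/L1:L10_enhancers.ranking.tsv"))
print(join_names(["results/a.ranking.tsv", "results/b.ranking.tsv"]))
```

Normalizing the input to a list at one place keeps the downstream join logic identical for the single-file and multi-file cases.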