daisybio / TF-Prioritizer

Bioinformatics pipeline to identify differentially active transcription factors between conditions using expression and epigenetic data
GNU General Public License v3.0

Add optional signal input for gABC scores #64

Open nictru opened 10 months ago

nictru commented 10 months ago

STARE can take activity measures into account to improve its affinity calculations; the specific type of measure matters little. To make use of this, we can add an optional, condition-specific signal input and then query the signal values for the STARE input regions.

Currently we accept any kind of BED file as input, since only the first 3 columns are relevant. In this sense, narrowPeak and broadPeak files are also BED-like files. We could also let the user select one column of the input peak files as signal values. The problem with this approach is that the signal is not continuous, and we potentially need signal values for regions with no matching peak. We could take the average of the neighbouring peaks in that case, but I do not think this makes sense biologically.
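To make the gap problem concrete, here is a minimal sketch of the "select one column as signal" idea. It is not TF-Prioritizer or STARE code: `parse_peak_line` and `region_signal` are hypothetical helpers, the default `signal_col=6` assumes the narrowPeak layout (0-based column 6 is `signalValue`), and the overlap-weighted mean is just one possible way to aggregate.

```python
from typing import List, Optional, Tuple

Region = Tuple[str, int, int]        # (chrom, start, end), half-open like BED
Peak = Tuple[str, int, int, float]   # region plus the user-selected signal column


def parse_peak_line(line: str, signal_col: int = 6) -> Peak:
    """Parse one tab-separated peak line; signal_col is 0-based (assumption: narrowPeak)."""
    fields = line.rstrip("\n").split("\t")
    return fields[0], int(fields[1]), int(fields[2]), float(fields[signal_col])


def region_signal(region: Region, peaks: List[Peak]) -> Optional[float]:
    """Overlap-weighted mean signal of all peaks overlapping the region.

    Returns None when no peak overlaps, which is exactly the case where a
    peak-derived signal has nothing to offer.
    """
    chrom, start, end = region
    weighted_sum = 0.0
    covered = 0
    for p_chrom, p_start, p_end, signal in peaks:
        if p_chrom != chrom:
            continue
        overlap = min(end, p_end) - max(start, p_start)
        if overlap > 0:
            weighted_sum += signal * overlap
            covered += overlap
    return weighted_sum / covered if covered else None


peaks = [("chr1", 100, 200, 5.0), ("chr1", 300, 400, 9.0)]
print(region_signal(("chr1", 150, 350), peaks))  # overlaps both peaks
print(region_signal(("chr1", 200, 300), peaks))  # lies between the peaks: None
```

The second call shows the issue: a region between two peaks gets no value at all, so some fallback would have to be chosen explicitly.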

mlist commented 10 months ago

Something to ask Marcel: how do they deal with this? But if there are no peaks, should we not set the contribution to zero?

nictru commented 10 months ago

So by design we currently have 3 ways of determining potentially active regions based on the input peak files:

  1. INSIDE: Take peaks as they are (ATAC-Seq mostly)
  2. BETWEEN: If there are two peaks with a max distance of k, use the region between them
  3. INCL_BETWEEN: Like 2, but also include the original peaks
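The three modes can be sketched roughly as follows for peaks on a single chromosome. This is an illustrative sketch, not the pipeline's implementation: `candidate_regions` and `merge_touching` are hypothetical names, the distance is assumed to be measured from one peak's end to the next peak's start, and merging peaks with their gap into one interval for INCL_BETWEEN is an assumption as well.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # (start, end) on one chromosome, half-open


def merge_touching(intervals: List[Interval]) -> List[Interval]:
    """Merge intervals that overlap or touch into single intervals."""
    merged: List[Interval] = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def candidate_regions(peaks: List[Interval], mode: str, k: int) -> List[Interval]:
    """Derive potentially active regions from peaks (sketch of the three modes)."""
    peaks = sorted(peaks)
    if mode == "INSIDE":
        return peaks
    # Gaps between consecutive peaks that are at most k apart (assumed definition).
    gaps = [
        (left[1], right[0])
        for left, right in zip(peaks, peaks[1:])
        if 0 < right[0] - left[1] <= k
    ]
    if mode == "BETWEEN":
        return gaps
    if mode == "INCL_BETWEEN":
        return merge_touching(peaks + gaps)  # assumption: peaks and gap form one region
    raise ValueError(f"unknown mode: {mode}")


peaks = [(100, 200), (250, 300), (1000, 1100)]
for mode in ("INSIDE", "BETWEEN", "INCL_BETWEEN"):
    print(mode, candidate_regions(peaks, mode, k=100))
```

In this sketch the BETWEEN output never overlaps any input peak, so a signal taken from the peak files themselves cannot cover those regions.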

By definition, in the second case we do not have any peaks in the investigated regions. Since this is the method we used for the HM-ChIP-Seq analyses, this is quite relevant.

PS: For INSPECT we also want to further shrink the investigated regions using various enhancer localization methods, such as eHMM, which would require even more local activity information.