collect/summarize data from multiple loci #8

mbstadler commented 2 months ago

Maybe it would be useful to create a function that collects reads from multiple loci (anchor regions of the same length, e.g. TF binding sites, TSSs, repeat elements, etc.) and combines the reads from these loci into a unified coordinate system (relative position within the anchor region).

The resulting object would have relative positions within the anchor region as rows, and samples as columns. The per-sample NaArray would have an increased number of individual read columns. Their names may have to be made unique, since one read may overlap several anchor regions.
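To make the proposed layout concrete, here is a minimal sketch in plain Python, not footprintR code. It assumes a single sample and a single chromosome, and every name in it (`collect_reads`, the `anchor:read` naming scheme, the toy data) is hypothetical. Each read is re-indexed relative to every anchor it overlaps, unobserved positions are filled with NaN, and the column name combines anchor and read so that a read overlapping two anchors stays unique.

```python
import math

def collect_reads(reads, anchors, width):
    """Re-index reads relative to each anchor region (hypothetical helper).

    reads:   {read_id: {genomic_position: value}}
    anchors: {anchor_id: start_position}, all anchors having the same width
    Returns {"anchor_id:read_id": [value per relative position]}.
    """
    columns = {}
    for anchor_id, start in anchors.items():
        for read_id, calls in reads.items():
            # Row i of the column is relative position i within the anchor.
            col = [calls.get(start + rel, math.nan) for rel in range(width)]
            # Keep only reads that have at least one observation in this anchor.
            if not all(math.isnan(v) for v in col):
                columns[f"{anchor_id}:{read_id}"] = col
    return columns

# Toy data: read1 overlaps both anchors, read2 only the first one.
reads = {
    "read1": {100: 0.9, 102: 0.1, 201: 0.8},
    "read2": {101: 0.2, 103: 0.7},
}
anchors = {"tss_a": 100, "tss_b": 200}
collected = collect_reads(reads, anchors, width=4)
```

In this toy case `read1` yields two columns (`tss_a:read1` and `tss_b:read1`), which is the situation where plain read names would collide.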

Maybe the function could also provide summarized information (average over reads by anchor region), so that the NaArray would have one column per anchor region.

The colData of the summarized object may retain information about which reads originate from which anchor region (list-like column similar to read-level qc data).
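The summarization idea could look roughly like the sketch below, again in plain Python rather than footprintR code. It assumes columns named `anchor:read` holding one value per relative position, with NaN for unobserved positions. The function name `summarize_by_anchor` and the toy values are hypothetical. It averages over reads per anchor while ignoring NaN, and returns a separate anchor-to-reads mapping that stands in for the list-like colData column.

```python
import math

def summarize_by_anchor(columns, width):
    """NaN-aware mean over reads for each anchor (hypothetical helper).

    columns: {"anchor_id:read_id": [value per relative position]}
    Returns (summary, reads_per_anchor).
    """
    # Group the read-level columns by the anchor encoded in their name.
    grouped = {}
    for name, values in columns.items():
        anchor_id, read_id = name.split(":", 1)
        grouped.setdefault(anchor_id, {})[read_id] = values

    summary, reads_per_anchor = {}, {}
    for anchor_id, by_read in grouped.items():
        means = []
        for rel in range(width):
            observed = [v[rel] for v in by_read.values() if not math.isnan(v[rel])]
            # A position without any observed read stays NaN.
            means.append(sum(observed) / len(observed) if observed else math.nan)
        summary[anchor_id] = means
        # List-like record of which reads contributed to this anchor.
        reads_per_anchor[anchor_id] = sorted(by_read)
    return summary, reads_per_anchor

nan = math.nan
columns = {
    "tss_a:read1": [0.9, nan, 0.1, nan],
    "tss_a:read2": [0.5, 0.2, nan, nan],
    "tss_b:read1": [nan, 0.8, nan, nan],
}
summary, reads_per_anchor = summarize_by_anchor(columns, width=4)
```

The summarized result has one column per anchor region, and `reads_per_anchor` keeps the link back to the individual reads.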

csoneson commented 2 months ago

First draft (still with some questions/TODOs) pushed to the use_NAmatrix branch.

csoneson commented 1 month ago

Done in https://github.com/fmicompbio/footprintR/commit/ba7ef8eb9c311adab91d0ea8dafb71b37f62da8f (and related commits).
