dozmorovlab / TADCompare

Package for analysis and characterization of differential TADs
https://dozmorovlab.github.io/TADCompare/

Comparing a list of contact matrices representing three different time points with the `TimeCompare` function #18

Closed: Gemma-Zhang-326 closed this issue 11 months ago

Gemma-Zhang-326 commented 12 months ago

Hi there! I'm thrilled to be using your tool to analyze my data. I'm trying to run a time-course analysis with the TimeCompare function. Unfortunately, I only have three time points, and your documentation says the function requires at least four, so the output I'm getting looks unusual. I did notice that time_var$TAD_Bounds seems to contain the boundary scores of the individual samples, but I'm not entirely sure I understood your documentation correctly.

I have a few questions. Can I manually classify boundaries by comparing the boundary scores of only three samples? Also, could you provide some guidance on thresholds for calling boundaries differential? For instance, I want to identify boundaries that appear or disappear specifically in the first time period, as well as boundaries that appear or disappear in the first two time periods. I would greatly appreciate your help!

mdozmorov commented 12 months ago

Hi @Gemma-Zhang-326, thank you for using it. Yes, TADCompare needs at least 4 samples for time-course analysis in order to distinguish "early" and "late" changing boundaries. If you have example code that reproduces the issue you described, I'll look into it.

For 3 samples, I recommend calling boundaries in each sample with SpectralTAD and then doing pairwise comparisons with findOverlaps, possibly setting the maxgap parameter to the resolution of your data so that boundaries one bin apart are not called differential (if boundaries are adjacent, chances are it's a technical artifact and they are the same boundary).
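A minimal sketch of one such pairwise comparison, assuming boundaries have already been called in each sample (for example with SpectralTAD) and reduced to numeric bp coordinates. The coordinates and the chromosome name `"chrN"` below are hypothetical placeholders. It uses `overlapsAny()` from GenomicRanges, which takes the same `maxgap` argument as `findOverlaps()`; for three samples, the comparison would be repeated for each pair (1 vs 2, 2 vs 3, 1 vs 3).

```r
# Pairwise boundary comparison with a one-bin tolerance (sketch).
library(GenomicRanges)

resolution <- 50000L  # bin size of the Hi-C data, in bp

# Hypothetical boundary coordinates (bp) for two samples on one chromosome;
# in a real analysis these would come from boundary calls on each sample.
bounds_a <- c(17350000, 18800000)
bounds_b <- c(17350000, 18850000, 20700000)

# Represent each boundary as a 1-bp range ("chrN" is a placeholder name)
to_gr <- function(pos) GRanges(seqnames = "chrN", ranges = IRanges(start = pos, width = 1))
gr_a <- to_gr(bounds_a)
gr_b <- to_gr(bounds_b)

# Boundaries separated by at most `resolution` bp count as the same boundary
shared_a <- overlapsAny(gr_a, gr_b, maxgap = resolution)
shared_b <- overlapsAny(gr_b, gr_a, maxgap = resolution)

only_in_a <- bounds_a[!shared_a]  # boundaries lost from sample A to sample B
only_in_b <- bounds_b[!shared_b]  # boundaries gained from sample A to sample B
only_in_b
```

Here the boundaries at 18,800,000 and 18,850,000 (one 50 kb bin apart) are treated as shared, so only 20,700,000 is reported as gained. With the default `maxgap`, those two would instead be reported as one lost and one gained boundary.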

Gemma-Zhang-326 commented 12 months ago

Sorry, I don't understand your concern about the increased false positive rate when comparing 3 samples. I assumed I could still perform pairwise comparisons with the TADCompare function and identify differential TAD boundaries by finding overlaps. Given the TimeCompare output, why can't we just do a simple classification based on the boundary scores? FYI, I'm asking this way because my coding skills are limited.

mdozmorov commented 11 months ago

You can do the analysis using 3 samples as follows:

```r
library(TADCompare)
data("time_mats")
time_mats[[4]] <- NULL # Remove 4th matrix, keeping three time points
time_var <- TimeCompare(time_mats, resolution = 50000)
time_var$TAD_Bounds
#>    Coordinate   Sample 1    Sample 2   Sample 3 Consensus_Score               Category
#> 1    17350000  3.8501876  2.47907505  3.1996876       3.1996876            Dynamic TAD
#> 2    18800000  2.0859417 -0.03668098  7.2863302       2.0859417     Late Appearing TAD
#> 3    18850000  0.6752193  6.73142477 -0.8310747       0.6752193            Dynamic TAD
#> 4    20700000  1.6578767  3.27533627  3.0810224       3.0810224    Early Appearing TAD
```

Be careful with the "Category" column: with only three samples, its classifications are less accurate. I'd suggest visualizing and clustering the boundary scores, then defining boundary behavior from the resulting clusters, like this:

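```r
library(pheatmap)
# Matrix of per-sample boundary scores, one row per boundary coordinate
mtx_to_plot <- time_var$TAD_Bounds[, c("Sample 1", "Sample 2", "Sample 3")]
rownames(mtx_to_plot) <- time_var$TAD_Bounds$Coordinate
# Row annotation with TimeCompare's own "Category" call, for comparison
annotation_row <- data.frame(Category = time_var$TAD_Bounds$Category)
rownames(annotation_row) <- time_var$TAD_Bounds$Coordinate
# Keep time points in order, cluster boundaries on row-scaled scores, cut into 6 groups
p <- pheatmap(mtx_to_plot, cluster_cols = FALSE, scale = "row",
              annotation_row = annotation_row, cutree_rows = 6)
# Append each boundary's cluster ID to its (unscaled) scores
p.clust <- cbind(mtx_to_plot, cluster = cutree(p$tree_row, k = 6))
```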
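As a rough alternative to clustering, closer to the "simple classification" asked about above, each boundary can be marked present or absent at every time point by thresholding its score, and changes described between consecutive time points. This is a sketch, not a TADCompare function: `classify_bounds()` and `label_change()` are hypothetical helpers, and the cutoff of 2 is an assumption to tune for your data. The example scores are the four rows printed above.

```r
# Hypothetical helpers (not part of TADCompare): label boundary presence per
# time point with a score cutoff, then describe changes between consecutive points.

# Label the change between two logical presence vectors
label_change <- function(before, after) {
  ifelse(before == after, "stable", ifelse(after, "appears", "disappears"))
}

classify_bounds <- function(tad_bounds, cutoff = 2) {
  sample_cols <- grep("^Sample ", names(tad_bounds), value = TRUE)
  # TRUE where a sample's score exceeds the (assumed) cutoff
  present <- unname(as.matrix(tad_bounds[, sample_cols]) > cutoff)
  out <- data.frame(
    Coordinate = tad_bounds$Coordinate,
    # e.g. "0-1-1": absent at T1, present at T2 and T3
    Pattern = apply(present * 1L, 1, paste, collapse = "-"),
    stringsAsFactors = FALSE
  )
  # One column per interval between consecutive time points: T1_T2, T2_T3, ...
  for (i in seq_len(ncol(present) - 1)) {
    out[[paste0("T", i, "_T", i + 1)]] <- label_change(present[, i], present[, i + 1])
  }
  out
}

# Boundary scores copied from the TimeCompare output printed above
tad_bounds <- data.frame(
  Coordinate = c(17350000, 18800000, 18850000, 20700000),
  `Sample 1` = c(3.8501876, 2.0859417, 0.6752193, 1.6578767),
  `Sample 2` = c(2.47907505, -0.03668098, 6.73142477, 3.27533627),
  `Sample 3` = c(3.1996876, 7.2863302, -0.8310747, 3.0810224),
  check.names = FALSE
)

res <- classify_bounds(tad_bounds)  # on real output: classify_bounds(time_var$TAD_Bounds)
res
```

For example, `res[res$T1_T2 == "appears", ]` selects boundaries that appear between the first and second time points. Scores close to the cutoff (such as 2.09 for the second boundary in Sample 1) can flip labels easily, so the clustering approach may be the more robust choice.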

Gemma-Zhang-326 commented 11 months ago

Thank you for your prompt response and kind assistance! Your help was greatly appreciated and made a big difference! I will continue using this tool for my studies. Thank you for your effort. Have a great day!