RegulatoryGenomicsGroup / chicdiff

A differential caller for capture Hi-C data

Which sample is used as a 'treatment'/'control' in ChicDiff pipeline? #8

Closed: AdrijaK closed this issue 4 years ago

AdrijaK commented 4 years ago

Hi all,

thanks for this useful package.

What determines which of the samples is used as the reference in the DESeq2 part? Should the output log2 fold changes be read as CD4 vs Mono (with Mono as the reference level), or the other way around?

From vignette:

countData <- list(
  CD4 = c(
    NCD4_22 = file.path(testDataPath_CD4, "unitTest_CD41.chinput"),
    NCD4_23 = file.path(testDataPath_CD4, "unitTest_CD42.chinput")
  ),
  Mono = c(
    Mon_2 = file.path(testDataPath_Mono, "unitTest_Mono2.chinput"),
    Mon_3 = file.path(testDataPath_Mono, "unitTest_Mono3.chinput")
  )
)

worchard commented 4 years ago

Hi Adrija!

The order here is determined by R's 'factor' function, which by default sorts the levels alphabetically, so the reference condition is whichever name comes first alphabetically. In the case of CD4 and Mono, CD4 is the reference and the log2 fold changes are Mono vs CD4. You can check this explicitly by setting 'saveAuxFiles = TRUE' in the settings at the start and taking a look at the DESeq object. Many users won't want to produce all of the auxiliary files, so I agree that the reference condition should be made clear somewhere in the final Chicdiff output Rds itself. Thank you for pointing this out!
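For anyone who wants to see the ordering rule in isolation, here is a minimal base-R sketch. It assumes only what is described in this thread, namely that the condition names end up in a factor created without explicit levels. The `conditions` vector and the `relevel` call are illustrative and are not taken from the Chicdiff source.

```r
# Condition labels for the vignette samples (two replicates each).
# Deliberately listed with Mono first to show that input order is ignored.
conditions <- c("Mono", "Mono", "CD4", "CD4")

# factor() sorts the unique values, so the levels come out alphabetically.
cond <- factor(conditions)
levels(cond)            # "CD4" "Mono"

# By default DESeq2 uses the first level as the reference (denominator),
# so a positive log2 fold change would mean higher in Mono than in CD4.
reference <- levels(cond)[1]
tested <- levels(cond)[2]

# Base-R way to choose a different reference level. Whether Chicdiff
# exposes a setting for this is not stated in the thread (assumption).
cond_mono_ref <- relevel(cond, ref = "Mono")
levels(cond_mono_ref)   # "Mono" "CD4"
```

With default factor ordering, naming the conditions so that the intended control sorts first is one way to get the comparison direction you expect.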

AdrijaK commented 4 years ago

Thanks for the clarification, I managed to find the necessary DESeq object file and it all makes sense now.