antigenomics / tcr-pmhc-study

Mining TCR:pMHC structural data

Something strange in the Rmd file when FRs are in the input file #6

Open vadimnazarov opened 7 years ago

vadimnazarov commented 7 years ago

Lines 114-130 output something strange when the input data contains both CDRs and FRs. I suspect a problem with the merging, but I'm not sure what's going on in these lines of code. It looks like the multiplication of marginal probabilities is done incorrectly in the case of FRs.
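For reference, here is what those lines should compute. This assumes `fit` is a naive-Bayes-style model in which every feature's table is conditional on contact (the `$prob` accessor suggests a bnlearn fitted network, but that is an assumption). Under that assumption, the product of the `Freq.*` columns in each row of `prob.tmp` is the joint probability of that row:

```math
P(\text{contact}, x_1, \dots, x_k) = P(\text{contact}) \prod_{i=1}^{k} P(x_i \mid \text{contact})
```

Summed over all rows of `prob.tmp` this should equal 1. A total noticeably different from 1 once FRs are added would suggest the merge is dropping rows (sum below 1) or duplicating them (sum above 1).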

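The lines in question:

get_prob <- function(var_name) {
  # Flatten one node's fitted probability table into a data frame
  .df <- as.data.frame(fit[[var_name]]$prob)
  colnames(.df) <- gsub("Var1", "contact", colnames(.df))
  colnames(.df) <- gsub("Freq", paste("Freq", var_name, sep = "."), colnames(.df))
  .df
}

prob.tmp <- get_prob("contact")

# Join each feature's table onto P(contact); without `by`, merge() joins
# on every column name the two tables share
for (var in setdiff(colnames(df), c("contact", "pdb_id"))) {
  prob.tmp <- merge(prob.tmp, get_prob(var))
}

prob.tmp$contact <- as.logical(prob.tmp$contact)

# Row-wise product of all Freq.* columns; drop = FALSE keeps a data frame
# (so apply() still works) even if there is only one Freq column
freq.cols <- grep("^Freq", colnames(prob.tmp))
prob.tmp$P <- apply(prob.tmp[, freq.cols, drop = FALSE], 1, prod)
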
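A self-contained sketch of the same computation on made-up tables, useful for checking the logic apart from the real data. The names `p_contact`, `p_a`, `p_b`, `joint` and all numbers are hypothetical stand-ins for `as.data.frame(fit[[var]]$prob)`. It merges with an explicit `by = "contact"` and adds two integrity checks: one row per combination of levels, and P summing to 1.

```r
# Hypothetical toy tables standing in for as.data.frame(fit[[var]]$prob):
# P(contact), and P(a | contact), P(b | contact) for two made-up features.
p_contact <- data.frame(contact = c("FALSE", "TRUE"),
                        Freq.contact = c(0.9, 0.1))
p_a <- data.frame(contact = rep(c("FALSE", "TRUE"), each = 2),
                  a = rep(c("short", "long"), times = 2),
                  Freq.a = c(0.7, 0.3, 0.2, 0.8))
p_b <- data.frame(contact = rep(c("FALSE", "TRUE"), each = 3),
                  b = rep(c("x", "y", "z"), times = 2),
                  Freq.b = c(0.5, 0.25, 0.25, 0.1, 0.1, 0.8))

# Join on "contact" only, so no other shared column can silently join too.
joint <- Reduce(function(l, r) merge(l, r, by = "contact"),
                list(p_contact, p_a, p_b))

# Row-wise product of the Freq.* columns (elementwise over the columns).
freq_cols <- grep("^Freq\\.", colnames(joint), value = TRUE)
joint$P <- Reduce(`*`, joint[freq_cols])

# Integrity checks: one row per level combination (2 * 2 * 3), P sums to 1.
stopifnot(nrow(joint) == 2 * 2 * 3)
stopifnot(isTRUE(all.equal(sum(joint$P), 1)))
```

If the same two checks fail on the real `prob.tmp` once FRs are included, that would point at the merge step rather than at the product.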
mikessh commented 7 years ago

Let's just filter out FRs for now. FRs really mess everything up, most notably the CDR:antigen contact position matrix.
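Filtering FRs out could look like the sketch below. It assumes a hypothetical `region` column with labels such as CDR1..CDR3 and FR1..FR4; the real input may encode regions differently, and the rows here are made up.

```r
# Hypothetical input: a `region` column labelling each row as a CDR or an FR.
df <- data.frame(pdb_id  = c("pdb1", "pdb1", "pdb1", "pdb2"),
                 region  = c("CDR1", "FR2", "CDR3", "FR3"),
                 contact = c(TRUE, FALSE, TRUE, FALSE))

# Keep only CDR rows: drop anything whose region label starts with "FR".
df_cdr <- df[!grepl("^FR", df$region), , drop = FALSE]
```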

We also need to check data integrity: remember those "blank" amino acids in the output? Perhaps these blanks in FRs cause a column shift and mess things up. Some other checks: the separator should be \t, and fill = TRUE should be set.
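A sketch of those checks on a small made-up file (the column names and rows are hypothetical). `count.fields()` flags lines with the wrong number of fields, which is how a column shift would show up. Reading with an explicit tab separator keeps an empty amino-acid field in place, and `fill = TRUE` pads short lines with NA instead of failing.

```r
# Hypothetical tab-separated table: one blank amino-acid field (line 3)
# and one line that is a field short (line 4).
tmp <- tempfile(fileext = ".txt")
writeLines(c("pdb_id\tregion\taa\tcontact",
             "pdb1\tCDR3\tG\tTRUE",
             "pdb1\tFR2\t\tFALSE",
             "pdb2\tCDR1\tS"),
           tmp)

# Column-shift check: every line should split into the same number of fields.
n_fields <- count.fields(tmp, sep = "\t", quote = "", blank.lines.skip = FALSE)
bad_lines <- which(n_fields != n_fields[1])  # here: the last line

# sep = "\t" keeps the empty field as its own column; fill = TRUE pads the
# short line with NA; na.strings = "" turns blank amino acids into NA.
dat <- read.table(tmp, header = TRUE, sep = "\t", fill = TRUE, quote = "",
                  na.strings = "", stringsAsFactors = FALSE)
```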