Scavetta / ComPrAn

R package for SILAC complexomics

package workflow - new functions #12

Open Petra-P opened 4 years ago

Petra-P commented 4 years ago

Package workflow at the moment is:

##Use example peptide data set, read in and clean data
inputFile <- system.file("extdata", "data.txt", package = "ComPrAn")
peptides <- cleanData(data.table::fread(inputFile), fCol = "Search ID")
## separate chemical modifications and labelling into separate columns
peptides <- splitModLab(peptides) 
## remove unnecessary columns, simplify rows
peptides <- simplifyProteins(peptides) 

Old code, does not work any more

peptide_index <- makeEnv(peptides)
for (i in names(peptide_index)) {
  assign(i, pickPeptide(peptide_index[[i]]), envir = peptide_index)
}
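For comparison, here is a minimal base-R sketch of the same per-protein loop written once over an environment and once over a plain list. The toy data frame, its column names, and `pick_top()` (a stand-in for `pickPeptide()`) are hypothetical and not part of ComPrAn; this only illustrates the general pattern, not the package's actual implementation.

```r
# Toy peptide table; column names are illustrative only
toy <- data.frame(
  protein = c("P1", "P1", "P2", "P2", "P2"),
  peptide = c("a", "b", "c", "d", "e"),
  area    = c(10, 30, 5, 50, 20),
  stringsAsFactors = FALSE
)

# Hypothetical stand-in for pickPeptide(): keep the row with the largest area
pick_top <- function(df) df[which.max(df$area), , drop = FALSE]

# Environment version: one binding per protein, updated in place with assign()
env_index <- list2env(split(toy, toy$protein))
for (i in ls(env_index)) {
  assign(i, pick_top(env_index[[i]]), envir = env_index)
}

# List version: split() gives a named list and lapply() returns a new one
list_index <- lapply(split(toy, toy$protein), pick_top)
```

The list version avoids modifying shared state in place, which tends to make the result easier to pass on to later functions.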

Comparison of function runtime 
![compareFunctions](https://user-images.githubusercontent.com/52913337/79777827-bd3fd480-832f-11ea-8cae-f0b598340fdd.png)

`allPeptidesPlot()` _function was also changed to work with a list instead of an environment_

protein <- "P52815"
max_frac <- 23
## default plot
allPeptidesPlot(peptide_index, protein, max_frac = max_frac)

- ### Unchanged function
_Maybe we should rename this function, since it now takes a list and not an environment as input?_

## Create a list of proteins present in both label states or in only one

listOnlyOneLabState <- onlyInOneLabelState_ENV(peptide_index)

- ### New function to get normalised table
New function: `getNormTable()`
Functions that are now unexported and used internally in the new function: `extractRepPeps()`, `normalizeTable()`, `normTableForExport()`, `normTableWideToLong()`.

New code:

## extract table with normalised protein values for both scenarios
forExport <- getNormTable(peptide_index, purpose = "export")
forAnalysis <- getNormTable(peptide_index, purpose = "analysis")
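
As a rough illustration of the design (one exported function whose `purpose` argument selects between internal helpers), here is a base-R sketch. `get_table()`, `to_long()`, `to_wide()` and the toy tables are all hypothetical stand-ins; the real `getNormTable()` internals are not shown in this issue.

```r
# Hypothetical internal helper: stack tables into long format
to_long <- function(tabs) {
  do.call(rbind, lapply(names(tabs), function(n) cbind(source = n, tabs[[n]])))
}

# Hypothetical internal helper: one value column per table, merged by protein
to_wide <- function(tabs) {
  renamed <- Map(function(tab, n) setNames(tab, c("protein", n)), tabs, names(tabs))
  Reduce(function(a, b) merge(a, b, by = "protein", all = TRUE), renamed)
}

# Single entry point; match.arg() rejects anything other than the listed options
get_table <- function(tabs, purpose = c("analysis", "export")) {
  purpose <- match.arg(purpose)
  switch(purpose, analysis = to_long(tabs), export = to_wide(tabs))
}

tabs <- list(
  lab   = data.frame(protein = c("P1", "P2"), value = c(1, 2)),
  unlab = data.frame(protein = c("P1", "P3"), value = c(3, 4))
)
long <- get_table(tabs, purpose = "analysis")
wide <- get_table(tabs, purpose = "export")
```

Keeping the helpers unexported means users only need to learn one function, while the two output shapes stay testable separately.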

Old code:

## normalize proteins
names(peptide_index) %>%
  map_df(~ extractRepPeps(peptide_index[[.]], scenario = 'A', label = TRUE)) %>%
  normalizeTable() -> protNormLab

names(peptide_index) %>%
  map_df(~ extractRepPeps(peptide_index[[.]], scenario = 'A', label = FALSE)) %>%
  normalizeTable() -> protNormUnlab

names(peptide_index) %>%
  map_df(~ extractRepPeps(peptide_index[[.]], scenario = 'B')) %>%
  normalizeTable() -> protNormComb

## create table that is saved in tab delimited format
forExport <- normTableForExport(protNormLab, protNormUnlab, protNormComb)

## create table that is used in further analysis
forAnalysis <- normTableWideToLong(protNormLab, protNormUnlab, protNormComb)

- ### Unchanged functions for plotting
Functions `proteinPlot()`, `groupHeatMap()`, `oneGroupTwoLabelsCoMigration()`,
`twoGroupsWithinLabelCoMigration()` were not changed.

- ### New function for clustering
New function: `clusterComp()`
Modified function: `assignClusters()`
Functions that are now unexported and used internally in new function: `makeDist()`
Function output is a named list

New code:

## Create components necessary for clustering
clusteringDF <- clusterComp(forAnalysis, scenar = "A", PearsCor = "centered")

## Create data frames with cluster assignments
labTab_clust <- assignClusters(.listDf = clusteringDF, sample = "labeled", method = 'complete', cutoff = 0.5)
unlabTab_clust <- assignClusters(.listDf = clusteringDF, sample = "unlabeled", method = 'complete', cutoff = 0.5)
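
To make the `method` and `cutoff` arguments more concrete, here is a base-R sketch of correlation-based hierarchical clustering. It assumes the distance is 1 minus the centered Pearson correlation between protein profiles, which may differ from what `makeDist()` actually computes; the toy matrix and variable names are hypothetical.

```r
# Toy migration profiles: rows are proteins, columns are fractions
profiles <- rbind(
  protA = c(0, 1, 4, 1, 0, 0),
  protB = c(0, 2, 8, 2, 0, 0),
  protC = c(0, 0, 0, 1, 5, 1),
  protD = c(0, 0, 0, 2, 10, 2)
)

# Assumed distance: 1 - centered Pearson correlation between protein profiles
dist_mat <- as.dist(1 - cor(t(profiles)))

# Complete-linkage hierarchical clustering, then cut the tree at height 0.5
tree <- hclust(dist_mat, method = "complete")
clusters <- cutree(tree, h = 0.5)
```

With this toy input, proteins whose profiles peak in the same fractions end up in the same cluster. A lower cutoff requires profiles to be more strongly correlated before they are grouped together.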

Old code:

## Extract data frames for clustering
forAnalysis %>%
  as_tibble() %>%
  filter(scenario == "A") %>%
  select(-scenario) %>%
  mutate(`Precursor Area` = replace_na(`Precursor Area`, 0)) %>%
  spread(Fraction, `Precursor Area`) -> forClustering
forClustering[is.na(forClustering)] <- 0

forAnalysis[forAnalysis$scenario == "A",] %>%
  select(-scenario) %>%
  spread(Fraction, `Precursor Area`) -> forClustering
forClustering[is.na(forClustering)] <- 0

labelledTable <- forClustering[forClustering$isLabel == TRUE,]
unlabelledTable <- forClustering[forClustering$isLabel == FALSE,]

## Create distance matrix
labDist <- makeDist(t(select(labelledTable, -c(1:3))), centered = TRUE)
unlabDist <- makeDist(t(select(unlabelledTable, -c(1:3))), centered = TRUE)

## Assign clusters to data frames
labelledTable_clust <- assignClusters(labelledTable, labDist,
                                      method = 'average', cutoff = 0.85)
unlabelledTable_clust <- assignClusters(unlabelledTable, unlabDist,
                                        method = 'average', cutoff = 0.85)

- ### Unchanged functions for plotting and exporting cluster assignment

makeBarPlotClusterSummary(labTab_clust, name = 'labeled')
makeBarPlotClusterSummary(unlabTab_clust, name = 'unlabeled')
tableForClusterExport <- exportClusterAssignments(labTab_clust, unlabTab_clust)

Petra-P commented 4 years ago

In the initial issue there is a function missing in the first part, it should be: