This is a question rather than an issue. I have run into some problems while using your package, probably because of my own lack of understanding. I followed the instructions in the vignette closely, and it seems that some of the functions expect specific column names. My output file from an LFQ MaxQuant analysis lacks some of the column names you mention, and this has prevented me from creating a SummarizedExperiment object.

My question is: how can I create a SummarizedExperiment object so that I can continue with my analysis?

Below are some of the errors I have encountered.

dataset <- read_csv(file)
Parsed with column specification:
cols( `Protein IDs Majority protein IDs Peptide counts (all) Peptide counts (razor+unique) Peptide counts (unique) Fasta headers Number of proteins Peptides Razor + unique peptides Unique peptides Peptides 6 Peptides 62 Peptides 63 Peptides 8 Peptides 82 Peptides 83 Peptides 9 Peptides 92 Peptides 93 Peptides G Peptides G2 Peptides G3 Razor + unique peptides 6 Razor + unique peptides 62 Razor + unique peptides 63 Razor + unique peptides 8 Razor + unique peptides 82 Razor + unique peptides 83 Razor + unique peptides 9 Razor + unique peptides 92 Razor + unique peptides 93 Razor + unique peptides G Razor + unique peptides G2 Razor + unique peptides G3 Unique peptides 6 Unique peptides 62 Unique peptides 63 Unique peptides 8 Unique peptides 82 Unique peptides 83 Unique peptides 9 Unique peptides 92 Unique peptides 93 Unique peptides G Unique peptides G2 Unique peptides G3 Sequence coverage [%] Unique + razor sequence coverage [%] Unique sequence coverage [%] Mol. weight [kDa] Sequence length S... )

dataset <- filter(dataset, Potential.contaminant != "+")
Error in filter_impl(.data, quo) : Evaluation error: object 'Potential.contaminant' not found.

dataset$Gene.names %>% duplicated() %>% any()
[1] FALSE
Warning message: Unknown or uninitialised column: 'Gene.names'.

data_unique <- make_unique(data, "Gene.names", "Protein.IDs", delim = ";")
Error: proteins is not a data frame

data_se <- make_se(dataset, LFQ_columns, experimental_design)
Error: 'name' and/or 'ID' columns are not present in 'dataset'. Run make_unique() to obtain the required columns

data_unique <- make_unique(dataset, "Protein.IDs", delim = ";")
Error in eval(assertion, env) : argument "ids" is missing, with no default

data_se_parsed <- make_se_parse(dataset, LFQ_columns)
Error: 'name' and/or 'ID' columns are not present in 'dataset'. Run make_unique() to obtain the required columns

sessionInfo()
R version 3.4.1 (2017-06-30)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 7 x64 (build 7601) Service Pack 1

Matrix products: default

locale: [1] LC_COLLATE=German_Germany.1252 LC_CTYPE=German_Germany.1252 LC_MONETARY=German_Germany.1252 LC_NUMERIC=C
[5] LC_TIME=German_Germany.1252

attached base packages: [1] stats4 parallel stats graphics grDevices utils datasets methods base

other attached packages: [1] DEP_1.0.1 bindrcpp_0.2 readr_1.1.1 dplyr_0.7.4
[5] rpart.plot_2.1.2 rpart_4.1-11 SummarizedExperiment_1.8.0 DelayedArray_0.4.1
[9] matrixStats_0.52.2 Biobase_2.38.0 GenomicRanges_1.30.0 GenomeInfoDb_1.14.0
[13] IRanges_2.12.0 S4Vectors_0.16.0 BiocGenerics_0.24.0

loaded via a namespace (and not attached): [1] nlme_3.1-131 [6] MSnbase_2.4.0 [11] tmvtnorm_1.4-10 [16] compiler_3.4.1 [21] mvtnorm_1.0-6 [26] foreign_0.8-69 [31] htmlwidgets_0.9 [36] BiocInstaller_1.28.0 [41] BiocParallel_1.12.0 [46] Matrix_1.2-10 [51] stringi_1.1.6 [56] shinydashboard_0.6.1 [61] hms_0.4.0 [66] reshape2_1.4.2 [71] data.table_1.10.4-3 [76] purrr_0.2.4 [81] mime_0.5 [86] iterators_1.0.8

proteinGroups.txt

As this is a question rather than an issue, could you please post it on the Bioconductor support forum: https://support.bioconductor.org. That way, more users will benefit!

You have to adapt some parameters to your specific data. For example, the 'Reverse' column is completely empty, so filtering on this column doesn't make sense. Additionally, there is no 'Gene.names' column, so you should specify another column in the make_unique function. See the code below to obtain a SummarizedExperiment object.
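Before calling filter() or make_unique(), it can help to check which of the expected columns are actually present and which are empty. The following is a minimal base-R sketch, not part of DEP: the data frame `toy` is a made-up stand-in for a real proteinGroups table, and `expected` simply lists the columns mentioned in this thread.

```r
# Toy stand-in for a proteinGroups table (hypothetical values).
toy <- data.frame(
  Protein.IDs = c("P1;P2", "P3", "P4"),
  Majority.protein.IDs = c("P1", "P3", "P4"),
  Potential.contaminant = c("", "+", ""),
  Reverse = c("", "", ""),
  stringsAsFactors = FALSE
)

# Columns referred to in this thread.
expected <- c("Gene.names", "Protein.IDs", "Potential.contaminant", "Reverse")

# Expected columns that are absent from the table.
missing_cols <- setdiff(expected, colnames(toy))

# Expected columns that are present but contain only "" or NA.
present <- intersect(expected, colnames(toy))
is_empty <- vapply(
  toy[present],
  function(x) all(is.na(x) | x == ""),
  logical(1)
)
empty_cols <- present[is_empty]

print(missing_cols)
print(empty_cols)
```

A column reported as missing needs a substitute (here, another identifier column for make_unique), and a column reported as empty can be skipped when filtering.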

An additional warning: your data only consists of 98 quantified proteins and most of these proteins are only quantified in a single sample. After filtering out proteins with too many missing values, the dataset only consists of tens of proteins. This is too few to perform reliable normalization and FDR calculations.
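To get a feel for how sparse a dataset is, one can count in how many samples each protein has a non-zero LFQ intensity. The sketch below uses made-up numbers and assumes that a missing quantification is reported as 0 (as in MaxQuant output) and that the intensity columns start with `LFQ.intensity.`; the sample names are hypothetical.

```r
# Toy LFQ table (hypothetical proteins, samples and intensities).
toy <- data.frame(
  Protein.IDs = c("P1", "P2", "P3"),
  LFQ.intensity.A_1 = c(1.2e6, 0, 0),
  LFQ.intensity.A_2 = c(1.1e6, 3.4e5, 0),
  LFQ.intensity.B_1 = c(9.8e5, 0, 0),
  stringsAsFactors = FALSE
)

# Indices of the LFQ intensity columns.
lfq_cols <- grep("^LFQ\\.intensity\\.", colnames(toy))

# Number of samples in which each protein was quantified (intensity > 0).
n_quantified <- rowSums(toy[, lfq_cols] > 0)

# How many proteins were quantified in 0, 1, 2, ... samples.
print(table(n_quantified))
```

If most proteins fall into the 0 or 1 bins, little will remain after filtering on missing values.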

Load data

library(DEP)    # make_unique(), make_se()
library(dplyr)  # filter()
proteins <- read.csv("proteinGroups.txt", header = TRUE, sep = "\t", stringsAsFactors = FALSE)
expdesign <- read.csv("ExpDesign.txt", header = TRUE, sep = "\t", stringsAsFactors = FALSE)

Filter contaminants and make unique names

proteins <- filter(proteins, Potential.contaminant != "+")
proteins_unique <- make_unique(proteins, "Majority.protein.IDs", "Protein.IDs", delim = ";")

Make SummarizedExperiment

columns <- grep("LFQ.", colnames(proteins_unique))
se <- make_se(proteins_unique, columns, expdesign)
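A likely reason for the first errors in the question is the way the file was read. proteinGroups.txt is tab-delimited, so a comma-based reader treats each line as a single field, which matches the one long column name in the read_csv output quoted earlier. In addition, read.csv makes column names syntactic by default (spaces become dots), which is where names such as `Potential.contaminant` come from. A small base-R sketch with a made-up two-column temporary file illustrates both points:

```r
# Write a tiny tab-delimited file with MaxQuant-style headers (made-up content).
tmp <- tempfile(fileext = ".txt")
writeLines(c("Protein IDs\tPotential contaminant", "P1\t+", "P2\t"), tmp)

# Read as comma-separated: each line is treated as one field.
as_csv <- read.csv(tmp, header = TRUE, stringsAsFactors = FALSE)
print(ncol(as_csv))

# Read as tab-separated: fields are split on tabs and the
# column names are made syntactic (spaces replaced by dots).
as_tsv <- read.csv(tmp, header = TRUE, sep = "\t", stringsAsFactors = FALSE)
print(colnames(as_tsv))

# Clean up the temporary file.
unlink(tmp)
```

When a different reader is used, it is worth printing colnames() first and using whatever names that reader produced in the filter() and make_unique() calls.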

Arguments Missing #1