fgcz / prolfqua

Differential expression analysis toolbox: an R package for omics data
https://pubs.acs.org/doi/pdf/10.1021/acs.jproteome.2c00441
MIT License

FragPipe v.19.1 combined_protein output not working with prolfqua::tidy_FragPipe_combined_protein #57

Closed: mart5176 closed this issue 1 year ago.

mart5176 commented 1 year ago

Hello and thank you for the open access package. Quick question about a problem I'm running into.

Describe the bug

The FragPipe v19.1 combined_protein.tsv output gives the following error when prolfqua::tidy_FragPipe_combined_protein is called:

Warning: argument is not being debugged
Error in 1:which(cnam == "Combined Total Spectral Count") :
  argument of length 0
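A minimal base-R sketch of how this error message arises: which() returns integer(0) when no column name matches the target string exactly, and using that empty result as the end of a 1:n sequence fails. The column names below are hypothetical, and find_column() is an illustrative helper, not part of prolfqua.

```r
# Hypothetical header that lacks the exact name being looked up
cnam <- c("Protein", "Total Spectral Count", "RMAHA1 Intensity")

# which() returns integer(0) when nothing matches
idx <- which(cnam == "Combined Total Spectral Count")
print(length(idx))  # [1] 0

# Using the empty result in 1:idx reproduces the error message
msg <- tryCatch(1:idx, error = function(e) conditionMessage(e))
print(msg)  # [1] "argument of length 0"

# Illustrative helper (hypothetical, not part of prolfqua):
# stop with an informative message instead of the cryptic one
find_column <- function(cnam, target) {
  idx <- which(cnam == target)
  if (length(idx) == 0) {
    stop("column '", target, "' not found; available columns: ",
         paste(cnam, collapse = ", "))
  }
  idx
}
```

Checking for an empty match before building the sequence turns the "argument of length 0" failure into a message that lists the column names actually present.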

To Reproduce

library(tibble)
library(dplyr)
library(prolfqua)

datadir <- file.path(find.package("prolfquadata"), "quantdata")

# read input annotation
annotation <- readxl::read_xlsx("myannotationfile.xlsx")
# read input protein table (FragPipe combined_protein.tsv)
protein <- tibble::as_tibble(read.csv("~combined_protein.tsv", header = TRUE,
                                      sep = "\t", stringsAsFactors = FALSE))

# undebug() on a function that is not being debugged only emits the warning shown above
undebug(prolfqua::tidy_FragPipe_combined_protein)
protein <- prolfqua::tidy_FragPipe_combined_protein(protein)
protein <- protein |> dplyr::filter(unique.stripped.peptides > 1)
merged <- dplyr::inner_join(annotation, protein)

Additional context

FragPipe v19.1 only includes one type of Intensity per experimental replicate and LFQ; could this be the reason for the error? I am including my combined TSV file for your reference.

Thanks again for any help.

Attachment: combined_protein.tsv.txt

wolski commented 1 year ago

"FragPipe v19.1 only includes one type of Intensity per experimental replicate and LFQ; could this be the reason for the error? I include my combined TSV file for your reference."

Yes, indeed, this is the reason for the problem. I have updated the "tidy_FragPipe_combined_protein" function to work with the FragPipe (FP) 19.1 output format. You will need to reinstall the prolfqua package. Then you can read the file:

res <- tidy_FragPipe_combined_protein("combined_protein.tsv",
                                      intnames = "Razor Intensity",
                                      maxlfqnames = "MaxLFQ Razor Intensity")

The abundances stored in the "RMAHA1 Intensity" columns of the input file will then be found in the "razor.intensity" column of the result.

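If you want to read the data into a data frame first and then pass it to the function, call read.csv with the check.names = FALSE option. Otherwise read.csv converts the column names into syntactic names (for example, "Combined Total Spectral Count" becomes "Combined.Total.Spectral.Count"), and the function no longer finds the columns it expects.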

tmp <- read.csv("combined_protein.tsv",
                header = TRUE,
                sep = "\t",
                stringsAsFactors = FALSE,
                check.names = FALSE)
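To illustrate why check.names matters, here is a small base-R sketch. It parses a tiny, made-up tab-separated table whose header mimics FragPipe-style column names (the "Protein" column and all values are illustrative only) once with the read.csv default and once with check.names = FALSE.

```r
# Tiny tab-separated table standing in for combined_protein.tsv
# (column names and values are illustrative, not real FragPipe output)
tsv <- paste("Protein\tCombined Total Spectral Count\tRMAHA1 Intensity",
             "P00001\t10\t1000",
             sep = "\n")

# Default check.names = TRUE: make.names() replaces spaces with dots
mangled <- read.csv(text = tsv, sep = "\t", stringsAsFactors = FALSE)
print(names(mangled))
# names: "Protein", "Combined.Total.Spectral.Count", "RMAHA1.Intensity"

# check.names = FALSE keeps the header exactly as written in the file
kept <- read.csv(text = tsv, sep = "\t", stringsAsFactors = FALSE,
                 check.names = FALSE)
print(names(kept))
# names: "Protein", "Combined Total Spectral Count", "RMAHA1 Intensity"
```

With the default, an exact lookup such as which(names(mangled) == "Combined Total Spectral Count") returns an empty result, the same kind of failed match that produces the "argument of length 0" error.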