SimonSchlumbohm / HarmonizR


How to create a batch layout description CSV file for a MaxQuant output? #9

Closed mahadabihie closed 6 months ago

mahadabihie commented 7 months ago

I am analyzing proteomics data from 3 patients, each measured under the same 3 conditions, and I am concerned about potential batch effects. Clustering analysis shows that the 9 samples group by patient of origin rather than by condition. The patients are V3, V4 and V5. The conditions are 01, 02 and 12. The 9 samples are named V3-01, V3-02, V3-12, V4-01, V4-02, V4-12, V5-01, V5-02 and V5-12.

Using iMetalab 2.3.0, which runs MaxQuant, I obtained the output files "proteinGroups.txt" and "proteinGroups-report.txt". I assumed that these files could serve as the raw, unadjusted matrix that HarmonizR requires.

This was the code I used to create the csv file.

## installation
devtools::install_github("SimonSchlumbohm/HarmonizR/HarmonizR")
library("HarmonizR")

#tsv
raw.tsv = read.delim("/Users/m-bih/OneDrive/Desktop/R files/metapro/data/proteinGroups.txt")

#create batch layout description
raw.csv <- as.data.frame(c(
  "Peptides.V3_01,1,1", "Peptides.V3_02,1,1", "Peptides.V3_12,1,1",
  "Peptides.V4_01,1,2", "Peptides.V4_02,1,2", "Peptides.V4_12,1,2",
  "Peptides.V5_01,1,3", "Peptides.V5_02,1,3", "Peptides.V5_12,1,3")) 

#rename column
colnames(raw.csv) <- "ID.sample.batch"

#create csv file
write.csv(raw.csv, "/Users/m-bih/OneDrive/Desktop/R files/metapro/data/raw.csv")

#run harmonizR
harmonizR("/Users/m-bih/OneDrive/Desktop/R files/metapro/data/proteinGroups.txt", 
          "/Users/m-bih/OneDrive/Desktop/R files/metapro/data/raw.csv")

proteinGroups.txt proteinGroups_report.txt

Is it possible to confirm how the csv file would need to be formatted in this case?

SimonSchlumbohm commented 7 months ago

Hello @mahadabihie

For an extensive guide on input formatting, please refer to the provided SOP, which shows by example how the input has to be formatted:

https://github.com/SimonSchlumbohm/HarmonizR/blob/main/HarmonizR_SOP.pdf

In general, the data file only needs to consist of a feature name column, a sample name row and the actual data points. The description consists of three columns: sample ID, sample number and batch affiliation number, i.e., 1, ..., n. Samples have to be sorted in the same order in both files. Again, for a detailed description see the provided SOP. If you use Perseus, a plugin for that is also available. (It has not been updated continuously, however, so it might be slightly outdated.)
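
As a rough illustration of the three-column description outlined above, the following R sketch builds such a file for the nine samples named in the question. Several things here are assumptions rather than confirmed requirements: the header names `ID`, `sample` and `batch`, the use of a running sample number from 1 to 9, the choice of patient as the batch, and the output file name `description.csv`. The sample IDs must match the column headers of the data file exactly, so the strings below are placeholders to adapt, and the SOP remains the authoritative reference.

```r
# Sketch only: header names, running sample numbers and the file name are assumptions.
patients   <- c("V3", "V4", "V5")   # treated as batches here (assumption from the question)
conditions <- c("01", "02", "12")

# Sample names in patient order: V3-01, V3-02, V3-12, V4-01, ...
# These must be identical to the sample column headers of the data file.
sample_ids <- paste(rep(patients, each = length(conditions)),
                    rep(conditions, times = length(patients)),
                    sep = "-")

# Three separate columns instead of one comma-joined string per row.
description <- data.frame(
  ID     = sample_ids,
  sample = seq_along(sample_ids),                                   # 1, ..., 9
  batch  = rep(seq_along(patients), each = length(conditions)),     # 1, 1, 1, 2, ...
  stringsAsFactors = FALSE
)

# row.names = FALSE keeps R from writing an extra, unnamed index column.
description_path <- file.path(tempdir(), "description.csv")
write.csv(description, description_path, row.names = FALSE, quote = FALSE)

# Toy data matrix (hypothetical values) to check that both files use the same sample order.
toy <- matrix(rnorm(2 * length(sample_ids)), nrow = 2,
              dimnames = list(c("protA", "protB"), sample_ids))
stopifnot(identical(colnames(toy), description$ID))
```

Compared with the code in the question, this writes three real columns, omits the row-name column that `write.csv` adds by default, and numbers the samples consecutively. The data file would presumably also need to be reduced from the full proteinGroups.txt table to one feature name column plus the nine sample columns before both paths are passed to `harmonizR()`.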