TCB-yehong / PhosPiR

An automatic pipeline to analyze phosphoproteomics

PhosPir and TMT experiments #2

Closed hcadre closed 2 years ago

hcadre commented 2 years ago

Hi, I am wondering whether PhosPiR could also be adapted to accept data from TMT-labeled samples. I guess it boils down to the question of whether PhosPiR can handle the "Reporter_intensity_corrected" columns from the corresponding Phospho (STY)site.txt file as input, right? Best, Hannes

TCB-yehong commented 2 years ago

Hi, yes, it can take TMT-labeled samples. There is an "Other" input option for cases where a standard "MaxQuant" result file is not available. You could format the data in the following way:

  1. File format should be .csv or .xlsx.

  2. The first 6 columns should contain the following information. Column 1 contains the UniProt protein ID, column 2 the protein description, column 3 the gene name, column 4 the phosphorylated amino acid residue, column 5 the position of the phosphorylation site within the protein, and column 6 the sequence window. The sequence window should have the phosphorylation site in the center position and extend in both directions by at least 7 amino acids. The order of the columns is crucial and must not be changed.

  3. Column 7 onward should contain sample intensity values. Each column corresponds to 1 sample, and each row corresponds to 1 phosphorylation pattern. Intensity values should not be in log scale.

  4. Missing values should be written as “NA” or 0. Intensity columns should be numeric columns.

Choose "Other" as the input option, then select the formatted file as input; PhosPiR should then work.

Or, if you like, you could send the first few rows of the Phospho (STY)site.txt file, and I can check whether its formatting allows it to be input directly to PhosPiR with the "MaxQuant" option.
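The four formatting rules above can be checked before handing a file to PhosPiR. The following is a minimal Python sketch, not part of PhosPiR: the function name `check_other_format`, the example column headers, and the protein IDs are made up for illustration, and it assumes column 4 holds a single-letter residue code. Treating negative values as a hint of log-scaled data is only a heuristic.

```python
import csv
import io

ID_COLUMNS = 6   # columns 1-6 hold the annotation, in the fixed order described above
MIN_FLANK = 7    # the sequence window must extend at least 7 residues on each side


def check_other_format(rows):
    """Return a list of problems found in the data rows (header excluded)."""
    problems = []
    for n, row in enumerate(rows, start=1):
        if len(row) <= ID_COLUMNS:
            problems.append(f"row {n}: no intensity columns after column 6")
            continue
        residue, window = row[3], row[5]
        centre = len(window) // 2
        # An odd-length window has exactly one centre position
        if len(window) % 2 == 0 or centre < MIN_FLANK:
            problems.append(f"row {n}: sequence window too short or not centred")
        elif window[centre] != residue:
            problems.append(f"row {n}: centre of window is not the phospho residue")
        for value in row[ID_COLUMNS:]:
            if value == "NA":          # allowed marker for missing values
                continue
            try:
                number = float(value)
            except ValueError:
                problems.append(f"row {n}: non-numeric intensity {value!r}")
                continue
            if number < 0:             # heuristic: raw intensities are never negative
                problems.append(f"row {n}: negative intensity, possibly log scale")
    return problems


# Hypothetical two-row example; the second row breaks two rules on purpose
example = io.StringIO(
    "Protein,Description,Gene,Residue,Position,Window,Sample1,Sample2\n"
    "P12345,Example protein,GENE1,S,10,AAAAAAASAAAAAAA,1500.5,NA\n"
    "P67890,Another protein,GENE2,T,25,AAATAAA,12.0,abc\n"
)
reader = csv.reader(example)
header = next(reader)
issues = check_other_format(list(reader))
for issue in issues:
    print(issue)
```

An empty result only means that none of these particular checks failed; it does not guarantee that PhosPiR will accept the file.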

hcadre commented 2 years ago

The requirements described in 3. for sample intensity values are probably difficult to fulfill with the original MaxQuant output, since each sample in Phospho (STY)site.txt is represented by 3 columns (Reporter intensity corrected 1___1 | Reporter intensity corrected 1___2 | Reporter intensity corrected 1___3 | Reporter intensity corrected 2___1 | Reporter intensity corrected 2___2 ...) for singly, doubly and multiply phosphorylated sites. But I can try to use the "expand site table" function in Perseus to squeeze each sample into one column.
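To illustrate the idea of squeezing the three multiplicity columns into one column per sample, here is a small Python sketch. It is neither Perseus's nor PhosPiR's implementation: the function name `expand_site_row`, the added "Multiplicity" column, and the example values are assumptions. It splits one wide row into up to three rows, one per `___1`, `___2`, `___3` suffix.

```python
import re

# Multiplicity suffix as it appears in the column names: "<sample column>___<1|2|3>"
SUFFIX = re.compile(r"^(?P<sample>.+)___(?P<mult>[123])$")


def expand_site_row(row):
    """Split one wide row (a dict) into one row per multiplicity."""
    # Columns without a multiplicity suffix are copied into every output row
    shared = {k: v for k, v in row.items() if not SUFFIX.match(k)}
    by_mult = {}
    for key, value in row.items():
        m = SUFFIX.match(key)
        if m:
            by_mult.setdefault(m["mult"], {})[m["sample"]] = value
    return [
        {**shared, "Multiplicity": mult, **samples}
        for mult, samples in sorted(by_mult.items())
    ]


# Hypothetical row with two samples and made-up intensities
wide = {
    "Protein": "P12345",
    "Reporter intensity corrected 1___1": 100.0,
    "Reporter intensity corrected 1___2": 20.0,
    "Reporter intensity corrected 1___3": 0.0,
    "Reporter intensity corrected 2___1": 90.0,
    "Reporter intensity corrected 2___2": 25.0,
    "Reporter intensity corrected 2___3": 0.0,
}
long_rows = expand_site_row(wide)
for r in long_rows:
    print(r)
```

After expansion, each sample occupies a single intensity column, and each output row corresponds to one site at one multiplicity.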

hcadre commented 2 years ago

Enclosed are the first few lines of an appropriately formatted file: TMT_format_testing.xlsx

TCB-yehong commented 2 years ago

Hi, PhosPiR should be able to take your Phospho (STY)site.txt file directly. It usually combines these columns from the original Phospho (STY)site.txt file. However, if you want to analyze them separately, there is a function in PhosPiR (only for "MaxQuant" input files) called "Expanded phosphorylation count" that works exactly like the "expand site table" function from Perseus: it generates formatted PhosPiR input with the phosphorylation counts separated. And yes, the format you included is perfect for selecting "Other" as the input option!

One other thing I wanted to mention: the imputation step only works when there are at least 4 non-missing values per row (to make sure the imputation is sensible). For the attached example file, imputation would not work (if the option is selected, you will automatically be redirected to select another option).
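The "at least 4 non-missing values per row" condition can be checked ahead of time. Below is a minimal Python sketch under the assumption that missing values are written as "NA" or 0, as in the format rules earlier in this thread. The helper names and example values are hypothetical, and how PhosPiR applies the threshold internally is not shown here.

```python
def is_missing(value):
    """Missing values are written as "NA" or 0 in the input format."""
    if value == "NA":
        return True
    return float(value) == 0.0


def imputable(intensities, min_present=4):
    """True if the row has at least `min_present` non-missing intensities."""
    present = sum(not is_missing(v) for v in intensities)
    return present >= min_present


# Hypothetical intensity rows for two phosphorylation sites
rows = {
    "site_a": [1200.0, 980.0, "NA", 1100.0, 1050.0, 0],
    "site_b": [0, 0, "NA", 450.0, 0, 0],
}
eligible = {name: imputable(values) for name, values in rows.items()}
print(eligible)
```

Here `site_a` has four measured values and passes the check, while `site_b` has only one and does not.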

hcadre commented 2 years ago

Imputation is not really necessary for many TMT-labeled experiments, since the number of missing values is much lower than in label-free measurements. So I removed the rows containing only "0"s, dropping these p-sites, and tried to run the resulting file through PhosPiR. After grouping the samples, I got as far as the step where the overview figures should be generated, but received an error there: ... [image]

TCB-yehong commented 2 years ago

Hi, did you run PhosPiR from the beginning? This short video includes an example of group setup: https://youtu.be/c7n7yE0DMsA. If you are looking for the specific group setup line in run.R, it's this one: source(paste0(codepth,"inputGeneration.R")), which also includes all other setups. You could also provide the groups with an Excel file, as in the following example: Phospho (STY)Sites_sample_grouping_order_input_example.csv. To provide the Excel file, you would need to go through the setup line. Let me know if something's unclear.

hcadre commented 2 years ago

I made a mistake during the grouping of my samples - your comments pointed me to the right path! Thank you very much! Now the annotation step is running.....