ECoffeyLab / PhosPiR

PhosPiR is an automatic pipeline to analyze phosphoproteomics data. It is fully tested on Windows. To initiate PhosPiR pipeline, drag the run.R file to an R window.

unable to make PhosPiR work #1

Open nanderthol opened 2 years ago

nanderthol commented 2 years ago

Hi,

I was eager to run this and see what it produces. I do lots of quantitative phosphoproteomics and have always been dissatisfied with the software for analysis. I was hopeful that this might do a bunch of things that I do manually and more. I've been able to run the quantitative comparisons. I've been able to run the clustering and PCA. I've never seen any STRING analysis, enrichment analysis, or kinase-substrate analysis come out of it. When I try to run all steps it puts out a 3D PCA plot and then stops with a warning to the effect of "savewidget requires pandoc".

Any suggestions? I'm not using MaxQuant. I have 12-plex data from TMT with the designated 6 columns of annotations. I was also having problems with missing values even though it says it completed the imputation step. I'm currently using data with no missing values to see if I can get it to work before circling back to that issue.

Thanks,

Noah Dephoure, PhD Weill Cornell Medical College New York, NY

TCB-yehong commented 2 years ago

Hi, are you perhaps running the program in R studio? The program is intended to run in R (and not R studio) on Windows, could you check if that's the cause? This is a link to a setup demonstration video: https://youtu.be/c7n7yE0DMsA, I hope it will be of help.
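On the "savewidget requires pandoc" warning quoted above: that message usually comes from `htmlwidgets::saveWidget()` when it is asked for a self-contained HTML file, which needs pandoc. RStudio bundles its own pandoc, whereas plain R relies on a pandoc installation being visible on the PATH. A minimal check might look like the sketch below. It assumes nothing about PhosPiR's internals, and `pandoc_status` is a hypothetical helper name, not part of the pipeline.

```r
# Report whether a pandoc executable is visible to this R session.
# pandoc_status() is a hypothetical helper, not part of PhosPiR.
pandoc_status <- function() {
  path <- unname(Sys.which("pandoc"))      # "" when pandoc is not on the PATH
  rstudio <- Sys.getenv("RSTUDIO_PANDOC")  # set by RStudio, empty in plain R
  list(
    on_path = nzchar(path),
    path = path,
    rstudio_pandoc = rstudio
  )
}

status <- pandoc_status()
if (!status$on_path && !nzchar(status$rstudio_pandoc)) {
  message("pandoc not found: install it and restart R, ",
          "or save widgets with selfcontained = FALSE.")
} else {
  message("pandoc found.")
}
```

If the check reports that pandoc is missing, installing pandoc and restarting R is probably the least invasive fix, since it leaves the pipeline code untouched.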

nanderthol commented 2 years ago

Hi,

Thanks for responding.

I’ve tried it in many different versions of R. I get the best results using the R GUI. Does it matter if I run the 32-bit or 64-bit version? Is it important to stick with the exact version of R? I’m not at my computer but I think that’s 4.01 or 4.03. I’ve tried both as well as later versions.

Is it possible to run the scripts one at a time? I’ve tried selecting the different options for which portions of analysis to run as well. I can only ever get the statistical comparisons to run when I select that option only. I’m most interested in getting the enrichment, kinase-substrate, and interaction network analysis but haven’t figured out how to make that work.

Best,

Noah


nanderthol commented 2 years ago

FYI - I’ve also watched the videos. I think that’s the only place that describes the non-MaxQuant file formats.

Noah


ECoffeyLab commented 2 years ago

Dear Noah,

Thank you for getting in touch. I am forwarding this to the pipeline developer, Ye Hong, who can help you troubleshoot.

Kind regards,
Eleanor


TCB-yehong commented 2 years ago

Hi, thanks for checking our materials. Whether 32-bit or 64-bit is best depends on the OS, but the 64-bit version is usually recommended. The code was written with R 4.0.2, but any later version should work! Yes, it is possible to run one step at a time. After you've encountered an error, you can open the run.R file and copy the lines after differential expression into the R window to run the rest of the functions. You should have my email address now; if you like, you can send the data to me and I can run it for you. If you are worried about the security of the data, you can randomize the 6 annotation columns so that the real identity of the proteins is known to you alone.
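As an alternative to copying and pasting, the same idea can be sketched as a small helper that evaluates every line of a script after a chosen marker line. This is only an illustration: `run_after_marker` is a hypothetical name, and the marker text has to be taken from the actual line in run.R, which is not reproduced here.

```r
# Run every line of a script that follows the first line matching `marker`.
# `marker` is a placeholder; inspect run.R to find the real line to use.
run_after_marker <- function(script, marker, envir = globalenv()) {
  lines <- readLines(script, warn = FALSE)
  hit <- grep(marker, lines, fixed = TRUE)
  if (length(hit) == 0) {
    stop("marker not found in ", script)
  }
  rest <- lines[seq_along(lines) > hit[1]]   # everything after the marker line
  if (length(rest) == 0) {
    return(invisible(0L))                    # nothing left to run
  }
  eval(parse(text = rest), envir = envir)    # evaluate the remaining lines
  invisible(length(rest))
}
```

The helper only works when the marker line ends a complete expression; if it falls in the middle of a multi-line call, `parse()` will fail and pasting the lines by hand is the safer route. Objects created by the earlier steps must also still exist in the session.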

nanderthol commented 2 years ago

Yehong and Eleanor,

Thank you for the help. I think I’ve got it working, but it hasn’t completed yet. Not sure why it stops when I try to run all steps, but I was able to move forward by pasting the remaining steps from run.R as you suggested. It’s now made it through all the annotation steps but took overnight as it said it might. Will it take this long every time I run it on a big dataset? Is it possible to pull all that information for the full set of human uniprot entries (or even just the reviewed ones) and maintain them locally?

I think it may have stopped again on the enrichment steps. The R window is non-responsive (but the system is not reporting that it’s “not responding”), the CPU usage is 0%, and the memory usage is 2.3 GB. The last messages on the GUI are that it downloaded protein.aliases and protein.info from stringdb. There’s a folder for KEGG enrichment\Network, but it’s empty.
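For a stall like this, two quick checks can narrow things down: whether the output folder is really empty, and whether downloads are hitting R's `timeout` option, which defaults to 60 seconds. The sketch below uses base R only. Both helper names are hypothetical, and it is an assumption, not something confirmed in this thread, that the pipeline's downloads honour the `timeout` option.

```r
# Evaluate `expr` with a longer download timeout, then restore the old value.
# R's default "timeout" option is 60 seconds, which large files can exceed.
with_timeout <- function(seconds, expr) {
  old <- options(timeout = seconds)
  on.exit(options(old), add = TRUE)
  force(expr)   # `expr` is evaluated lazily, so it runs under the new setting
}

# TRUE when a directory exists but contains no files or subfolders.
dir_is_empty <- function(path) {
  dir.exists(path) &&
    length(list.files(path, all.files = TRUE, no.. = TRUE)) == 0
}
```

A call such as `with_timeout(600, source("some_step.R"))` would run a step with a ten-minute limit; the script name here is purely illustrative.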

I’d like to be able to do additional analysis, so I think it’s worth it to keep working on getting it running. If I continue to struggle, I may take you up on your offer to run it for me.

Thanks again.

PS - I don’t see any email addresses that I can reply to other than the anonymized @.***

Noah

Noah Dephoure, Ph.D. Assistant Professor of Research Director, Advanced Biomolecular Analysis Core Sandra and Edward Meyer Cancer Center Department of Biochemistry Weill Cornell Medical College 413 East 69th Street, BRB1612 New York, NY 10021 Telephone: 646-962-6232 E-mail: @.***


TCB-yehong commented 2 years ago

Hi, yes, I'm afraid extracting the annotations is going to take a long time every single time. Once extracted, the annotations will be in your local folder.

If you would like to keep a copy of all reviewed human UniProt entries and their annotations on your local drive, I have downloaded a list for you from UniProt that includes all of their reviewed human accessions, totaling 20386 entries (please see attached). If you put this in the pipeline input format and run the pipeline setup and then the annotation step, information for all of them should be extracted. This will take a very long time, though. I have never tried to run so many at once, so you may want to separate them into several runs. You can skip the UniProt information extraction step for all future runs after this.

For the network step, please make sure the internet connection is good and the PC has sufficient memory. I will send my email over to you, and we can schedule a Zoom meeting if you like to discuss the issues in detail! And yes, please let me know anytime if you would like me to run it.
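The "several runs" idea can be sketched generically: split the accession list into batches and save each batch's result to disk, so an interrupted run resumes where it stopped and later runs reuse what is already saved. This is an illustration rather than PhosPiR code. `annotate_in_batches` is a hypothetical helper, and `annotate_fun` is a placeholder for whatever performs the slow lookup, assumed here to return a data frame.

```r
# Split accessions into batches and cache each batch's result on disk.
# `annotate_fun` is a placeholder for the slow lookup, not a PhosPiR function.
annotate_in_batches <- function(accessions, annotate_fun, cache_dir,
                                batch_size = 500) {
  dir.create(cache_dir, showWarnings = FALSE, recursive = TRUE)
  batches <- split(accessions, ceiling(seq_along(accessions) / batch_size))
  results <- lapply(names(batches), function(id) {
    cache_file <- file.path(cache_dir, paste0("batch_", id, ".rds"))
    if (file.exists(cache_file)) {
      return(readRDS(cache_file))      # already done: skip the slow step
    }
    res <- annotate_fun(batches[[id]])
    saveRDS(res, cache_file)           # save before moving to the next batch
    res
  })
  do.call(rbind, results)              # combine the per-batch data frames
}
```

Cached batches are identified only by their position, so the accession order and `batch_size` should stay the same between runs; otherwise the cache folder should be cleared first.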

uniprot-filtered-organism_Homo+sapiens+(Human)+[9606]+AND+review--.xlsx