TCB-yehong / PhosPiR

An automatic pipeline to analyze phosphoproteomics

problem with the run.R file #3

Open junyang1966 opened 1 year ago

junyang1966 commented 1 year ago

Hi,

Thanks for developing PhosPiR. I would really like to try it on our phosphoproteomic data. I don't have much experience with programming or with R. I watched the two YouTube videos, then downloaded and installed R 4.0.3 using the link in the Readme file. I downloaded the PhosPiR pipeline exactly as described in the Readme file and put the Run file in the R window. I did all of this on my PC. However, I ran into a series of problems not mentioned in the YouTube videos. Please see the attached file showing what appeared in my R window. I am not sure whether it is a big or small problem, or how to solve it.

Another thing is that I have multiple phosphorylated sites on single peptides. By reading the thread of one issue, I think the problem was addressed. However, due to my limited knowledge in bioinformatics and computer language, I don't understand the solution. At this point, I would like to know how to prepare the input file for multiple phosphorylated sites and what the format should be.

Many thanks in advance. Jun GraphAppPrintJob.pdf

TCB-yehong commented 1 year ago

Hi Jun, Thank you for sharing the error messages! The error seems to involve R package version issues. I have modified the pipeline so that it runs correctly in the latest R release (at this time), which is version 4.2.2, and I have updated the description to say 4.2.2 instead of 4.0.3. Could you please install version 4.2.2? Feel free to delete 4.0.3.

Yes, the input format you have might be the major problem. I think I have received an email from you, which I have replied to just now, but I will also include the suggestions here.

Regarding your question: yes, I'm afraid the input data will have to be reformatted. Different spectral analysis software can output the data in different ways, which might be why your data has a different format, with multiple phosphorylation sites marked on the same row. Changing from this format to the PhosPiR preferred input format takes quite a bit of work:

1. For each site, check whether it appears in multiple rows (spectral analysis software sometimes outputs the same peptide on separate rows when the peptide has a different number or position of phosphorylations). If it does, the intensities for this site should be combined.
2. For each site, center it, which means you need the 15 amino acids before and after the site, or at least 7 before and after.

The final input has the following format: each row has 1 site centered, with 15 amino acids (or at least 7) before and after it. The sample columns for this row hold the intensity values of this site, after combining the intensities from the different rows that include it (if only one row includes the site, that row's intensity can be used). For rows with multiple sites in your data, this should be done for every site, and each site should get its own new row. The same applies to rows with a single site: the site should be centered and checked against the other rows to see whether it is included elsewhere. Once you have the new rows, the old rows can be removed.

This might take a lot of time. One thing that might make it faster and more automated: if you have the raw data, you can run it through the MaxQuant software. The output will have the format I described and can be input directly to PhosPiR. Please let us know if you have further questions about how to proceed!
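For anyone who prefers to script the reformatting, here is a minimal Python sketch of the idea described above. It is only an illustration and not part of PhosPiR: the function names, the dictionary layout of the input rows, the use of summation to "combine" intensities, and the `_` padding for sites near the ends of a protein are all assumptions, and the example sequence and intensities are made up.

```python
FLANK = 15  # residues kept on each side of the site; the thread mentions 15, or at least 7


def sequence_window(protein_seq, position, flank=FLANK, pad="_"):
    """Return the residues around a 1-based site position, with the site in the centre.

    Positions near either end of the protein are padded with `pad` (an assumption)
    so that every window has length 2 * flank + 1.
    """
    if not 1 <= position <= len(protein_seq):
        raise ValueError(f"position {position} is outside the protein sequence")
    idx = position - 1
    left = protein_seq[max(0, idx - flank):idx]
    right = protein_seq[idx + 1:idx + 1 + flank]
    return left.rjust(flank, pad) + protein_seq[idx] + right.ljust(flank, pad)


def one_row_per_site(peptide_rows, protein_seqs, flank=FLANK):
    """Turn peptide-level rows (possibly several sites each) into one row per site.

    Each input row is a dict with hypothetical keys:
      "protein"     - protein identifier
      "sites"       - list of 1-based phosphosite positions in the protein
      "intensities" - list of per-sample intensity values
    Intensities of rows sharing a site are summed (one possible way to combine them).
    """
    totals = {}
    for row in peptide_rows:
        for pos in row["sites"]:
            key = (row["protein"], pos)
            previous = totals.get(key, [0.0] * len(row["intensities"]))
            totals[key] = [a + b for a, b in zip(previous, row["intensities"])]

    result = []
    for (protein, pos), intensities in sorted(totals.items()):
        result.append({
            "protein": protein,
            "position": pos,
            "window": sequence_window(protein_seqs[protein], pos, flank),
            "intensities": intensities,
        })
    return result


# Made-up example: one doubly phosphorylated peptide row and one singly phosphorylated row.
example_seqs = {"P1": "MKTAYSAKQRSLEDT"}
example_rows = [
    {"protein": "P1", "sites": [6, 11], "intensities": [100.0, 200.0]},
    {"protein": "P1", "sites": [6], "intensities": [10.0, 20.0]},
]
for site_row in one_row_per_site(example_rows, example_seqs, flank=7):
    print(site_row)
```

The site at position 6 appears in both example rows, so its intensities are combined, while position 11 keeps the values of the single row that contains it. Whether summing is the right way to combine intensities depends on your quantification software, so please treat that step as a placeholder.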

The input format will have to be changed in order for PhosPiR to not give an error for kinase analysis. However, if you would just like to analyze the intensities for the phosphopeptides without kinase analysis, you could exclude phosphosite identity entirely and treat the dataset like a proteomics dataset, that way you could still analyze your data with PhosPiR, just without the kinase analysis. Hope that helps!

junyang1966 commented 1 year ago

Hi Ye, Thanks for your quick response. I will try to use MaxQuant to generate the input file first and then work on the R version problem. Jun

TCB-yehong commented 1 year ago

Hi Jun, You are welcome! Okay good to hear! I hope everything will work out! Please contact us anytime if you have new questions!

janeseto commented 1 year ago

Hi Yehong,

I'm also a novice with R and new to phosphoproteomic analysis, so I read your paper on PhosPiR with great interest! I had a go with running PhosPiR yesterday and it mostly worked, but it hit a snag with the Network analysis:

Warning: we couldn't map to STRING 0% of your identifiers
trying URL 'https://stringdb-static.org/download/protein.links.v11.5/10090.protein.links.v11.5.txt.gz'
Content type 'application/octet-stream' length 84569998 bytes (80.7 MB)
downloaded 80.7 MB

Error in function (type, msg, asError = TRUE) : schannel: CertGetCertificateChain trust error CERT_TRUST_IS_PARTIAL_CHAIN

I tried re-running it again today but now it is stuck at the Overview Figure stage with the following error message:

Imputation completed
Compute barycenter of MAR and NMAR distributions v2-mnar
Loading required package: reshape2
Loading required package: pheatmap
Loading required package: fingerprint
Loading required package: vegan
Loading required package: permute
Loading required package: lattice
This is vegan 2.6-4
Loading required package: rgl
Loading required package: plot3D
Loading required package: magick
Linking to ImageMagick 6.9.12.3
Enabled features: cairo, freetype, fftw, ghostscript, heic, lcms, pango, raw, rsvg, webp
Disabled features: fontconfig, x11
Loading required package: plot3Drgl
Loading required package: RColorBrewer
Loading required package: FactoMineR
Loading required package: factoextra
Welcome! Want to learn more? See two factoextra-related books at https://goo.gl/ve3WBa
Error in pandoc_self_contained_html(file, file) :
  Saving a widget with selfcontained = TRUE requires pandoc. See here to learn more https://bookdown.org/yihui/rmarkdown-cookbook/install-pandoc.html

Any advice is much appreciated! Jane

junyang1966 commented 1 year ago

Hi Yehong, I installed R 4.2.2 today. After I dragged the run.R file from the PhosPiR-main folder into the R Console window, it showed a warning. See below:

Loading required package: svDialogs
Warning in install.packages("svDialogs"):
  'lib = "C:/Program Files/R/R-4.2.2/library"' is not writable

Could you please advise how to solve the problem? Thanks, Jun

janeseto commented 1 year ago

Hi Yehong and Jun,

Regarding my issues above - I think I've resolved it - I ended up uninstalling R and reinstalling 4.2.2 and PhosPiR worked again, up to a point....

Not sure if this is useful for you, Jun, but I think some of the problems I've been encountering are caused by my institution computer and network firewalls. I'm going to re-run my analysis using my home computer and networks to see if I can get around these issues. Maybe you can try that also?

Thanks Yehong for developing PhosPiR - I'm loving what it's produced so far and I can't wait to see the kinase-substrate and phosphoprotein-protein network analysis figures it will produce for my dataset!

Jane

junyang1966 commented 1 year ago

Hi Jane, Thanks for the response. Good point that my institution's computer may have some restrictions. Since I don't have a Windows computer at home, could you please let me know whether that turns out to be the problem for you? If so, I will try to find a Windows computer. Jun

janeseto commented 1 year ago

Hi Jun, Unfortunately I'm a windows user so I really can't comment on Mac or Linux compatibility issues.... sorry! Maybe try it out and see? Good luck!! Jane

TCB-yehong commented 1 year ago

Hi, I'm sorry for the late reply! I've been sick and bedridden for the past few days.

Jane, thank you for your help! I'm glad to hear it is working so far! Yes, I agree with you that the "schannel: CertGetCertificateChain trust error CERT_TRUST_IS_PARTIAL_CHAIN" error could be caused by security settings. I have worked on my personal computer so I haven't encountered this, but I found a discussion on it: https://github.com/jeroen/curl/issues/193. To summarize very briefly, the latest CRAN packages should be issue-free, but being on a company's LAN/VPN could result in this error. The second error looks like an accessibility error to me, and I'm very glad it was solved when you removed R and reinstalled 4.2.2! You are very welcome, and thank you very much, Jane! It's very nice to hear you like the output!

Jun, warning messages can sometimes be ignored, as they are not errors. Did you run into an error? Was the svDialogs package installed somewhere else, or did the installation stop? The message 'lib = "C:/Program Files/R/R-4.2.2/library"' is not writable looks like an accessibility issue. I was wondering whether you installed R in an admin or root account and ran it from an admin or root account? I think I agree with Jane that it might be better to run it on a personal computer to avoid accessibility issues.

The pipeline is at the moment only tested on Windows. If you have Mac or Linux, one thing you could try is installing a Windows virtual machine on your system. I'm a Windows user myself and not that familiar with the process, I'm sorry! However, if you search for "windows virtual machine on Mac" (or "Linux"), I think you will find good software options.

janeseto commented 1 year ago

Hi all,

Just want to let you know that the program ran seamlessly for me on my home computer! So it looks like accessibility restrictions indeed caused my problems.

Hope you're feeling better, Yehong!

cheers Jane

junyang1966 commented 1 year ago

Hi Jane, Thanks for the note. I was also able to run PhosPiR successfully using the example files on one of our lab's Windows computers. For some unknown reason, it didn't work on my own lab Windows computer. Maybe some software on my computer is not compatible with PhosPiR. Jun

janeseto commented 1 year ago

Hi Yehong,

I have a question about how to interpret Circos plots. I read that section of your paper a few times but I’m still unclear which is the “corresponding statistics file” when you described Fig 4A and B about neurofilament and RPS6KA1 having the most increased activity… Can you give me some pointers as to what I should be looking for and at?

This is one of my Circos plots and what I think is the corresponding statistics file. Can you please show me how to interpret this?

Thank you so much!

Jane

TCB-yehong commented 1 year ago

Dear Jane,

I'm very sorry for the late reply, I have been swamped with work lately. I'm sorry, I don't see a figure, but there should be CSV files in the Kinase Analysis folder named ComparisonX_significant_kinaseNetwork.csv (X stands for your comparison, e.g. 1 or 2). The Circos plots are plotted from this information. Inside, you can find the fold change and p-value of each substrate of each kinase. If you would like to view it kinase by kinase, you could sort the list by the first column (the Kinase column).

The ComparisonX_swingScore.csv files in the same folder also tell you the activity of the predicted kinase. The "pos" column tells how many counted substrates increased in phosphorylation, and the "neg" column tells how many counted substrates decreased in phosphorylation. The "p_greater" and "p_less" columns are p-values. If "p_greater"<0.05 and "p_less">0.05, the kinase is predicted to have increased activity. If "p_greater">0.05 and "p_less"<0.05, the kinase is predicted to have decreased activity.
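The p-value rule above can also be written down as a small Python sketch, in case it helps when going through many kinases. This is only an illustration under assumptions: the `p_greater` and `p_less` column names come from the description above, but the `kinase` column name, the helper names, and all values in the example are made up, and the real ComparisonX_swingScore.csv files may be laid out differently.

```python
import csv
import io

ALPHA = 0.05  # significance threshold used in the rule above


def classify_kinase(p_greater, p_less, alpha=ALPHA):
    """Apply the rule: small p_greater means increased, small p_less means decreased."""
    if p_greater < alpha and p_less > alpha:
        return "increased"
    if p_greater > alpha and p_less < alpha:
        return "decreased"
    return "inconclusive"


def summarize_swing_scores(csv_text, kinase_column="kinase"):
    """Map each kinase to its predicted activity change.

    `kinase_column` is a hypothetical column name; adjust it to match the real file.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    return {
        row[kinase_column]: classify_kinase(float(row["p_greater"]), float(row["p_less"]))
        for row in reader
    }


# Made-up example content, not real PhosPiR output.
example_csv = """kinase,pos,neg,p_greater,p_less
KIN_A,12,3,0.01,0.99
KIN_B,2,9,0.97,0.02
KIN_C,5,5,0.40,0.60
"""
print(summarize_swing_scores(example_csv))
```

To use it on a real file, the file contents could be read with `open(path, newline="").read()` and passed in as `csv_text`. Rows with missing p-values are not handled in this sketch.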

I hope this helps! Please don't hesitate to contact me if something is unclear!

Have a great day!

Sincerely,

Ye Hong


TCB-yehong commented 1 year ago

Dear Jane,

I'm very glad to hear! I'm feeling much better thank you very much!

