ghuynh24 / gseaTissueBias

guide to your R files? #1

Open paul-shannon opened 4 years ago

paul-shannon commented 4 years ago

Hi @ghuynh24,

Finally - I have started on your project.

I see you have ~800 lines of code but only one function defined ("Gene2Entrez"), which appears in two versions. This lack of structure makes your work hard to understand!

Also - sorry if I seem to be preachy - I see no tests, and no examples. Therefore my entry point into understanding, running, and creating a webapp for your system is not yet clear.

Can you help me understand what I will need to know in order to invoke your code on your POU3F2 example?

ghuynh24 commented 4 years ago

Hi @paul-shannon,

I uploaded the R files of the database so that you don't have to recreate them. The Elevated version is titled "tissueElevatedgenestable_db" and the Enriched genes version is titled "tissueEnrichedgenestable_db".

Then I would retrieve the names of the genes upregulated and downregulated by POU3F2, specifically those referred to in the Pearl article as enriched for genes in cell cycle (Cluster 6 genes) and transcription (Cluster 1 genes). I would download Table S4 from Supplemental Information.

At this point I would want to plug in the gene lists from those clusters into the database and generate a percent coverage of those genes by each of the 6 gene set collections (GOx3, KEGG, BIOCARTA, REACTOME). This would tell me if KEGG, BIOCARTA, and REACTOME could even be used to do pathway enrichment analysis...the answer would be no if the percent coverage is much lower for these than for GOx3.
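The percent-coverage idea described above could be sketched roughly as follows. This is only an illustration under stated assumptions: the helper name `percentCoverage`, the layout of `collections` as a named list of gene-symbol vectors, and the toy gene names are all hypothetical and do not come from the actual database files.

```r
# Hypothetical helper: percent of a query gene list found in each collection.
# `collections` is assumed to be a named list of character vectors,
# one vector of gene symbols per gene set collection.
percentCoverage <- function(genes, collections) {
  genes <- unique(genes)
  stopifnot(length(genes) > 0)
  sapply(collections, function(members) {
    100 * sum(genes %in% members) / length(genes)
  })
}

# Made-up toy data, for illustration only (not real annotations).
collections <- list(
  GO_BP    = c("GENE1", "GENE2", "GENE3", "GENE4"),
  KEGG     = c("GENE1", "GENE2"),
  REACTOME = c("GENE3")
)
cluster <- c("GENE1", "GENE2", "GENE3", "GENE4")

coverage <- percentCoverage(cluster, collections)
print(round(coverage, 1))
```

A result in which the KEGG, BIOCARTA, or REACTOME values fall far below the GO values would correspond to the "no" answer described above.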

If the answer is no, then the answer to my question ("Would we find similar pathways enriched for POU3F2-regulated gene clusters if KEGG, REACTOME, or BIOCARTA were used for pathway enrichment analysis instead of Gene Ontology?") would be: most likely not. If there is not enough gene coverage, pathway enrichment would be skewed and it would be highly unlikely that the similar pathways would be as highly enriched as they are when analyzed with GO pathways.

At this point I don't know how to search for an entire gene list in the database since all of the rows are for individual genes. I am hoping you may have some ideas for doing so. I hypothesize that it is possible since I can search for a gene list in Enrichr.
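One common way to query a one-row-per-gene table with a whole gene list in R is the `%in%` operator. The sketch below assumes a data frame with a `gene` column plus one logical column per collection; those column names, the helper `lookupGenes`, and the toy rows are hypothetical stand-ins, not the structure of the uploaded database.

```r
# Toy stand-in for the per-gene table; column names are assumptions.
db <- data.frame(
  gene     = c("GENE1", "GENE2", "GENE3", "GENE4"),
  KEGG     = c(TRUE, FALSE, TRUE, FALSE),
  REACTOME = c(TRUE, TRUE, FALSE, FALSE),
  stringsAsFactors = FALSE
)

# Hypothetical helper: return the rows matching any gene in the list,
# plus the genes that were not found in the table at all.
lookupGenes <- function(db, genes) {
  found   <- db[db$gene %in% genes, , drop = FALSE]
  missing <- setdiff(genes, db$gene)
  list(found = found, missing = missing)
}

hits <- lookupGenes(db, c("GENE1", "GENE3", "GENE9"))
print(hits$found)
print(hits$missing)
```

Reporting the missing genes separately makes it visible when part of a list has no row in the table, which matters when interpreting coverage.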

I am also wondering if you need to generate my data frames from scratch in order to generate the database. If so I will do my best to annotate my code as much as possible so you can follow my steps. Sorry that it is not properly structured.

paul-shannon commented 4 years ago

Thanks, Gina.

Here’s what I need: a function to call, and several examples of calling it. This function reads your database and returns user-friendly scores of the sort your project promises prospective users.

Possible?

ghuynh24 commented 4 years ago

Hi Paul,

Yes! I will start working on that and get it to you as soon as I can.

Thank you, Gina

ghuynh24 commented 4 years ago

Hi Paul,

I have finally written the code for calling functions for the database. I uploaded the file and it is titled "Code for Official Database". I can add to this as I answer new case study questions.

Please let me know if you have problems reading the code. Gina

paul-shannon commented 4 years ago

Thanks @ghuynh24. I will take a look.

paul-shannon commented 4 years ago

@ghuynh24,
I guess (correctly?) that PAUL_Code for Database Entrezupdated.R is a good place for me to start. But the first executable line tries to load a file that is not found anywhere in the repo:

load("~/ISB/HPA_Tissues_Entrez_11.12/Vagina_Enriched_Genes.Rdata")

Hard for me to proceed!
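One possible way to avoid this kind of problem is to load data through a path relative to the repo and fail with a clear message when the file is absent. The sketch below is an assumption-laden illustration: the helper `loadRepoData`, the `data` directory name, and the toy file are hypothetical and are not part of the repository.

```r
# Hypothetical helper: load an .Rdata file from a directory inside the repo
# and return the names of the objects it defined.
loadRepoData <- function(filename, dataDir = "data", envir = parent.frame()) {
  path <- file.path(dataDir, filename)
  if (!file.exists(path)) {
    stop("data file not found: ", path, call. = FALSE)
  }
  load(path, envir = envir)  # load() returns the object names invisibly
}

# Demonstration: a temporary directory stands in for the repo's data folder.
demoDir <- tempfile("data")
dir.create(demoDir)
toyGenes <- c("GENE1", "GENE2")  # made-up content
save(toyGenes, file = file.path(demoDir, "toy.Rdata"))
rm(toyGenes)

loaded <- loadRepoData("toy.Rdata", dataDir = demoDir)
print(loaded)
```

With a relative path, anyone who clones the repo and runs the script from the repo's top-level directory resolves the same files, provided the data files are committed alongside the code.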

paul-shannon commented 4 years ago

@ghuynh24 - Gina, I imagine you have never had much training in, nor exposure to standard practices in developing code communally - that is, in writing code that will be shared with others.

A basic idea is presented here. It presents a trivial function and its accompanying test.

I don't know how much time you have left, but if you have enough, this style of coding (function + test) is a great way to share code. The test not only ensures that your code works, but it also shows how it, in context, is meant to be used.
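As a rough illustration of the function + test style (not the linked example itself), here is a minimal sketch in R. The function `normalizeSymbols` and its test are hypothetical, chosen only to show the pattern.

```r
# A trivial function: tidy a vector of gene symbols by trimming whitespace,
# converting to upper case, and dropping duplicates.
normalizeSymbols <- function(symbols) {
  unique(toupper(trimws(symbols)))
}

# Its accompanying test: checks the results and shows the intended usage.
test_normalizeSymbols <- function() {
  stopifnot(identical(normalizeSymbols(c(" pou3f2", "POU3F2 ", "tp53")),
                      c("POU3F2", "TP53")))
  stopifnot(identical(normalizeSymbols(character(0)), character(0)))
  invisible(TRUE)
}

test_normalizeSymbols()
```

Running the test after every change gives a quick signal that the function still behaves as documented.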

ghuynh24 commented 4 years ago

@paul-shannon Hi Paul, Yes that's true. I will take a look at your example and try to figure it out. My existing code is unfortunately less user-friendly but I am hoping that it can still be followed to some degree. Could you please take a look at the file titled "Code for Official Database.R" to see if you can use some of it?

Thanks, Gina

paul-shannon commented 4 years ago

@ghuynh24, Hi Gina,

"Code for Official Database.R", after I fixed some typos, does run. That's progress. There are no functions in that file, or elsewhere in the code.
Functions are essential. They specify, and they provide, the logical operations of any project, via their

I guess that I was not persuasive in my earlier suggestion of adding functions? I don't think I can render your work as a webapp without them.

ghuynh24 commented 4 years ago

@paul-shannon Hi Paul,

I will talk to Alison and see if I can resolve this. I have decided to continue working on this project beyond the end of July in order to finish the paper.

Thank you, Gina

paul-shannon commented 4 years ago

@ghuynh24, If your schedule permits, and if you are so inclined, we could work together on this. Then your lack of experience with coding could be an opportunity to learn, rather than primarily an annoying obstacle to getting your paper out.

ghuynh24 commented 4 years ago

@paul-shannon Hi Paul, sorry I have been gone for a few days. I talked to Nathan about the direction of my project and we agreed that I should focus on finishing writing the manuscript since my hours are now reduced. However, I talked to Alison and am working on writing the functions correctly. I will keep you updated.

Thanks, Gina

paul-shannon commented 4 years ago

Hi Gina,

Good luck with your compressed schedule.

I suggest that you work -with- me, rather than in isolation, to transform your linear code into functions. That process is not always easy to grasp - and the quickest way to get it done, if that is your goal, may well be to work in collaboration with me, and with a sense of give and take, back and forth, trial and revision.

ghuynh24 commented 4 years ago

Hi Paul,

Thank you. Let's give it a try. I have a manuscript draft due at the end of this week so I have been focusing on that. I work on Thursdays and Fridays for the ISB so I am hoping we can schedule a meeting for next Thursday or Friday to work on my functions. Please let me know if you will be available.

Gina

paul-shannon commented 4 years ago

Yes, indeed. Next Thursday August 20th. Any time in the afternoon is good for me.

Good luck with your manuscript!

ghuynh24 commented 4 years ago

Hi Paul. Let's meet 3-4PM tomorrow. I will send you a Google Calendar invite.

Looking forward to meeting with you! -Gina

ghuynh24 commented 4 years ago

Sorry, does 2-3PM work? I have a pregnancy working group meeting from 3-4PM.

Thanks Paul! Gina

paul-shannon commented 4 years ago

Hi Gina,

Today (Thursday) 2pm works fine. Please read through the “thinking in functions” information beforehand.

https://www.cs.utah.edu/~germain/PPS/Topics/functions.html
