RTXteam / RTX

Software repo for Team Expander Agent (Oregon State U., Institute for Systems Biology, and Penn State U.)
https://arax.ncats.io/
MIT License
33 stars 21 forks source link

New question 4 ready for UI integration #44

Closed dkoslicki closed 6 years ago

dkoslicki commented 6 years ago

@saramsey @edeutsch To test the robustness of our new integration scheme, I've implemented a new question type: "What diseases are similar to X?" where X is a disease (see commits 5d9ab3eb43c84012b5da7cc2189cda810795f37d and a5f6e3647f546e67ea131327409f609edfeb3ff5). I've created this in a new branch (called Q4), and after we've merged NewStdAPI into master, we can see how straightforward it will be to implement this new question type into the UI, and then merge Q4 into master as well. Then we'll be half way to the minimal viable product question count!

As for details of the implementation, after a bit of cypher jiggering (8a7029199920593087529ee347c59a16c91a9bac and a72ac90b27958bcf354583f7194352382004644d), I am able to count the number of nodes (of a certain type) that are shared between any two node types in the graph. This allows me to compute the Jaccard index between the number of phenotypes shared in common between two diseases (and gives an informative error if a disease has no phenotypes but it's parent does). I then return all diseases which have a large enough Jaccard index (given by the --threshold parameter with default Jaccard=0.20).

@edeutsch This question again returns each disease as a separate node (as in #41). I can change this when requested.

saramsey commented 6 years ago

+1 to this idea


Stephen Ramsey Assistant Professor, Oregon State University

On Mar 29, 2018, at 12:27 AM, David Koslicki notifications@github.com<mailto:notifications@github.com> wrote:

@saramseyhttps://github.com/saramsey @edeutschhttps://github.com/edeutsch To test the robustness of our new integration scheme, I've implemented a new question type: "What diseases are similar to X?" where X is a disease (see commits 5d9ab3ehttps://github.com/RTXteam/RTX/commit/5d9ab3eb43c84012b5da7cc2189cda810795f37d and a5f6e36https://github.com/RTXteam/RTX/commit/a5f6e3647f546e67ea131327409f609edfeb3ff5). I've created this in a new branch (called Q4), and after we've merged NewStdAPI into master, we can see how straightforward it will be to implement this new question type into the UI, and then merge Q4 into master as well. Then we'll be half way to the minimal viable product question count!

As for details of the implementation, after a bit of cypher jiggering (8a70291https://github.com/RTXteam/RTX/commit/8a7029199920593087529ee347c59a16c91a9bac and a72ac90https://github.com/RTXteam/RTX/commit/a72ac90b27958bcf354583f7194352382004644d), I am able to count the number of nodes (of a certain type) that are shared between any two node types in the graph. This allows me to compute the Jaccard index between the number of phenotypes shared in common between two diseases (and gives an informative error if a disease has no phenotypes but it's parent does). I then return all diseases which have a large enough Jaccard index (given by the --threshold parameter with default Jaccard=0.20).

@edeutschhttps://github.com/edeutsch This question again returns each disease as a separate node (as in #41https://github.com/RTXteam/RTX/issues/41). I can change this when requested.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHubhttps://github.com/RTXteam/RTX/issues/44, or mute the threadhttps://github.com/notifications/unsubscribe-auth/AFTe-4WSaaEHtOL7URS2ND_UmVaRwVJJks5tjIzNgaJpZM4S_3mR.

edeutsch commented 6 years ago

okay, this sounds good. so I will wait on integration for now.

After pondering the issue some more, I'm warming up to the idea of N different results. Because the feedback/annotations will be on a per-result basis, if we make them N different results (as you've already done), then some can be voted up and some can be voted down.

edeutsch commented 6 years ago

I will try to do this integration this evening to see how quickly we can roll out a new question.

dkoslicki commented 6 years ago

Excellent! There are a couple minor changes, so as usual, git pull origin Q4

edeutsch commented 6 years ago

So I started bumbling my way through this. the first thing I bumped into is that I wasn't sure what to do with the branch Q4. It was ahead of master. Yet also behind master. I was hesitant to barge into your Q4 and start making bit changes. So I thought it would be a good idea and good practice to merge it into NewStdAPI since this is my branch and seemed like good practice. This took a bit of bumbling around, but I succeeded in the end I think. There was one conflict to resolve in Q3 parsing. I think I fixed it.

But now when I run a sample question in QuestionTranslator, I get a crash: Question: what diseases are similar to glaucoma Crash: Traceback (most recent call last): File "QuestionTranslator.py", line 869, in main() File "QuestionTranslator.py", line 863, in main res = txltr.translate(question) File "QuestionTranslator.py", line 667, in translate return self.format_answer(results_dict, logging=logging) File "QuestionTranslator.py", line 257, in format_answer "restatedQuestion": "%s" % self.restate_question(corpus_index, terms), "originalQuestion": input_text}] File "QuestionTranslator.py", line 182, in restate_question restated = "What diseases have phenotypes similar to %s" % RU.get_node_property(terms["disease_name"]) TypeError: get_node_property() missing 1 required positional argument: 'node_property'

This will happen if you just run python3 QuestionTranslator.py (after a git pull of NewStdAPI)

Can you fix that? @dkoslicki Testing a new question that should work in this manner would be a good idea going forward adding more questions.

edeutsch commented 6 years ago

I have hooked up all the pipes. Pretty easy. If you can fix the above problem, it just might all work (after git pull and flask restart)! or we migth discover the next problem. http://rtx.ncats.io/devED/

dkoslicki commented 6 years ago

@edeutsch Ok, I've pushed a fix to NewStdAPI (I had overlooked adding tests for translate() and just had them for find_question_paremeters(); tests now added). See commit fc77d68ede4c0a5070d2775140bb5f88c3b0e174.

edeutsch commented 6 years ago

Yep, that was it. Git pull and flask restart and now it works great:

http://rtx.ncats.io/devED/

One comment about the results. Each result is a single node of the similar disease. I wonder if it would be more useful and enlightening to show both the input disease and the similar disease and whatever elements in between the triggered the similarity?

Anyway, Q4 online!

dkoslicki commented 6 years ago

@edeutsch Great! The "text" portion of the answer does indeed show both the input disease and similar disease. Did you mean that we should include both in the response graph? I could definitely do that, and I could also return the phenotypes they have in common. The only issue is that there can be hundreds of phenotypes in common, so without some sort of graphical visualization (like the attached), the results would be almost impossible to interpret. graph

edeutsch commented 6 years ago

yes, that's what I was thinking of! I agree that this would be for the graph visualization, which is definitely what we want to do. You are right that the sheer number of phenotypes is a bit daunting. maybe keep that feature request in your pocket until we have the graph viz of results ready and then test it.

dkoslicki commented 6 years ago

@edeutsch I assume we can close this issue since we got this question up and running?

edeutsch commented 6 years ago

sure, yes.