RTXteam / RTX

Software repo for Team Expander Agent (Oregon State U., Institute for Systems Biology, and Penn State U.)
https://arax.ncats.io/
MIT License
33 stars 21 forks source link

Get NGD working with ARAX_overlay #605

Closed dkoslicki closed 4 years ago

dkoslicki commented 4 years ago
edeutsch commented 4 years ago

Note that this is probably pretty easy:

edeutsch commented 4 years ago

It's probably just 30 lines of code. I'd just do it all in ARAX_overlay.py for now so you don't have the overhead of another class at this point..

edeutsch commented 4 years ago

i.e. just modify/expand a little lines 76-93 in situ before trying anything fancy with other classes.

dkoslicki commented 4 years ago

@edeutsch Yup, your plan is exactly what I have in mind. However, I would like to do it in a separate class since:

  1. I want to keep ARAX_overlay.py as thin as possible given the number of different overlay components I have in mind.
  2. I want the practice of calling a separate class for each overlay component as that's the paradigm I/we had in mind per the last hackathon
  3. I'm going to test integrating Dan Lin's NGD code and it may be a bit complicated (i.e. not quite just 30 lines and a call out).
edeutsch commented 4 years ago

okay, sounds good.

dkoslicki commented 4 years ago

Stub complete in e4161f84 (note to self, in commit messages, issues don't get referenced if there's a space between the # and the number).

dkoslicki commented 4 years ago

It's working, but is super slow, as expected. Blocking on a setup and demo script from @power10dan as after I have that, I'll switch out the old NGD for the shiny new and improved one!

dkoslicki commented 4 years ago

@amykglen If you’ve made any attempts or run into problems trying to integrate Dan’s code for ngd, please keep me posted here. Depending on the difficulties you run into, I may give it a go myself, and then we can decide if we scrap it until post-demo or not.

amykglen commented 4 years ago

Will do! I haven't tried yet (was traveling the last few days), but plan to dive in tomorrow. Will keep you updated!

amykglen commented 4 years ago

An update (or lack thereof): I've been working on trying to integrate Dan's code this afternoon, but have not made much (or any) progress. I've been trying to get sample_workflow.py to run, but haven't gotten there yet.. (been working through errors - for example, the function in the file refers to 'self', but it is not part of a class..)

Generally pretty confused trying to figure out what's going on in the code, and how to get set up... for example, config_file.py indicates that you need to run 'your pyspark server' somewhere (currently set to localhost:6379), but I haven't figured out what the 'pyspark server' is and how to run it.

amykglen commented 4 years ago

So I bailed on working with sample_workflow.py and instead am trying to start by running XMLExtractorTests.py... have been making slightly more progress on this front, but the latest error I'm getting is:

ImportError: cannot import name 'spark_redis_jar' from 'config_file'

There seems to be no variable in config_file.py called spark_redis_jar, but SparkObjectBuilder.py tries to import such a variable...

(It's seeming that Redis is a key piece of the system(?), but I don't see a note on that anywhere ... apparently port 6379 is the default Redis port...)

amykglen commented 4 years ago

Huh, apparently spark_redis_jar was deleted in this commit: https://github.com/RTXteam/RTX/commit/76048ab39ebe816a5ab793a40e51970500243c78

amykglen commented 4 years ago

(I went ahead and deleted the spark_redis_jar import as it actually doesn't seem to be used)

So I am now able to run XMLExtractorTests.py (just using a small dataset from PubMed for now, to see if I can get going before worrying about all of PubMed): the function test_read_update_df in that file seems to run ok, but then the rest of the functions in there error out like so:

Traceback (most recent call last):
  File "XMLExtractorTests.py", line 45, in test_read_entire_schema
    df_entire = self.xml_extractor.get_entire_df()
  File "/Users/aglen/translator/RTX/RTX/code/reasoningtool/QuestionAnswering/XMLExtractor.py", line 263, in get_entire_df
    self.df_sample)
AttributeError: 'Spark_XML_ETL' object has no attribute 'df_sample'
dkoslicki commented 4 years ago

Note to self: would be really useful to create classes like add_kg_edge and add_kg_edge_attribute and call them in all the overlay functions. But, you know, time constraints...

edeutsch commented 4 years ago

actions really should not be classes themselves. Rather classes should things that can perform actions, which are the methods of the class. I already have the (somewhat fancifully named) ARAXMessenger class, which has methods like add_qedge(). It would natural to put add_kg_edge() or even just add_edge() there.

In theory add_edge() should be a method of the KnowledgeGraph class, but this is a little tricky because that class code is autogenerated by the Swagger framework, so we shouldn't manually edit knowledge_graph.py and similar files. There are probably some elegant ways to solve this problem. But for now I just had made the ARAXMessenger class.

dkoslicki commented 4 years ago

@edeutsch True. I was mostly referring to the classes/modules in the Overlay subfolder that could contain a module/class that handles the addition of QG edges and KG edges depending on:

  1. The action that was called
  2. The sub-action that the action facilitates (eg. COHD can do many different things)
  3. Custom handling of minor discrepancies between overlay actions.

Basically, I am noticing that the *.py scripts in the ARAXQuery/Overlay folder have a lot of duplicate code that only slightly varies and figured there might be an elegant way to get around this. Currently overlay_clinical_info.py is the "cleanest" one so far given that the individual COHD sub-actions paired_concept_freq, observed_expected_ratio and chi_square were almost trivial to implement after I got OverlayClinicalInfo().make_edge_attribute_from_curies(), OverlayClinicalInfo().add_virtual_edge() and OverlayClinicalInfo().add_all_edges() working

dkoslicki commented 4 years ago

Closing as now the work is being done in #654 for NGD