Closed dkoslicki closed 4 years ago
Note that this is probably pretty easy:
It's probably just 30 lines of code. I'd just do it all in ARAX_overlay.py for now so you don't have the overhead of another class at this point..
i.e. just modify/expand a little lines 76-93 in situ before trying anything fancy with other classes.
@edeutsch Yup, your plan is exactly what I have in mind. However, I would like to do it in a separate class since:
okay, sounds good.
Stub complete in e4161f84 (note to self, in commit messages, issues don't get referenced if there's a space between the #
and the number).
It's working, but is super slow, as expected. Blocking on a setup and demo script from @power10dan as after I have that, I'll switch out the old NGD for the shiny new and improved one!
@amykglen If you’ve made any attempts or run into problems trying to integrate Dan’s code for ngd, please keep me posted here. Depending on the difficulties you run into, I may give it a go myself, and then we can decide if we scrap it until post-demo or not.
Will do! I haven't tried yet (was traveling the last few days), but plan to dive in tomorrow. Will keep you updated!
An update (or lack thereof): I've been working on trying to integrate Dan's code this afternoon, but have not made much (or any) progress. I've been trying to get sample_workflow.py to run, but haven't gotten there yet.. (been working through errors - for example, the function in the file refers to 'self', but it is not part of a class..)
Generally pretty confused trying to figure out what's going on in the code, and how to get set up... for example, config_file.py indicates that you need to run 'your pyspark server' somewhere (currently set to localhost:6379), but I haven't figured out what the 'pyspark server' is and how to run it.
So I bailed on working with sample_workflow.py
and instead am trying to start by running XMLExtractorTests.py... have been making slightly more progress on this front, but the latest error I'm getting is:
ImportError: cannot import name 'spark_redis_jar' from 'config_file'
There seems to be no variable in config_file.py
called spark_redis_jar
, but SparkObjectBuilder.py tries to import such a variable...
(It's seeming that Redis is a key piece of the system(?), but I don't see a note on that anywhere ... apparently port 6379 is the default Redis port...)
Huh, apparently spark_redis_jar
was deleted in this commit: https://github.com/RTXteam/RTX/commit/76048ab39ebe816a5ab793a40e51970500243c78
(I went ahead and deleted the spark_redis_jar
import as it actually doesn't seem to be used)
So I am now able to run XMLExtractorTests.py
(just using a small dataset from PubMed for now, to see if I can get going before worrying about all of PubMed): the function test_read_update_df
in that file seems to run ok, but then the rest of the functions in there error out like so:
Traceback (most recent call last):
File "XMLExtractorTests.py", line 45, in test_read_entire_schema
df_entire = self.xml_extractor.get_entire_df()
File "/Users/aglen/translator/RTX/RTX/code/reasoningtool/QuestionAnswering/XMLExtractor.py", line 263, in get_entire_df
self.df_sample)
AttributeError: 'Spark_XML_ETL' object has no attribute 'df_sample'
Note to self: would be really useful to create classes like add_kg_edge
and add_kg_edge_attribute
and call them in all the overlay
functions. But, you know, time constraints...
actions really should not be classes themselves. Rather classes should things that can perform actions, which are the methods of the class. I already have the (somewhat fancifully named) ARAXMessenger class, which has methods like add_qedge(). It would natural to put add_kg_edge()
or even just add_edge()
there.
In theory add_edge()
should be a method of the KnowledgeGraph class, but this is a little tricky because that class code is autogenerated by the Swagger framework, so we shouldn't manually edit knowledge_graph.py and similar files. There are probably some elegant ways to solve this problem. But for now I just had made the ARAXMessenger class.
@edeutsch True. I was mostly referring to the classes/modules in the Overlay
subfolder that could contain a module/class that handles the addition of QG edges and KG edges depending on:
Basically, I am noticing that the *.py
scripts in the ARAXQuery/Overlay
folder have a lot of duplicate code that only slightly varies and figured there might be an elegant way to get around this. Currently overlay_clinical_info.py
is the "cleanest" one so far given that the individual COHD sub-actions paired_concept_freq
, observed_expected_ratio
and chi_square
were almost trivial to implement after I got OverlayClinicalInfo().make_edge_attribute_from_curies()
, OverlayClinicalInfo().add_virtual_edge()
and OverlayClinicalInfo().add_all_edges()
working
Closing as now the work is being done in #654 for NGD
overlay_clinical_info