ExposuresProvider / icees-api

MIT License
2 stars 8 forks source link

Resolving Issues #20 and #21 and Summarizing Next Steps #27

Closed karafecho closed 2 years ago

karafecho commented 4 years ago

Thanks for meeting yesterday, @xu-hao @cbizon @stevencox @patrickkwang. I've attempted to summarize the agreed-upon plan and next steps below. Please feel free to edit.

  1. ARAGON or ARA calls to ICEES+ will point to the full ICEES+ dataset or a union of each available year for all available patients (years 2010-2016, with years 2017-2019 soon to be available). For feature variables that are not available for a given year, ICEES+ will treat the variable as missing for that year.
  2. ICEES+ will return all permutations for an entity - entity association defined in a call from ARAGORN or another ARA. For example, in response to a request for PM2.5 - asthma associations, ICEES+ will return AvgDailyPM2.5Exposure_cut x AsthmaDx, MaxDailyPM2.5Exposure_cut x AsthmaDx, AvgDailyPM2.5Exposure_qcut x AsthmaDx, etc.
  3. ICEES+ will return Chi Square statistics and uncorrected P values. ICEES+ also will return counts and bin values, as well as the binning function that was used for a given feature variable (i.e., pandas.cut, pandas.qcut, custom [SME or literature-based binning].
  4. For unbinned/unbounded feature variables such as TotalEDInpatientVisits, we will categorize. In this example, we will categorize TotalEDInpatientVisits as 0...10+. As other such variables arrive in the future, we will follow the same general approach.
  5. ARAGORN will overlay ICEES+ data as supporting evidence in its KG. Other teams may adopt a different approach.
karafecho commented 4 years ago

One thing I don't think we resolved is what responsibilities the KPs own versus the ARAs. For instance, if ARAXi approaches the ICEES+ team and requests something completely different (or in a different form) than what ARAGORN is interested in, then how shall we respond?

I'm not sure we'll be able to address this issue in GitHub, but I did want to mention it since we didn't have time to discuss the issue yesterday.

xu-hao commented 4 years ago

20 #21

cbizon commented 4 years ago

I agree with all of this except 5. IMO we should think about the operations that ICEES supports, and I thought that we agreed to 2:

1) Given a KG in a ReasonerAPI message, ICEES adds edges between all nodes in the KG 2) Given a one hop machine query, ICEES returns answers

Those were our two issues #20 and #21 . Does those being closed mean that these things are implemented now?

stevencox commented 4 years ago

@bizon is the difference between @karafecho's 5 and your 2 one hop vs applying to the whole graph? Or maybe I'm comparing the wrong things.

cbizon commented 4 years ago

5 is a little confusing because it's conflating 2 things: What operation ICEES will provide and what ARAGORN is going to do with it. I think we should write the statements about what ICEES will provide (letting those be informed by what ARAs like ARAGORN will find useful).

I think that what 5 is implying is an operation like my (1). you have a reasonerstd message where there is a knowledge_graph element, and it has some nodes. Then this operation will, for every pair of nodes in that list of nodes, generate 1 or more edges according to @karafecho 's 1-4.

But I also think that there should be an ICEES endpoint that does the "fill in the blank" operation, as long as there is only one blank. That is, an operation that looks into the query_graph section of a reasonerStd message, and provides answers to questions of the form (a)--(b), where (a) is an entity specified with a curie, and b is a type.

Does that make more sense?

xu-hao commented 4 years ago

It makes sense. I've implemented the overlay api. which is 1 by @cbizon. The 2 is essentially the associations_to_all_features api endpoint, but with some tweaks, which I'll work on after I've deployed 1. Do we need to provide a reasonerStd message based api for that, too?

stevencox commented 4 years ago

@cbizon @xu-hao Ok, that all helps. I'd suggest that it is a reasoner standard API since, architecturally, that's what the FOA calls for ARAs to interact with and it will make it a natural endpoint for other ARAs to call.

Any thoughts on a name for @cbizon's 2?

cbizon commented 4 years ago

Is overlay the term from araxi for that operation? I think overlay or decorate or something like that...

stevencox commented 4 years ago

I think overlay makes more sense for 1, eh? i.e. one has a giant graph and wants to decorate it. But 2 is a question graph w/two nodes and we want all associations between them right?

cbizon commented 4 years ago

Sorry, I missed that you were asking for a name for 2. 2 is a question graph like (diabetes)--(drug). i.e. one of the nodes is bound. I guess part of the naming is do you anticipate ever supporting more than 1-hops? LIke a generic graph query interface, and if so, would it be this function or a different one. If yes, then I might call this "graph_query" or something like that, but if no, then I like get_associations (but it is kinda generic)

stevencox commented 4 years ago

Right, get_associations is boring but it's my current fav. Not that naming is the most vital piece of this. But numbers get a little hard to follow.

karafecho commented 4 years ago

I am admittedly a bit confused, and I'm thinking that a graphic might help clarify things, not just for me, but for others inside/outside of the core team, but I would like to make a few points, fwiw.

  1. I don't think I gave much thought to (5) when I wrote it. I also think that I may have misused the term "overlay", at least how it is described above.
  2. @bizon commented: "I think we should write the statements about what ICEES will provide (letting those be informed by what ARAs like ARAGORN will find useful)." While I agree to an extent, I think that ICEES operations should be driven primarily by informed use case questions, but perhaps this is what you are suggesting?
  3. I don't think we should be prescriptive re ICEES operations. IMO, ICEES operations and functionalities should evolve over time, in response to new use cases and use case questions.
  4. I don't think the one-hop "entity with curie" - "entity type" operation will prove to be very useful, but it's probably sufficient as a starting point.
  5. I don't think any operation will be very useful unless we identify an approach for addressing context.

Just my two cents...

cbizon commented 4 years ago

Yep, I agree with pretty much all of what you say @karafecho . My comment that you mention above is just trying to decouple the operations from how the results of those operations would be used. We tend to think that overlay->support and onehop->pathfinding, but there's no reason those couldn't be inverted.

I'm happy to help with a diagram, but my visual skills are pretty low. I'm imagining that this would be a graphic to explain what each endpoint does?

karafecho commented 4 years ago

WRT the graphic, I think we need a conceptual visual that demonstrates overlay -> support, one-hop-> pathfinding and the inverse. That way, we can figure out where ICEES+ endpoints fit in, as well as other KP endpoints, e.g., COHD, and perhaps use the graphic to also illustrate how different ARAs plan to integrate the KPs. Does that make sense and seem reasonable? I don't think we need a pretty graphic, just something.

karafecho commented 4 years ago

@cbizon @stevencox @xu-hao : Please comment.

cbizon commented 4 years ago

Here's my attempt at what I think would go into a graphic onehop overlay

cbizon commented 4 years ago

The first one illustrates the 1 hop function, and the second is the overlay

Left side is input, right side is output, Q, KG, A are the 3 parts of a ReasonerAPI message

cbizon commented 4 years ago

There's actually an interesting question on whether the overlay should pay attention to any input answers as well, and only look for pairs that occur in the same answer, but I have not illustrated that here.

karafecho commented 4 years ago

This is very helpful. Thanks, @cbizon!

xu-hao commented 4 years ago

@karafecho @cbizon Thanks. That really clarifies things. With regard to 2, should we also add loops, i.e., edges that connect a node to itself or it is not necessary? The current implementation that I gave Patrick has loops.

karafecho commented 4 years ago

Reasoner Standard API specs:

Input (question/query) formats

  1. Natural language question (NLQ) e.g. “What genes are associated with X?” ambiguous difficult/impossible to parse, in general
  2. “Query type ID” (QID) e.g. Q0 (“What is X?”), Q22 (WF1MOD2), … closed set - cannot ask new types of questions
  3. Question/query graph (QG) limited expressiveness

Output (results) formats

  1. “dense” format result includes complete nodes and edges sometimes needlessly verbose difficult to align/compare/chain
  2. “message”/“bindings” format result includes bindings between shared knowledge graph and question/query graph concise compatible with query/question graph (QG) input - can chain modules

Outdated? (See below)

patrickkwang commented 4 years ago
  1. Natural language question (NLQ) e.g. “What genes are associated with X?” ambiguous difficult/impossible to parse, in general
  2. “Query type ID” (QID) e.g. Q0 (“What is X?”), Q22 (WF1MOD2), … closed set - cannot ask new types of questions

These are no longer supported input formats. The query graph is the only standard input format now.

  1. “dense” format result includes complete nodes and edges sometimes needlessly verbose difficult to align/compare/chain

This is no longer a supported output format. The "bindings" format is the only standard output format now.

karafecho commented 4 years ago

Thanks, @patrickkwang. I pulled the input/output info from what appears to be an outdated slide deck. The goal, however, was to more formally define the Q, KG, and A on the left side of Chris' figure, as well as the output on the right.

As you can see, my familiarity with the Reasoner Standard API is only at a high level. My hope is to fill in some of the details, so I can better understand the challenges. Does that make sense?

karafecho commented 4 years ago

Reasoner Standard API specs (updated):

Input:

Question/query graph (QG) limited expressiveness“message”/“bindings” format

Output:

“message”/“bindings” format result includes bindings between shared knowledge graph and question/query graph concise compatible with query/question graph (QG) input - can chain modules

karafecho commented 4 years ago

Detailed input/output specs: GitHub.

cbizon commented 4 years ago

I contemplated drawing the figure with bindings, but decided that it overcomplicated the image.

@xu-hao I guess the loops come from when there are more than one features associated with an input concept? I don't think they're necessary, but I also don't think that they hurt anything.

karafecho commented 4 years ago

Hao and I (mostly Hao) pulled together the 'inventory' specs found in this Data Modeling doc, which was discussed during today's meeting. Comments are welcome.

I think I'm feeling more comfortable with all of this now and better understanding the details. Certain (minor?) things are still a bit fuzzy, but that's probably my problem.

karafecho commented 4 years ago

FYI: Hao, Steve, and I met with David K. and Megha S. (ARAXi Expander Agent). They are going to take a shot at the overlay endpoint and then try the one-hop endpoint after Hao finishes developing it. The queries that I sent previously can be adapted to support the new endpoints. Happy to do that, if that would be helpful.

xu-hao commented 4 years ago

@cbizon loop can also be between the same features associated with an input concept.

cbizon commented 4 years ago

@xu-hao I'm not sure I understand, can you give an example?

xu-hao commented 4 years ago

@cbizon For example a curie X can map to a feature Y, we can generate a contingency matrix between Y and itself. which will have p-value of 0. This will become and edge between X and itself, which is a loop on X.

cbizon commented 4 years ago

I think at the feature level, calculating association of a feature with itself is a bug?

xu-hao commented 4 years ago

I have it there for completeness. But I can filter it out.