Fisher exact test doesn't infer node category

dkoslicki commented 3 years ago

For this query:

add_qnode(id=MONDO:0001475, key=n0)
add_qnode(category=biolink:Protein, key=n1, is_set=true)
add_qedge(subject=n0, object=n1, key=e0)
add_qnode(category=[biolink:Drug], key=n2)
add_qedge(subject=n1, object=n2, key=e1)
expand(edge_key=e0)
expand(edge_key=e1)
overlay(action=fisher_exact_test,subject_qnode_key=n0,virtual_relation_label=F1,object_qnode_key=n1,filter_type=top_n,value=50)
overlay(action=fisher_exact_test,subject_qnode_key=n1,virtual_relation_label=F2,object_qnode_key=n2,filter_type=top_n,value=50)
overlay(action=compute_ngd, virtual_relation_label=N2, subject_qnode_key=n0, object_qnode_key=n2)
overlay(action=predict_drug_treats_disease, virtual_relation_label=P1, subject_qnode_key=n2, object_qnode_key=n0)
resultify()
filter_results(action=limit_number_of_results, max_results=30)

I get the following error: 2021-04-21T21:00:21.421021 INFO: Performing Fisher's Exact Test to add p-value to edge attribute of virtual edge 2021-04-21T21:00:21.843015 ERROR: [UnknownError] Subject node with qnode key n0 was set to None in Query Graph. Please specify the node type

Even though the category wasn't specified, it really should infer it from the preferred category.

dkoslicki commented 3 years ago

@chunyuma let me know if you don't have the bandwidth to work on this and I can assign someone else

chunyuma commented 3 years ago

Hi @dkoslicki, I can work on this but this seems not just relevant to overlay(action= fisher_exact_test). I might need to ask the suggestion from @amykglen.

Hi @amykglen, based on @dkoslicki's suggestion, do you think inferring node type from the preferred category can achieve within expand()? Will this affect anything? If it will, I can call NodeSynonymizer to infer preferred node type within fisher_exact_test.

amykglen commented 3 years ago

that seems fine to me, @chunyuma. can't think of any problems it could create.

dkoslicki commented 3 years ago

Ah, I didn't know if this involved expand or not. Maybe that was one of the attributes that got dropped in #1359 ? If not, hopefully it would be a simple attribute lookup

amykglen commented 3 years ago

nope, category wasn't dropped in #1375/#1359.

chunyuma commented 3 years ago

@amykglen, I remember I asked you a similar question before. And you said that if the user didn't set the node type, the expand would not automatically assign a node type to it, right? I don't remember what reason for this. If you still don't agree that we should automatically assign a node type to the query node without node type, I can infer it only within fisher_exact_test. But it should not affect query graph. How do you think about this? Thanks!

amykglen commented 3 years ago

ah, I see. that sounds familiar, but I think the system has changed so much it's probably not relevant anymore.

we could make ARAX_expander.py add the preferred category to the QG. but should this alter the message.query_graph (that all other modules use)? or only the copy of the QG that expand uses? if the former, does it make sense for that to happen in expand, or should it happen upstream? (like in add_qnode()?)

chunyuma commented 3 years ago

I think the safe way to add preferred category is to only infer it within fisher_exact_test if other modules don't need this information. If other modules also need this information, perhaps adding to upstream modules might be better. I have no idea.

dkoslicki commented 3 years ago

I kinda like the idea of having it in add_qnode, so everything downstream will know as much info as possible (will help some KP's that require categories specified). A while ago, I thought about this for add_qnode by name as well: before we had the Synonym tab, I wished there was a way to get the curie and category from the name so other KP's could use it.

dkoslicki commented 3 years ago

I think the safe way to add preferred category is to only infer it within fisher_exact_test if other modules doesn't need this information. If other modules also need this information, perhaps adding to upstream modules might be better. I have no idea.

@amykglen with the new intelligent KP selection in expand, would adding the category (and possibly curie) let other KP's be invoked? I recall some of them requiring this info

amykglen commented 3 years ago

@amykglen with the new intelligent KP selection in expand, would adding the category (and possibly curie) let other KP's be invoked? I recall some of them requiring this info

yeah, I think that it would - I think in the past it wasn't necessarily a good thing to infer the category for qnodes because the 'preferred category' wasn't as trustworthy and many KPs were sensitive to category, so you might end up getting no results by inferring it, but I think now it would be a good thing to do.

edeutsch commented 3 years ago

I kinda like the idea of having it in add_qnode, so everything downstream will know as much info as possible (will help some KP's that require categories specified). A while ago, I thought about this for add_qnode by name as well: before we had the Synonym tab, I wished there was a way to get the curie and category from the name so other KP's could use it.

You can add_qnode() with a name: add_qnode(name=acetaminophen) works. I think there was/is category assigning code in add_qnode() but it was disabled for the reason @amykglen describes. There is also an ongoing discussion in TRAPI about what to do when both an id and a category is supplied (and the category is not the same as the KG expects). Not clear.

chunyuma commented 2 years ago

Do we have any decisions on this issue? Should we need to assign a node type in add_qnode or in fisher_exact_test?

chunyuma commented 2 years ago

It seems like this issue is related to issue #1817. In issue 1817, our conclusion is to assign NamedThing category to the qnode that is not specified with a specific node type. So I will just follow the idea of issue 1817 to fix this issue as well.

chunyuma commented 2 years ago

This issue has been resolved. Now the FET will set the node without a specified node type to 'biolink:NamedThing' and use it to calculate the FET p-value. This change needs to be further tested in production. Once it works, then I think we can close this issue.

amykglen commented 2 years ago

@chunyuma - I think the ideal solution would be to use the 'preferred' category for a given curie according to the Node Synonymizer, rather than just assigning NamedThing (for pinned query nodes). I think it makes sense to do that within FET rather than add_qnode since I'm not aware of other modules that need that information at this point.

chunyuma commented 2 years ago

Thanks @amykglen. I change it to use the 'preferred' category if it does have a preferred category returned by the Node Synonymizer, otherwise just assign NamedThing to it.

chunyuma commented 4 months ago

close it as it has been completed.

RTXteam / RTX

Fisher exact test doesn't infer node category #1394