Rothamsted / knetbuilder

KnetBuilder data integration platform for building knowledge graphs. Previously known as ondex.
https://knetminer.com
MIT License

ConceptClass Filter is wrong when a relation matches data source restriction but not CC restriction. #15

Closed by marco-brandizi 5 years ago

marco-brandizi commented 5 years ago

Reported by @KeywanHP and just fixed; I'm adding the issue to keep track of it. This is about the CC-based filter, which removes or keeps concepts and their incident relations based on a specified concept class and, optionally, a given data source.

From comments in the code:

// Suppose gene1(DS=FOO) -> (r1) -> protein1(DS=TAIR), CC = Gene
// Neither gene1 nor protein1 is filtered (gene1 not in DS, protein1 not in CC)
// Yet getRelationsOfDataSource(TAIR) WILL pick r1, due to protein1.DS

That's because selectedR.retainAll(graph.getRelationsOfDataSource(dataSource)); (see the filter source) is too broad for the selection that's actually needed. We basically need to retain only the selectedC concepts, that is, the ones selected by filtering first by CC and then by DS.
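To make the difference concrete, here is a minimal, self-contained sketch in plain Java collections. It is not the actual filter code: the Concept and Relation classes, the method names and the sample data (gene2, protein2, r2) are all made up for illustration. It also assumes that the relation-level selection matches a relation when either of its ends has the requested concept class or data source, which is what the code comment above suggests for getRelationsOfDataSource().

```java
import java.util.List;
import java.util.Set;
import java.util.TreeSet;
import java.util.stream.Collectors;

public class CcFilterSketch {
    // Hypothetical stand-ins for graph concepts and relations (not the ONDEX types).
    static final class Concept {
        final String id, conceptClass, dataSource;
        Concept(String id, String conceptClass, String dataSource) {
            this.id = id; this.conceptClass = conceptClass; this.dataSource = dataSource;
        }
    }

    static final class Relation {
        final String id; final Concept from, to;
        Relation(String id, Concept from, Concept to) {
            this.id = id; this.from = from; this.to = to;
        }
    }

    // Assumption: a relation "has" a CC or DS if either of its ends has it.
    static boolean touchesCc(Relation r, String cc) {
        return r.from.conceptClass.equals(cc) || r.to.conceptClass.equals(cc);
    }

    static boolean touchesDs(Relation r, String ds) {
        return r.from.dataSource.equals(ds) || r.to.dataSource.equals(ds);
    }

    // Too broad: CC and DS are checked independently, so they can match on different ends.
    static Set<String> broadSelection(List<Relation> rels, String cc, String ds) {
        Set<String> selectedR = rels.stream().filter(r -> touchesCc(r, cc))
            .map(r -> r.id).collect(Collectors.toCollection(TreeSet::new));
        Set<String> ofDataSource = rels.stream().filter(r -> touchesDs(r, ds))
            .map(r -> r.id).collect(Collectors.toSet());
        selectedR.retainAll(ofDataSource);
        return selectedR;
    }

    // A concept is selected only if it matches both the CC and the DS.
    static boolean isSelected(Concept c, String cc, String ds) {
        return c.conceptClass.equals(cc) && c.dataSource.equals(ds);
    }

    // Narrower: keep only relations incident to a concept selected by both CC and DS.
    static Set<String> narrowSelection(List<Relation> rels, String cc, String ds) {
        return rels.stream()
            .filter(r -> isSelected(r.from, cc, ds) || isSelected(r.to, cc, ds))
            .map(r -> r.id).collect(Collectors.toCollection(TreeSet::new));
    }

    static List<Relation> sampleRelations() {
        Concept gene1 = new Concept("gene1", "Gene", "FOO");
        Concept protein1 = new Concept("protein1", "Protein", "TAIR");
        Concept gene2 = new Concept("gene2", "Gene", "TAIR");
        Concept protein2 = new Concept("protein2", "Protein", "TAIR");
        return List.of(new Relation("r1", gene1, protein1), new Relation("r2", gene2, protein2));
    }

    public static void main(String[] args) {
        List<Relation> rels = sampleRelations();
        System.out.println(broadSelection(rels, "Gene", "TAIR"));  // [r1, r2]
        System.out.println(narrowSelection(rels, "Gene", "TAIR")); // [r2]
    }
}
```

In the broad version r1 slips in because gene1 satisfies the CC check and protein1 satisfies the DS check, even though no single concept satisfies both. The narrow version only follows relations from concepts that pass both checks, so r1 is left out.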

There's a test file showing the case (GENE-FOO and its relation should be filtered there, but not the other gene and its relation, despite the fact that the latter lands on a TAIR concept that is not a gene).

marco-brandizi commented 5 years ago

Tested with real data, closing.