AnacletoLAB / grape

🍇 GRAPE is a Rust/Python Graph Representation Learning library for Predictions and Evaluations
MIT License

Errors from `compute_pairwise_resnik` #16

Closed caufieldjh closed 1 year ago

caufieldjh commented 1 year ago

Hello!

We're trying to put together some unit tests for computing pairwise Resnik similarity, but are encountering an error with a description that doesn't seem correct.

With the input here and a test like this:

    def setUp(self) -> None:
        """Set up."""
        self.test_graph_path_nodes = "tests/resources/test_hpo_nodes.tsv"
        self.test_graph_path_edges = "tests/resources/test_hpo_edges.tsv"
        self.resnik_outpath = "tests/output/resnik_out"
        self.test_graph = Graph.from_csv(
            directed=True,
            node_path=self.test_graph_path_nodes,
            edge_path=self.test_graph_path_edges,
            nodes_column="id",
            node_list_node_types_column="category",
            sources_column="subject",
            destinations_column="object",
            edge_list_edge_types_column="predicate",
        )
        self.test_counts = {
            "HP:0000118": 23,
            "HP:0000001": 24,
            "HP:0001507": 1,
            "HP:0001574": 1,
            "HP:0001871": 1,
            "HP:0033127": 1,
            "HP:0025354": 1,
            "HP:0001608": 1,
            "HP:0001197": 1,
            "HP:0000119": 1,
            "HP:0001939": 1,
            "HP:0000707": 1,
            "HP:0025031": 1,
            "HP:0001626": 1,
            "HP:0000818": 1,
            "HP:0025142": 1,
            "HP:0002086": 1,
            "HP:0002715": 1,
            "HP:0000478": 1,
            "HP:0040064": 1,
            "HP:0002664": 1,
            "HP:0000598": 1,
            "HP:0000769": 1,
            "HP:0045027": 1,
            "HP:0000152": 1,
        }

...

    def test_compute_pairwise_resnik(self) -> None:
        """Test pairwise Resnik computation."""

        compute_pairwise_resnik(
            dag=self.test_graph,
            counts=self.test_counts,
            path=self.resnik_outpath,
        )
        self.assertTrue(os.path.exists(self.resnik_outpath))

we consistently get an error like this:

    ValueError: The provided two nodes 3 and 1 do not have a shared parent node. Perhaps, the provided DAG has multiple root nodes and these two nodes are in different root portions of the DAG. Another analogous explanation is that the two nodes may be in different connected components.

Any idea what's going on here? The two nodes definitely have shared ancestry and are in the same DAG. Does this have something to do with the directionality of the graph? A full version of HPO subclass_of nodes/edges works without issue.

@hrshdhgd @justaddcoffee

LucaCappelletti94 commented 1 year ago

Solved in a brief call. The issue was that the graph was transposed: in the edge list as written, the root node was a leaf. We used the `to_transposed` method to flip the edges into the expected orientation.
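
For anyone hitting the same error, here is a minimal pure-Python sketch of why edge direction matters. It is not GRAPE's implementation. It assumes that ancestors are found by walking incoming edges, so the root has to be a source (edges pointing parent -> child). The helper names (`transpose`, `ancestors`, `shared_ancestors`) and the three-edge toy graph are hypothetical and only illustrate the idea.

```python
from collections import defaultdict


def transpose(edges):
    """Flip every (source, destination) pair."""
    return [(dst, src) for src, dst in edges]


def ancestors(node, edges):
    """Return every node from which `node` can be reached by following edges forward."""
    predecessors = defaultdict(set)
    for src, dst in edges:
        predecessors[dst].add(src)
    seen, stack = set(), [node]
    while stack:
        current = stack.pop()
        for parent in predecessors[current]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def shared_ancestors(a, b, edges):
    """Nodes that are ancestors of (or equal to) both `a` and `b`."""
    return (ancestors(a, edges) | {a}) & (ancestors(b, edges) | {b})


# Toy edge list written subject -> object, i.e. child -> parent.
# The IDs come from the counts dictionary in the test; the links are illustrative only.
child_to_parent = [
    ("HP:0000118", "HP:0000001"),
    ("HP:0001507", "HP:0000118"),
    ("HP:0001574", "HP:0000118"),
]

# As written, the root HP:0000001 has no outgoing edges, so it is a leaf.
before = shared_ancestors("HP:0001507", "HP:0001574", child_to_parent)
# After flipping the edges, the root is a source and both terms descend from it.
after = shared_ancestors("HP:0001507", "HP:0001574", transpose(child_to_parent))

print(sorted(before))  # []
print(sorted(after))   # ['HP:0000001', 'HP:0000118']
```

With the real graph, the equivalent step is to call `to_transposed` on the loaded graph and pass the result as `dag`, as described above.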