SwissDataScienceCenter / renku-graph

bug: request for lineage throws an exception #119

Closed ciyer closed 5 years ago

ciyer commented 5 years ago

Requesting the lineage for one of the notebooks in the Renkulab project cramakri/renku-tutorial-flights results in a stack trace on the knowledge graph service:

Viewing

https://renkulab.io/projects/758/files/lineage/notebooks/00-FilterFlights.ran.ipynb

yields

java.lang.IllegalArgumentException: There are orphan nodes
8/30/2019 1:56:38 PM at ch.datascience.knowledgegraph.graphql.lineage.model$Lineage$.from(model.scala:38)
8/30/2019 1:56:38 PM at ch.datascience.knowledgegraph.graphql.lineage.IOLineageFinder.$anonfun$toLineage$1(LineageFinder.scala:68)
8/30/2019 1:56:38 PM at ch.datascience.knowledgegraph.graphql.lineage.IOLineageFinder.$anonfun$findLineage$2(LineageFinder.scala:58)

on the server.

jachro commented 5 years ago

I think I know what has happened here. As we know, 'renku log' fails quite often during triples generation, and because of that the data in our KG is not complete. What this particular exception says is that the query found an edge but we couldn't find a node matching it. So quite clearly we generated triples for some commit, but triples generation failed for either the preceding or the following commit. What can we do? We've got this bug https://github.com/SwissDataScienceCenter/renku-python/issues/616 to fix, and maybe some others can be raised, as there are other causes of triples generation failures.
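For readers following along, the kind of check described above can be sketched roughly as follows. This is only an illustration of "an edge without a matching node", not the actual code in model.scala; the `LineageSketch` object and the `Node`, `Edge` and `Lineage` shapes are assumptions made up for the example.

```scala
// Hypothetical, simplified model; the real types in renku-graph may differ.
object LineageSketch {
  final case class Node(id: String)
  final case class Edge(source: String, target: String)
  final case class Lineage(edges: Set[Edge], nodes: Set[Node])

  // Builds a Lineage only if every edge endpoint has a matching node;
  // otherwise returns the same kind of error as seen in the stack trace.
  def from(edges: Set[Edge], nodes: Set[Node]): Either[IllegalArgumentException, Lineage] = {
    val nodeIds     = nodes.map(_.id)
    val endpointIds = edges.flatMap(e => Set(e.source, e.target))
    if (endpointIds.subsetOf(nodeIds)) Right(Lineage(edges, nodes))
    else Left(new IllegalArgumentException("There are orphan nodes"))
  }
}
```

With nodes `a` and `b`, an edge `a -> b` passes the check, while an edge `a -> c` is rejected because no node `c` was returned.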

ciyer commented 5 years ago

This is also visible on dev at https://dev.renku.ch/projects/cramakri/renku-tutorial-flights/files/lineage/notebooks/00-FilterFlights.ran.ipynb

jachro commented 5 years ago

At least it's consistent. Let me try to find the relevant exception so we know what to fix.

jachro commented 5 years ago

I've just done some investigation and it looks like it has to be something else, as there are no exceptions during triples generation. I'll look into that more.

jachro commented 5 years ago

There was some mysterious logic post-processing the raw data returned from the SPARQL lineage query. This logic removed nodes matching some specific criteria and thus corrupted the result: edges could be left pointing at nodes that were no longer there. That seemed to be wrong, so it was deleted. The request mentioned in the previous comments works correctly now.
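A minimal sketch of how such filtering can corrupt a result, assuming the nodes are dropped while the edges are left untouched. The actual criteria are not stated in this thread, so the `label == "process"` predicate used below is purely hypothetical, as are the `FilterSketch` object and its types.

```scala
// Hypothetical illustration only; names and shapes are not from renku-graph.
object FilterSketch {
  final case class Node(id: String, label: String)
  final case class Edge(source: String, target: String)

  // Drops the nodes matching the given criterion but does not touch the edges.
  def dropNodes(nodes: Set[Node])(criterion: Node => Boolean): Set[Node] =
    nodes.filterNot(criterion)

  // Ids referenced by some edge that no longer have a corresponding node.
  def danglingEndpoints(edges: Set[Edge], nodes: Set[Node]): Set[String] =
    edges.flatMap(e => Set(e.source, e.target)) -- nodes.map(_.id)
}
```

For example, with nodes `in`, `run` and `out` and edges `in -> run` and `run -> out`, dropping the node `run` leaves both edges referring to an id that has no node, which is the situation a later consistency check would reject.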

rokroskar commented 5 years ago

There was some mysterious logic

🤔

jachro commented 5 years ago

Yeah, there was some logic which I ported from the original implementation done by Jiri. Although I knew how it worked, I never understood why we needed it. So I thought, right, let's keep it, as there may be cases I don't know about. And yes, it got triggered last week and caused some errors :) So I removed the logic, tested different scenarios and found that everything works fine.