Open randykerber opened 6 years ago
file: PDB/LINKSET_EXACT_PDB20171110.ttl
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix void: <http://rdfs.org/ns/void#> .
@prefix skos: <http://www.w3.org/2004/02/skos/core#> .
@prefix ops: <http://chemistry.openphacts.org//> .
@prefix : <http://chemistry.openphacts.org/download/20171110/PDB/LINKSET_EXACT_PDB20171110.ttl#> .
<http://chemistry.openphacts.org/download/20171110/PDB/LINKSET_EXACT_PDB20171110.ttl> void:inDataset <http://chemistry.openphacts.org/download/20171110/void_2017-11-10.ttl#pdb_exactMatch> .
The only triple in the file is a void:inDataset
statement. There are no "link" triples.
I remember running into this issue earlier in the year. When I looked into it I recall the Exception was caused by some sort of pre-check phase performed on each file, where some piece of code attempted to read the first link triple/line of the file and verify that it had the correct link-predicate (and perhaps other checks). However, when there was no "first record", it resulted in an Exception. I could not see a way to fix without possibly wrecking other functionality.
This was probably 6+ months ago, so memory is a bit fuzzy.
The correct behavior is probably to issue some sort of WARNING message that an empty linkset file was encountered, but to otherwise allow the process to proceed. Basically, an empty linkset file should be allowed and proceed with the load.
Can you justify why we should allow loading an empty linkset file?
It is a deliberate quality assurance step that the loader performs these test.
We were thinking about automating things more, and in an automated process it's easy enough to have a data set where there are no links... they found three such link sets right now... is an empty link set semantically wrong?
We took the view that it is wrong. You have metadata saying there are links between these two data sources but then there are no links.
As such, we wanted the automated loading process to fail so that we would have to fix the issue rather than just seeing a bunch of warnings that are easy to ignore and pass over.
Agree with @AlasdairGray that a linkset with no links is not really a linkset (and so should not declare itself as that), and a warning would be easily ignored (as IMS has loads of them) - and so you would no longer detect the probably more common case that the void declares the wrong void:linkPredicate
.
But perhaps it could be allowed if the void also says void:triples 0
?
I'm in favor of allowing empty sets. I don't think the IMS loading is the right place or time to complain about this. The IMS function does not break after loading an empty set (which is still a set). The predicate cannot be wrong either, with zero links, so an exception for that sounds semantically wrong to me.
I see a link set as the result of a calculation (process) to determine the links. In fact, for many of our link sets these are in fact calculated. For others, the process is an analysis of an external data set...
An explicit claim that given that process that there are zero links makes a lot of sense to me. Without any VoID we also loose the observation that there were no links.
I think your final point is more about the outcome of the linking process rather than the use of the linksets.
I would argue that from the use of linksets, having a set with no links doesn't make sense and will have an impact on the performance of the system, as we could easily generate thousands of linksets with no links, e.g. UniProt:Protein to aers:AdverseEvent.
I don't think performance will be an issue, but there might be an issue if the empty linkset causes a false transitive path possibility. Not tested!
But we can just add an --ignore-empty
flag then and see how it goes? To the command line or loader XML file?
I was not referring the performance of the loader, but of the computation of links.
@egonw can you give a concrete use case where you have a linkset with no links that should be loaded into the IMS?
@AlasdairGray, Randy/Ian are trying to finish the OPS 2.2 API release and running into empty link sets. Since many link sets are autogenerated, as part of a update release flow, some end up empty. See the aforementioned example by Randy...
From the data governance perspective, I think it is better that we weed out the empty linksets as part of the loading process. If they are truely meant to be empty then we don't need to load them, they would generate no query answers. If they are not meant to be empty then we need to find out why they are empty.
@randykerber, what command line call did you make on the above (empty) link set that we can use to reproduce the exception? (And thus, test the --ignore-empty option to be implemented...)
The load command line would have used a load.xml
that loaded all openphacts linksets (more than 100). I'm guessing you want something more concise?
The second entry in this issue has a complete linkset file that triggered this exception. I can fetch the void file that goes along with this. That enough?
I've attached the void file (below) that references the zero-link linkset listed in the second entry of this 'issues' page.
I had to rename the file to void_2017-11-10.ttl.txt
, else GitHub would not accept it. Should be named void_2017-11-10.ttl
.
Stacktrace:
Exception in thread "main" java.lang.reflect.InvocationTargetException
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at com.simontuffs.onejar.Boot.run(Boot.java:340)
at com.simontuffs.onejar.Boot.main(Boot.java:166)
Caused by: org.bridgedb.utils.BridgeDBException: Error loading
at uk.ac.manchester.cs.openphacts.ims.loader.RunLoader.main(RunLoader.java:241)
... 6 more
Caused by: org.bridgedb.utils.BridgeDBException: Found 0 where 1 expected
at uk.ac.manchester.cs.openphacts.ims.loader.LinksetLoader.getObjectURI(LinksetLoader.java:169)
at uk.ac.manchester.cs.openphacts.ims.loader.LinksetLoader.getParser(LinksetLoader.java:132)
at uk.ac.manchester.cs.openphacts.ims.loader.LinksetLoader.load(LinksetLoader.java:91)
at uk.ac.manchester.cs.openphacts.ims.loader.LinksetLoader.load(LinksetLoader.java:69)
at uk.ac.manchester.cs.openphacts.ims.loader.RunLoader.loadLinkset(RunLoader.java:102)
at uk.ac.manchester.cs.openphacts.ims.loader.RunLoader.main(RunLoader.java:225)
... 6 more
OK, turns out that to test this, I need to bake at least a new .war and possibly a new Docker... I am working on this, but Tina will likely want to make a new PathVisio release and Anders a new pvjs release, so pushing this to the next release... which will, in all fairness, follow really soon... Oh, and Jonathan is waiting for a release too... #rsro
Currently, when attempt to load a linkset file into IMS, if the linkset file is empty an Exception is thrown and the load terminates. emty means there are zero links in the linkset file. Here is contents of such a file: