JervenBolleman / void-generator

Calculate statistics for use in a Service Description or Void file
MIT License

Possible inappropriate use of the class exclusion filter #21

Open galgonek opened 1 week ago

galgonek commented 1 week ago

The new version (336baa1e7dbae3e4e8407f7a3c1ff3ad63267675) of the generator issues queries that IDSM may not be able to evaluate in a reasonable time. An example of such a query follows:

SELECT DISTINCT ?clazz ?linkingGraphName
WHERE {
        GRAPH <http://rdf.ncbi.nlm.nih.gov/pubchem/patent> {
                ?subject a <http://data.epo.org/linked-data/def/patent/Publication>
        }
        GRAPH ?linkingGraphName {
                ?subject <http://purl.org/spar/cito/isDiscussedBy> ?target
        }
        GRAPH <http://rdf.ncbi.nlm.nih.gov/pubchem/protein> {
                ?target a ?clazz
        }
        VALUES (?clazz) {
                (<http://purl.obolibrary.org/obo/GO_0032991>)
                (<http://purl.uniprot.org/core/Enzyme>)
                (<http://semanticscience.org/resource/SIO_010043>)
                (<http://semanticscience.org/resource/SIO_010343>)
                (<http://www.biopax.org/release/biopax-level3.owl#Protein>)
        }
        FILTER(!strstarts(str(?clazz), "http://purl.obolibrary.org/obo/CHEMONTID_")
                && !strstarts(str(?clazz), "http://purl.bioontology.org/ontology/NDFRT/")
                && !strstarts(str(?clazz), "http://purl.bioontology.org/ontology/SNOMEDCT/")
                && !strstarts(str(?clazz), "http://ncicb.nci.nih.gov/xml/owl/EVS/Thesaurus.owl#")
                && !strstarts(str(?clazz), "http://purl.obolibrary.org/obo/UBERON_")
                && !strstarts(str(?clazz), "http://purl.obolibrary.org/obo/PR_")
                && !strstarts(str(?clazz), "http://purl.obolibrary.org/obo/CHEBI_"))
}

These queries are generated by the makeQuery method of the IsSourceClassLinkedToDistinctClassInOtherGraph class. However, I think they are unnecessarily complicated.

If the list of classes (otherGraph.getClasses()) is provided, why is the class filter (classExclusion) also applied? Doesn't the list already contain only classes that pass the filter?

If the list of classes is empty, is there any reason to run the query at all? Doesn't an empty list mean that no class in the graph (otherGraph) passes the filter?

JervenBolleman commented 1 week ago

The class list may be empty if this set of queries is run before we have detected which classes are in the other graph. But yes, the exclusion filter is not needed if we already know the list of classes on the other side.
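
The point agreed on above can be sketched as follows: emit the VALUES clause when the classes of the other graph are already known, and fall back to the exclusion filter only when the list is still empty. This is a minimal illustration, not the generator's actual code. The class name LinkQuerySketch, the helper classConstraint, and the parameter list of makeQuery are hypothetical, and classExclusion is assumed to be a ready-made SPARQL boolean expression.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical sketch; names and signatures are assumptions, not the project's API.
final class LinkQuerySketch {

    // Returns a VALUES clause when classes are known, otherwise the exclusion FILTER.
    static String classConstraint(List<String> knownClasses, String classExclusion) {
        if (knownClasses != null && !knownClasses.isEmpty()) {
            // The known classes already passed the exclusion filter, so no FILTER is emitted.
            return knownClasses.stream()
                    .map(iri -> "(<" + iri + ">)")
                    .collect(Collectors.joining(" ", "VALUES (?clazz) { ", " }"));
        }
        if (classExclusion == null || classExclusion.trim().isEmpty()) {
            return "";
        }
        // Classes not yet detected: restrict ?clazz with the exclusion expression instead.
        return "FILTER(" + classExclusion + ")";
    }

    static String makeQuery(String sourceGraph, String sourceClass, String predicate,
                            String otherGraph, List<String> knownClasses, String classExclusion) {
        return String.join("\n",
                "SELECT DISTINCT ?clazz ?linkingGraphName",
                "WHERE {",
                "  GRAPH <" + sourceGraph + "> { ?subject a <" + sourceClass + "> }",
                "  GRAPH ?linkingGraphName { ?subject <" + predicate + "> ?target }",
                "  GRAPH <" + otherGraph + "> { ?target a ?clazz }",
                "  " + classConstraint(knownClasses, classExclusion),
                "}");
    }
}
```

With a non-empty class list the generated query contains only the VALUES clause; with an empty list it contains only the FILTER, so the two constraints never appear together.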

Another option might be to introduce another set of phasers/locks to make sure the class list is known before these queries are run.
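
One way such a synchronization step could look, as a rough sketch only: the linking queries wait on a gate that opens once class discovery for the other graph has finished. It uses java.util.concurrent.CountDownLatch as a simpler stand-in for the phaser mentioned above. The class ClassDiscoveryGate and its methods are hypothetical and not part of the generator.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch; not part of void-generator.
final class ClassDiscoveryGate {
    private final CountDownLatch classesKnown = new CountDownLatch(1);
    private final List<String> classes = new CopyOnWriteArrayList<>();

    // Called once by the task that detects the classes in the other graph.
    void publishClasses(List<String> found) {
        classes.addAll(found);
        classesKnown.countDown(); // opens the gate for all waiting tasks
    }

    // Called by the linking-query tasks; blocks until the classes are published.
    List<String> awaitClasses(long timeout, TimeUnit unit) throws InterruptedException {
        if (!classesKnown.await(timeout, unit)) {
            throw new IllegalStateException("class discovery did not finish in time");
        }
        return List.copyOf(classes);
    }

    public static void main(String[] args) throws InterruptedException {
        ClassDiscoveryGate gate = new ClassDiscoveryGate();
        Thread discovery = new Thread(
                () -> gate.publishClasses(List.of("http://purl.uniprot.org/core/Enzyme")));
        discovery.start();
        // The linking query would be built only after this call returns.
        System.out.println(gate.awaitClasses(5, TimeUnit.SECONDS));
    }
}
```

With a gate like this, an empty list returned by awaitClasses would really mean that no class passed the filter, so the query could be skipped in that case.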

I also modified the query to do the count at the same time instead of in two steps.
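
The modified query is not shown in this thread. As an illustration of the general idea only, a single query can return the count together with each class and linking graph by using an aggregate with GROUP BY. The class LinkCountQuerySketch, the method makeCountingQuery, and the choice to count ?target bindings are assumptions; the generator may count something else.

```java
// Hypothetical sketch; the real query built by the generator may differ.
final class LinkCountQuerySketch {

    // Builds one query that finds linked classes and counts the links in the same step.
    static String makeCountingQuery(String sourceGraph, String sourceClass,
                                    String predicate, String otherGraph) {
        return String.join("\n",
                "SELECT ?clazz ?linkingGraphName (COUNT(?target) AS ?count)",
                "WHERE {",
                "  GRAPH <" + sourceGraph + "> { ?subject a <" + sourceClass + "> }",
                "  GRAPH ?linkingGraphName { ?subject <" + predicate + "> ?target }",
                "  GRAPH <" + otherGraph + "> { ?target a ?clazz }",
                "}",
                // Grouping replaces DISTINCT: one row per class and linking graph.
                "GROUP BY ?clazz ?linkingGraphName");
    }
}
```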