GDD-Nantes / FedShop

Code for FedShop: The Federated Shop Benchmark
GNU General Public License v3.0

ANAPSID returns an AttributeError #54

Open Yotlan opened 1 year ago

Yotlan commented 1 year ago

What do we want?

ANAPSID sometimes raises an AttributeError when we run certain queries, such as q11. We want to avoid this error.

What happens?

When the plan contains an operator that the planner does not expect at that point, ANAPSID raises an AttributeError.

Where?

In the method includePhysicalOperatorJoin in Plan.py, there is the following conditional statement:

        #if (noInstantiatedRightStar) or ((not wc) and (l.constantPercentage() >= 0.5) and (len(join_variables) > 0) and c):
        # Case 1: left operator is highly selective and right operator is low selective
        if not(lowSelectivityLeft) and lowSelectivityRight  and not(isinstance(r, TreePlan)):
            n = TreePlan(NestedHashJoin(join_variables), all_variables, l, r)
            dependent_join = True
            #print "Planner CASE 1: nested loop", type(r)
        # Case 2: left operator is low selective and right operator is highly selective
        elif lowSelectivityLeft and not(lowSelectivityRight) and not(isinstance(l, TreePlan)):
            n = TreePlan(NestedHashJoin(join_variables), all_variables, r, l)
            dependent_join = True
            #print "Planner CASE 2: nested loop swapping plan", type(r)
        elif not(lowSelectivityLeft) and lowSelectivityRight  and (not(isinstance(l, TreePlan)) or not(l.operator.__class__.__name__ == "NestedHashJoinFilter" )) and (not(isinstance(r,TreePlan)) or not(r.operator.__class__.__name__ == "Xgjoin" or r.operator.__class__.__name__ == "NestedHashJoinFilter")):
            if (isinstance(r,TreePlan) and (set(l.vars) & set(r.operator.vars_left) !=set([])) and (set(l.vars) & set(r.operator.vars_right) !=set([]))):
                n = TreePlan(NestedHashJoin(join_variables), all_variables, l, r)
                dependent_join = True
            elif (isinstance(l,TreePlan) and (set(r.vars)& set(l.operator.vars_left) !=set([])) and   (set(r.vars)& set(l.operator.vars_right) !=set([]))):
               n = TreePlan(NestedHashJoin(join_variables), all_variables, l, r)
               dependent_join = True
            else:
               n =  TreePlan(Xgjoin(join_variables), all_variables, l, r)
            #print "Planner case 2.5", type(r)
        # Case 3: both operators are low selective
        else:
            n = TreePlan(Xgjoin(join_variables), all_variables)

The second elif of this conditional statement can be reached with an operator such as HashJoin. But HashJoin has no vars_left attribute, so it should not enter this branch; it should fall through to the last case instead. An IndependantOperator can also reach this statement, although it never should, because ANAPSID handles that case after this large conditional statement.
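The failure described above can be reproduced in isolation. The sketch below uses hypothetical stand-in classes, not the real ANAPSID ones. It assumes, as this issue states, that HashJoin exposes no vars_left or vars_right, while the operator the branch expects (called NestedHashJoin here) does. The helper name shares_both_sides is made up for illustration.

```python
# Hypothetical stand-ins: the real ANAPSID classes carry many more fields.
class NestedHashJoin:
    def __init__(self, vars_left, vars_right):
        self.vars_left = vars_left
        self.vars_right = vars_right


class HashJoin:
    """Stand-in without vars_left / vars_right, as described in the issue."""

    def __init__(self, join_vars):
        self.vars = join_vars


class TreePlan:
    def __init__(self, operator, vars):
        self.operator = operator
        self.vars = vars


def shares_both_sides(leaf_vars, plan):
    """Mimic the inner test: leaf_vars must overlap both inputs of plan.operator."""
    op = plan.operator
    overlaps_left = bool(set(leaf_vars) & set(op.vars_left))
    overlaps_right = bool(set(leaf_vars) & set(op.vars_right))
    return overlaps_left and overlaps_right


ok_plan = TreePlan(NestedHashJoin(["?s", "?p"], ["?s", "?o"]), ["?s", "?p", "?o"])
bad_plan = TreePlan(HashJoin(["?s"]), ["?s", "?p", "?o"])

print(shares_both_sides(["?s"], ok_plan))  # True

try:
    shares_both_sides(["?s"], bad_plan)
except AttributeError as exc:
    # HashJoin has no vars_left, so the attribute lookup fails here.
    print("AttributeError:", exc)
```

The isinstance(r, TreePlan) guard in the original code protects the access to .operator, but not the access to vars_left on whatever operator the plan holds.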

How to reproduce?

Run any instance of q11, for example the following query:

SELECT DISTINCT ?property ?hasValue ?isValueOf WHERE {
    <http://www.vendor6.fr/Offer886> <http://www4.wiwiss.fu-berlin.de/bizer/bsbm/v01/vocabulary/product> ?product . 
    { <http://www.vendor6.fr/Offer886> ?property ?hasValue }
    UNION
    { ?isValueOf ?property <http://www.vendor6.fr/Offer886> }
}

And you'll see the AttributeError.

Yotlan commented 1 year ago

Proposed solution

To avoid this error, we can simply add another condition to the following conditional statement:

elif not(lowSelectivityLeft) and lowSelectivityRight  and (not(isinstance(l, TreePlan)) or not(l.operator.__class__.__name__ == "NestedHashJoinFilter" )) and (not(isinstance(r,TreePlan)) or not(r.operator.__class__.__name__ == "Xgjoin" or r.operator.__class__.__name__ == "NestedHashJoinFilter")):

For example, to keep the HashJoin operator out of this branch, we can add the following conditions, one for the left operator and one for the right operator:

not(l.operator.__class__.__name__ == "HashJoin") and not(r.operator.__class__.__name__ == "HashJoin")

With these conditions added, every query that previously raised an AttributeError now times out instead (because ANAPSID does its work: it constructs all the joins and merges the intermediate results).
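Written as is, the extra condition reads .operator on l and r directly, so it would need the same isinstance guard as the existing checks. One way to express the whole test without touching .operator on nodes that lack it is a small helper. This is only a sketch, not ANAPSID code: operator_name, EXCLUDED_LEFT, EXCLUDED_RIGHT and may_enter_second_elif are hypothetical names, the classes are empty stand-ins, and it assumes that only TreePlan nodes carry an operator attribute.

```python
def operator_name(node):
    """Return the class name of node.operator, or None if node has no operator."""
    op = getattr(node, "operator", None)
    return None if op is None else op.__class__.__name__


# Operator names that must not enter the second elif, per side.
# "HashJoin" is the addition proposed in this comment.
EXCLUDED_LEFT = {"NestedHashJoinFilter", "HashJoin"}
EXCLUDED_RIGHT = {"Xgjoin", "NestedHashJoinFilter", "HashJoin"}


def may_enter_second_elif(l, r, low_selectivity_left, low_selectivity_right):
    return (
        not low_selectivity_left
        and low_selectivity_right
        and operator_name(l) not in EXCLUDED_LEFT
        and operator_name(r) not in EXCLUDED_RIGHT
    )


# Empty stand-ins, only used to exercise the helper.
class HashJoin:
    pass


class NestedHashJoin:
    pass


class Leaf:
    pass


class TreePlan:
    def __init__(self, operator):
        self.operator = operator


print(may_enter_second_elif(Leaf(), TreePlan(NestedHashJoin()), False, True))  # True
print(may_enter_second_elif(Leaf(), TreePlan(HashJoin()), False, True))        # False
```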

Yotlan commented 1 year ago

Another proposed solution

When the AttributeError involves IndependantOperator, note that IndependantOperator has no operator member. So we need to extend the first branch of the conditional statement:

if not(lowSelectivityLeft) and lowSelectivityRight  and not(isinstance(r, TreePlan)):

with the following condition:

not(l.__class__.__name__ == "IndependantOperator") and not(r.__class__.__name__ == "IndependantOperator")
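Put together, the extended first condition might look like the sketch below. The classes are empty stand-ins, is_independent and case1_applies are hypothetical names, Service is an invented placeholder for any other leaf type, and the class name keeps the spelling used in this issue.

```python
# Empty stand-ins for illustration only.
class TreePlan:
    pass


class IndependantOperator:  # spelling as written in this issue
    pass


class Service:  # hypothetical placeholder for some other leaf node
    pass


def is_independent(node):
    """Class-name check, in the same style as the existing ANAPSID conditions."""
    return node.__class__.__name__ == "IndependantOperator"


def case1_applies(l, r, low_selectivity_left, low_selectivity_right):
    return (
        not low_selectivity_left
        and low_selectivity_right
        and not isinstance(r, TreePlan)
        and not is_independent(l)
        and not is_independent(r)
    )


print(case1_applies(Service(), Service(), False, True))              # True
print(case1_applies(IndependantOperator(), Service(), False, True))  # False
print(case1_applies(Service(), IndependantOperator(), False, True))  # False
```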

mhoangvslev commented 1 year ago

In this query:

SELECT DISTINCT ?property ?hasValue ?isValueOf WHERE {
    <http://www.vendor6.fr/Offer886> <http://www4.wiwiss.fu-berlin.de/bizer/bsbm/v01/vocabulary/product> ?product . 
    { <http://www.vendor6.fr/Offer886> ?property ?hasValue }
    UNION
    { ?isValueOf ?property <http://www.vendor6.fr/Offer886> }
}

the first triple pattern is a by-product of the query instantiation process and should not be there once the ?offer variable has been injected. This triple pattern asks for every ?product offered by Offer886, and since ?product is not a join variable, it produces a Cartesian product with the other triple patterns (tps).

Can you try removing this tp from your test query and see if it then works?
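To see why a triple pattern without a join variable yields a Cartesian product, here is a minimal sketch of a join over solution mappings represented as Python dicts. The function join and all the bindings are invented for illustration; this is not how ANAPSID implements its operators.

```python
def join(left, right):
    """Join two lists of solution mappings on the variables they share."""
    results = []
    for a in left:
        for b in right:
            shared = a.keys() & b.keys()
            # Two mappings are compatible if they agree on every shared variable.
            if all(a[var] == b[var] for var in shared):
                results.append({**a, **b})
    return results


# Illustrative bindings only: ?product shares no variable with the UNION part.
product_rows = [{"product": "P1"}, {"product": "P2"}]
union_rows = [
    {"property": "p1", "hasValue": "v1"},
    {"property": "p2", "hasValue": "v2"},
    {"property": "p3", "hasValue": "v3"},
]

rows = join(product_rows, union_rows)
print(len(rows))  # 6, i.e. 2 * 3: every pairing is kept
```

With no shared variable, the compatibility check is vacuously true, so every left row is paired with every right row.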

Yotlan commented 1 year ago

When we run the q11 query without the first triple pattern, it works:

SELECT DISTINCT ?property ?hasValue ?isValueOf WHERE {
    { <http://www.vendor6.fr/Offer886> ?property ?hasValue }
    UNION
    { ?isValueOf ?property <http://www.vendor6.fr/Offer886> }
}

It gives the following results:

{'property': 'http://www.w3.org/1999/02/22-rdf-syntax-ns#type', 'hasValue': 'http://www4.wiwiss.fu-berlin.de/bizer/bsbm/v01/vocabulary/Offer', 'isValueOf': ''}
{'property': 'http://www.w3.org/2002/07/owl#sameAs', 'hasValue': 'http://www4.wiwiss.fu-berlin.de/bizer/bsbm/v01/vocabulary/Offer886', 'isValueOf': ''}
{'property': 'http://www4.wiwiss.fu-berlin.de/bizer/bsbm/v01/vocabulary/deliveryDays', 'hasValue': '4^^<http://www.w3.org/2001/XMLSchema#integer>', 'isValueOf': ''}
{'property': 'http://www4.wiwiss.fu-berlin.de/bizer/bsbm/v01/vocabulary/offerWebpage', 'hasValue': "entomology Heliopolis comportment's rosebushes twentieth's Reba Americanization's poetesses Shintos", 'isValueOf': ''}
{'property': 'http://www4.wiwiss.fu-berlin.de/bizer/bsbm/v01/vocabulary/price', 'hasValue': '4065.84^^<http://www.w3.org/2001/XMLSchema#double>', 'isValueOf': ''}
{'property': 'http://www4.wiwiss.fu-berlin.de/bizer/bsbm/v01/vocabulary/product', 'hasValue': 'http://www.vendor6.fr/Product55489', 'isValueOf': ''}
{'property': 'http://www4.wiwiss.fu-berlin.de/bizer/bsbm/v01/vocabulary/publishDate', 'hasValue': '2008-02-24T00:00:00^^<http://www.w3.org/2001/XMLSchema#dateTime>', 'isValueOf': ''}
{'property': 'http://www4.wiwiss.fu-berlin.de/bizer/bsbm/v01/vocabulary/validFrom', 'hasValue': '2008-02-20T00:00:00^^<http://www.w3.org/2001/XMLSchema#dateTime>', 'isValueOf': ''}
{'property': 'http://www4.wiwiss.fu-berlin.de/bizer/bsbm/v01/vocabulary/validTo', 'hasValue': '2008-05-31T00:00:00^^<http://www.w3.org/2001/XMLSchema#dateTime>', 'isValueOf': ''}
{'property': 'http://www4.wiwiss.fu-berlin.de/bizer/bsbm/v01/vocabulary/vendor', 'hasValue': 'http://www.vendor6.fr/Vendor0', 'isValueOf': ''}

mhoangvslev commented 1 year ago

I have another hypothesis: other engines returned results for this query, so I think this also reveals a weakness in ANAPSID, namely that it cannot handle Cartesian products.

Could you test with this query?

SELECT * WHERE {
    <http://www.vendor6.fr/Offer886> <http://www4.wiwiss.fu-berlin.de/bizer/bsbm/v01/vocabulary/product> ?product1 . 
    <http://www.vendor6.fr/Offer886> <http://www4.wiwiss.fu-berlin.de/bizer/bsbm/v01/vocabulary/product> ?product2 . 
}

Yotlan commented 1 year ago

This query returns the following result:

{'product1': 'http://www.vendor6.fr/Product55489', 'product2': 'http://www.vendor6.fr/Product55489'}

mhoangvslev commented 1 year ago

The Cartesian product is not the problem: I removed the UNION structure and the query works. You can run the rest of the workload without q11 for now.