mhoangvslev commented 2 months ago

This could be a use case for RAW-JENA, where it can help (massively) reduce the workload generation time.

mhoangvslev commented 2 months ago

Problem:

Query 5 (q05) doesn't give results.
When executed on Virtuoso with an explicit join order and LIMIT 5, it does return results.
The triple patterns [whatever] owl:sameAs ?prodFeature . make q05 more selective.
How many random walks (RW) does it take to hit the first row?

Steps to reproduce results

Launch raw-jena and its UI with the fedup-id summary (the endpoint is http://localhost:3330/fedup-id).

Input this query:


PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX bsbm: <http://www4.wiwiss.fu-berlin.de/bizer/bsbm/v01/vocabulary/>
PREFIX owl: <http://www.w3.org/2002/07/owl#>

SELECT DISTINCT ?ProductXYZ WHERE {
  GRAPH ?g1 {
    ?localProduct rdfs:label ?localProductLabel ;
      bsbm:productFeature ?localProdFeature ;
      bsbm:productPropertyNumeric1 ?simProperty1 ;
      bsbm:productPropertyNumeric2 ?simProperty2 ;
      owl:sameAs ?product .
    ?localProdFeature owl:sameAs ?prodFeature .
  }
  GRAPH ?g2 {
    ?localProductXYZ bsbm:productFeature ?localProdFeatureXYZ ;
      bsbm:productPropertyNumeric1 ?origProperty1 ;
      bsbm:productPropertyNumeric2 ?origProperty2 ;
      owl:sameAs ?ProductXYZ .
    ?localProdFeatureXYZ owl:sameAs ?prodFeature .
  }
  FILTER((?simProperty1 < (?origProperty1 + 20)) && (?simProperty1 > (?origProperty1 - 20)))
  FILTER((?simProperty2 < (?origProperty2 + 70)) && (?simProperty2 > (?origProperty2 - 70)))
}



# Other

Chat-Wane commented 2 months ago

> This could be a use case for RAW-JENA, where it can help (massively) reduce the workload generation time.

To provide more context: to instantiate a templated query, you sometimes need actual values from the dataset. I assume this does not take long right after the dataset generation (since you may still have all the information in memory?). But if you want more values later and the dataset is already ingested, you may use random walks (and why not web preemption?) to provide random values.
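To make the random-walk idea concrete, here is a minimal sketch in Python. It is not RAW-JENA's implementation: the in-memory `TRIPLES` list, the predicate names, and the helpers `match`, `random_walk`, and `sample_values` are all hypothetical and only illustrate the principle. Each walk picks one random matching triple per pattern and fails as soon as a pattern has no match under the current bindings.

```python
import random

# Toy dataset of (subject, predicate, object) triples. All values are made up.
TRIPLES = [
    ("p1", "label", "Product 1"),
    ("p1", "feature", "f1"),
    ("p2", "label", "Product 2"),
    ("p2", "feature", "f2"),
    ("p3", "label", "Product 3"),  # p3 has no feature: walks through p3 fail
]


def match(pattern, triple, bindings):
    """Extend bindings by matching one pattern against one triple, or return None."""
    new = dict(bindings)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):
            # Variable: bind it, or check it agrees with an earlier binding.
            if new.setdefault(term, value) != value:
                return None
        elif term != value:
            # Constant that does not match the triple.
            return None
    return new


def random_walk(patterns, triples, rng):
    """Follow the patterns in order, choosing one random match per step."""
    bindings = {}
    for pattern in patterns:
        candidates = [
            b for b in (match(pattern, t, bindings) for t in triples)
            if b is not None
        ]
        if not candidates:
            return None  # dead end: this walk produces no row
        bindings = rng.choice(candidates)
    return bindings


def sample_values(patterns, var, triples, walks, seed=0):
    """Run several walks and collect the value bound to var in each success."""
    rng = random.Random(seed)
    values = []
    for _ in range(walks):
        result = random_walk(patterns, triples, rng)
        if result is not None:
            values.append(result[var])
    return values


if __name__ == "__main__":
    template_patterns = [("?p", "label", "?l"), ("?p", "feature", "?f")]
    print(sample_values(template_patterns, "?f", TRIPLES, walks=20))
```

The sampled values could then be substituted into the query template. Note that walks starting from `p3` fail, which is the same effect that makes very selective queries expensive to sample.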

> Query 5 doesn't give results.

This would need some testing (with smaller queries). Possible culprits:

Is the query too selective with these filters, so that random walks have a hard time finding any matching values?
You use GRAPH ?g1 { tp1 . tp2 }, but I am not sure it is handled well since it's a QuadBlock. I would advise testing by breaking these blocks down into single quad patterns, i.e., GRAPH ?g1 { tp1 } . GRAPH ?g1 { tp2 }.
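As a starting point for that test, the breakdown could be generated with a small helper like the one below. This is only a sketch: `split_graph_block` is a hypothetical helper, and the pattern list is the ?g1 block of the query above with its predicate-object list expanded by hand so that each pattern carries its own subject. Whether raw-jena handles the rewritten form better is exactly what the test would need to show.

```python
def split_graph_block(graph_var, triple_patterns):
    """Emit one GRAPH clause per triple pattern instead of one multi-pattern block."""
    return " .\n".join(
        f"GRAPH {graph_var} {{ {tp} }}" for tp in triple_patterns
    )


# Triple patterns of the ?g1 block, with the ';' shorthand expanded.
G1_PATTERNS = [
    "?localProduct rdfs:label ?localProductLabel",
    "?localProduct bsbm:productFeature ?localProdFeature",
    "?localProduct bsbm:productPropertyNumeric1 ?simProperty1",
    "?localProduct bsbm:productPropertyNumeric2 ?simProperty2",
    "?localProduct owl:sameAs ?product",
    "?localProdFeature owl:sameAs ?prodFeature",
]

if __name__ == "__main__":
    print(split_graph_block("?g1", G1_PATTERNS))
```

The same helper can be applied to the ?g2 block. Testing one block at a time, and without the FILTER clauses at first, should help separate the QuadBlock question from the selectivity question.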

GDD-Nantes / FedShop

Value selection using Random Sampling #70
