ge-semtk / semtk

Drag and drop SPARQL queries and data ingestion for virtuoso and soon other SPARQL endpoints
http://semtk.research.ge.com

ConnectedDataConstructor gives num triples < limit despite truncating #551

Open weisenje opened 11 months ago

weisenje commented 11 months ago

Summary: SparqlToXLibUtil.generateConstructConnected() was intended to guarantee that if data is truncated, the number of triples returned will be >= limit. Consuming processes rely on this to know whether the data may be truncated. However, we are seeing truncated data with the number of triples below the limit.

Details/Example: In SPARQLgraph, node expansion uses ADD_TRIPLES_MAX = 1000. This value is passed as the “limit” argument to generateConstructConnected(), which incorrectly assumes 3 triples per line and therefore generates SPARQL with LIMIT 334. That LIMIT truncates the data. In the case observed, we get only 2 triples per line (the connected instance type and the connecting predicate), for a total of 667 triples. Since 667 is under the max of 1000, SPARQLgraph does not give the "using a subset" message.

Note: For connected literals (vs. URIs), we may see only 1 triple per line. Note: The “3 triples per line” assumption may have come from a use case where there was a predicate in extraPredicatesList.
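
To make the arithmetic concrete, here is a rough Python sketch of the mismatch described above. It is not SemTK code: the function names are hypothetical, and rounding up when converting the triple limit to a row LIMIT is an assumption inferred from 1000 / 3 giving LIMIT 334.

```python
import math

# Hypothetical model of the behavior described in this issue; names are
# illustrative and not taken from the SemTK source.
ADD_TRIPLES_MAX = 1000          # triple limit used by SPARQLgraph node expansion
ASSUMED_TRIPLES_PER_ROW = 3     # assumption baked into the generated LIMIT


def row_limit(triple_limit, assumed_per_row=ASSUMED_TRIPLES_PER_ROW):
    """Convert a triple limit to a row LIMIT (rounding up is an assumption)."""
    return math.ceil(triple_limit / assumed_per_row)


def looks_truncated(triples_returned, triple_limit=ADD_TRIPLES_MAX):
    """The consumer's check: truncation is flagged only at or above the limit."""
    return triples_returned >= triple_limit


def max_triples(actual_per_row, triple_limit=ADD_TRIPLES_MAX):
    """Upper bound on triples returned when the row LIMIT is hit."""
    return row_limit(triple_limit) * actual_per_row


if __name__ == "__main__":
    # Show, for each actual triples-per-row count, the most triples a
    # truncated query can return and whether the consumer would notice.
    for per_row in (3, 2, 1):
        n = max_triples(per_row)
        print(per_row, n, looks_truncated(n))
```

Under these assumptions, only the 3-triples-per-line case can reach the limit (334 × 3 = 1002). With 2 triples per line the truncated result tops out at 668, which is in line with the 667 observed, and with 1 triple per line it tops out at 334, so in both cases the consumer's check never fires.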

weisenje commented 11 months ago

To reproduce: