Open li-boxuan opened 1 year ago
In this case I don't know how to determine that we finished reading data from Amazon Keyspace (at least I'm not sure how to do that with Datastax Java Driver).
I would imagine that we can improve CQLPagingIterator with providing a function which builds a new statement with the last page state (i.e. currentResultSet.getExecutionInfo().getPagingState()
). That said, if all those statements return nothing then it's not clear that we finished or not. Thus, it will lead to infinite loop.
I guess as a workaround we could store the last used page state and the current page state. If they are equal then it means we reached the end (probably).
That means we will always have a single redundant call at the end to verify that we really reached the end.
Not sure how good this solution is. Also, I guess it could really be implemented on DataStax driver side.
@li-boxuan will that be a problem also for a SparkComputer iterating over all edges for example ? I'm not sure if it's doing iteration differently?
@hexa00
That won't affect SparkGraphComputer. Spark traversal uses CqlRecordReader to load the data.
Some context on the datastax side, the issue is here
@Override
protected Row computeNext() {
maybeMoveToNextPage();
return currentRows.hasNext() ? currentRows.next() : endOfData();
}
where it advances the page then ends the iterator if the next page has no data, regardless of if theres another page.
I'm surprised no one at AWS ran into this issue as this is their officially recommended java driver.
The issue could potentially be fixed in https://github.com/JanusGraph/janusgraph/pull/3760 because the pagination approach is changed in that PR and we don’t use sync pagination anymore. I assume after that PR is merged the issue could be potentially fixed, but I personally didn’t use Amazon Keyspaces, so can’t tell for sure. Anyone who is interested in Amazon Keyspaces is free to try it afterwards.
Unfortunately, Amazon Keyspace diverges from Apache Cassandra when it comes to paging.
Due to the way Amazon Keyspace works, even if a page iterator returns empty results, there still can be a valid next page with data. This happens when
ALLOW FILTERING
is used. In JanusGraph, this means when a full scan is triggered. We should fix CQLPagingIterator in CQLKeyColumnValueStore.java.At the moment, I am not 100% sure whether this would also require a fix from the DataStax java driver which implements paging internally as well.
C.C. @porunov since you did quite a bit of the CQLPagingIterator implementation.
References: https://stackoverflow.com/questions/69940435/amazon-keyspaces-allow-filtering-without-all-keys-in-node-js https://docs.aws.amazon.com/keyspaces/latest/devguide/working-with-queries.html#paginating-results https://github.com/JanusGraph/janusgraph/discussions/3378#discussioncomment-4405659