crawler-commons / url-frontier

API definition, resources and reference implementation of URL Frontiers
Apache License 2.0

Pagination of ListQueues #26

Closed jnioche closed 3 years ago

jnioche commented 3 years ago

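ListQueues can fail with a gRPC error when the number of queues is large. In the run below, the response was 10945786 bytes, well over the 4194304-byte (4 MiB) maximum message size: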

java -jar ./target/urlfrontier-client*.jar ListQueues
io.grpc.StatusRuntimeException: RESOURCE_EXHAUSTED: gRPC message exceeds maximum size 4194304: 10945786
    at io.grpc.stub.ClientCalls.toStatusRuntimeException(ClientCalls.java:262)
    at io.grpc.stub.ClientCalls.getUnchecked(ClientCalls.java:243)
    at io.grpc.stub.ClientCalls.blockingUnaryCall(ClientCalls.java:156)
    at crawlercommons.urlfrontier.URLFrontierGrpc$URLFrontierBlockingStub.listQueues(URLFrontierGrpc.java:604)
    at crawlercommons.urlfrontier.client.ListQueues.run(ListQueues.java:52)
    at picocli.CommandLine.executeUserObject(CommandLine.java:1939)
    at picocli.CommandLine.access$1300(CommandLine.java:145)
    at picocli.CommandLine$RunLast.executeUserObjectOfLastSubcommandWithSameParent(CommandLine.java:2352)
    at picocli.CommandLine$RunLast.handle(CommandLine.java:2346)
    at picocli.CommandLine$RunLast.handle(CommandLine.java:2311)
    at picocli.CommandLine$AbstractParseResultHandler.execute(CommandLine.java:2179)
    at picocli.CommandLine.execute(CommandLine.java:2078)
    at crawlercommons.urlfrontier.client.Client.main(Client.java:40)
Jun 23, 2021 12:58:07 PM io.grpc.internal.AbstractClientStream$TransportState inboundDataReceived
INFO: Received data on closed stream

Limiting the size of the returned list, e.g. via java -jar ./target/urlfrontier-client*.jar ListQueues -n 1000, avoids the exception. However, results are apparently returned in a consistent order and the -n option only controls the number of items taken from the start of the list, which makes it difficult for a client to obtain the tail of the list.

We should paginate the results and return richer output that includes the total number of queues, the start and end offsets, etc.
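
To make the proposal concrete, here is a minimal sketch of offset-based pagination in plain Java. It is not the URL Frontier API: the class `QueuePagination`, the `Page` type, the `listQueues(allKeys, start, size)` signature and the choice of an exclusive `end` offset are all assumptions made for illustration, and an in-memory list stands in for the frontier's queues.

```java
import java.util.ArrayList;
import java.util.List;

public class QueuePagination {

    /** One page of queue keys plus the metadata suggested above (hypothetical type). */
    public static final class Page {
        public final List<String> keys;
        public final int total; // total number of queues known to the server
        public final int start; // offset of the first key in this page (inclusive)
        public final int end;   // offset just past the last key in this page (exclusive)

        Page(List<String> keys, int total, int start, int end) {
            this.keys = keys;
            this.total = total;
            this.start = start;
            this.end = end;
        }
    }

    /**
     * Returns at most `size` keys starting at offset `start`.
     * Relies on the keys being in a consistent order between calls.
     */
    public static Page listQueues(List<String> allKeys, int start, int size) {
        if (start < 0 || size <= 0) {
            throw new IllegalArgumentException("start must be >= 0 and size must be > 0");
        }
        int total = allKeys.size();
        // Clamp both offsets so that a start past the end yields an empty page.
        int from = Math.min(start, total);
        int to = (int) Math.min((long) from + size, total);
        return new Page(new ArrayList<>(allKeys.subList(from, to)), total, from, to);
    }

    public static void main(String[] args) {
        // Stand-in data: 2500 queue keys.
        List<String> queues = new ArrayList<>();
        for (int i = 0; i < 2500; i++) {
            queues.add("queue-" + i);
        }

        // A client walks the whole list 1000 keys at a time.
        int start = 0;
        Page page;
        do {
            page = listQueues(queues, start, 1000);
            System.out.println(page.start + "-" + page.end + " of " + page.total);
            start = page.end;
        } while (page.end < page.total);
    }
}
```

With the total and the end offset in every response, a client can reach the tail of the list by requesting successive pages, and each individual response stays small enough to remain under the message size limit, provided the page size is chosen accordingly.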