apache / bookkeeper

Apache BookKeeper - a scalable, fault tolerant and low latency storage service optimized for append-only workloads
https://bookkeeper.apache.org/
Apache License 2.0

Introduce Offset Index #1376

Open sijie opened 6 years ago

sijie commented 6 years ago

FEATURE REQUEST

  1. Please describe the feature you are requesting.

Currently, entries in a ledger are indexed by entry id. It would be good to also have an index by offsets. This allows supporting APIs like:

  2. Indicate the importance of this issue to you (blocker, must-have, should-have, nice-to-have). Are you currently using any workarounds to address this issue?

nice-to-have

  3. Provide any additional detail on your proposed use case for this feature.

An entry-oriented (or request-oriented) API is not resource-friendly when prefetching or batching reads, because the number of bytes behind a range of entry ids is not known up front. An offset-oriented API is much better for estimating resource usage.
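To make the resource-usage point concrete, here is a minimal sketch of what an offset index could look like in memory. Everything in it is an assumption for illustration: the class `OffsetIndexSketch` and its methods are hypothetical and not part of BookKeeper, and it assumes every entry is non-empty. It keeps a running total of entry sizes and uses binary search to map an offset to an entry id, or a byte budget to the last entry that fits in a batch.

```java
import java.util.Arrays;

/** Hypothetical in-memory offset index; not a BookKeeper API. */
public final class OffsetIndexSketch {
    // endOffsets[i] is the offset one past the last byte of entry i.
    private final long[] endOffsets;

    public OffsetIndexSketch(int[] entrySizes) {
        endOffsets = new long[entrySizes.length];
        long running = 0;
        for (int i = 0; i < entrySizes.length; i++) {
            if (entrySizes[i] <= 0) {
                // Assumption: entries are non-empty, so endOffsets is strictly increasing.
                throw new IllegalArgumentException("entry " + i + " must be non-empty");
            }
            running += entrySizes[i];
            endOffsets[i] = running;
        }
    }

    private long startOffsetOf(int entryId) {
        return entryId == 0 ? 0 : endOffsets[entryId - 1];
    }

    /** Returns the id of the entry containing the byte at offset, or -1 if out of range. */
    public int entryIdForOffset(long offset) {
        if (offset < 0 || endOffsets.length == 0
                || offset >= endOffsets[endOffsets.length - 1]) {
            return -1;
        }
        // Find the first entry whose end offset is strictly greater than offset.
        int pos = Arrays.binarySearch(endOffsets, offset + 1);
        return pos >= 0 ? pos : -pos - 1;
    }

    /** Returns the exact number of bytes a read of [firstEntry, lastEntry] would return. */
    public long bytesInRange(int firstEntry, int lastEntry) {
        return endOffsets[lastEntry] - startOffsetOf(firstEntry);
    }

    /**
     * Returns the last entry id such that entries [startEntry, result] fit in maxBytes,
     * or -1 if not even startEntry fits.
     */
    public int lastEntryWithinBudget(int startEntry, long maxBytes) {
        long target = startOffsetOf(startEntry) + maxBytes;
        // Find the largest index whose end offset is <= target.
        int pos = Arrays.binarySearch(endOffsets, target);
        int last = pos >= 0 ? pos : -pos - 2;
        return last >= startEntry ? last : -1;
    }
}
```

With such an index, a prefetcher could size a batch by bytes (for example `lastEntryWithinBudget(nextEntry, bufferBytes)`) instead of guessing how many entries to request.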

eolivelli commented 6 years ago

It would be great for storage use cases, such as reading a batch of sequential entries with a random-access pattern (as opposed to tailing reads).

eolivelli commented 6 years ago

A new wire protocol RPC would be great as well.

Tielem commented 5 years ago

FEATURE REQUEST

  1. Please describe the feature you are requesting.

We create continuous streams of growing data, e.g. ledgers with entities. However, we also require random access into the underlying data stream.

Proposed APIs:

Returns the byte offset of the last byte of the last entry committed to the ledger.

Reads all bytes between startOffset and endOffset (inclusive), returned per stored entry. In case the endOffset is beyond the end of the ledger, the behavior should be the same as readEntries.

Reads all bytes from startOffset to current confirmed end of ledger, returned per stored entry.
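The three operations described above can be sketched against a toy in-memory ledger. This is only an illustration under stated assumptions: the class `InMemoryLedgerSketch` and the method names `lastConfirmedOffset`, `readByOffset`, and `readFromOffset` are hypothetical and not BookKeeper APIs. The sketch also assumes that "returned per stored entry" means each entry's bytes are trimmed to the requested range, and that an `endOffset` beyond the end of the ledger is rejected, which is one reading of "the same as readEntries".

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical in-memory stand-in for a ledger; not a BookKeeper API. */
public final class InMemoryLedgerSketch {
    private final List<byte[]> entries = new ArrayList<>();
    private long length = 0; // total bytes across all committed entries

    public void addEntry(byte[] data) {
        entries.add(data.clone());
        length += data.length;
    }

    /** Byte offset of the last byte of the last committed entry (-1 if empty). */
    public long lastConfirmedOffset() {
        return length - 1;
    }

    /** Reads bytes in [startOffset, endOffset], inclusive, as one slice per stored entry. */
    public List<byte[]> readByOffset(long startOffset, long endOffset) {
        if (startOffset < 0 || startOffset > endOffset || endOffset >= length) {
            // Assumption: out-of-range requests fail rather than being truncated.
            throw new IllegalArgumentException(
                "range [" + startOffset + ", " + endOffset + "] is outside the ledger");
        }
        List<byte[]> out = new ArrayList<>();
        long entryStart = 0;
        for (byte[] e : entries) {
            long entryEnd = entryStart + e.length - 1; // inclusive
            boolean overlaps = e.length > 0
                && entryEnd >= startOffset && entryStart <= endOffset;
            if (overlaps) {
                // Trim the entry to the part that lies inside the requested range.
                int from = (int) (Math.max(startOffset, entryStart) - entryStart);
                int to = (int) (Math.min(endOffset, entryEnd) - entryStart) + 1;
                out.add(Arrays.copyOfRange(e, from, to));
            }
            entryStart += e.length;
        }
        return out;
    }

    /** Reads from startOffset up to the current confirmed end of the ledger. */
    public List<byte[]> readFromOffset(long startOffset) {
        return readByOffset(startOffset, lastConfirmedOffset());
    }
}
```

The linear scan keeps the sketch short; a real implementation would presumably locate the first overlapping entry through the offset index rather than walking every entry.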

  2. Indicate the importance of this issue to you (blocker, must-have, should-have, nice-to-have). Are you currently using any workarounds to address this issue?

We currently use a different solution for this use case, with less durability, scalability, etc. This feature would make our future architecture simpler and better overall.

  3. Provide any additional detail on your proposed use case for this feature.

While not needed for our use case, uncommitted APIs might be good to have for API completeness.

While not immediately needed for our use case (we can handle this with other polling mechanisms), it would be useful to be able to open a binary read on a ledger: starting at a given startOffset, keep receiving byte[] until either the handler is closed or endOffset is reached.