ambud closed 7 years ago
The problem with this approach is that it creates a dependency on something other than the stored bytes, so the stream is no longer self-contained. You would now need to store two things for future reference: the stream and the metadata associated with it.
One way to handle this metadata would be to store the number of datapoints in the stream itself. I store some metadata this way in Hawkular (the first byte is dedicated to selecting the compression format).
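To illustrate what keeping the metadata inside the stream could look like, here is a minimal sketch. The layout (one format byte, then a 4-byte datapoint count, then the compressed payload) and all names (`FramedBlock`, `FORMAT_GORILLA`, `frame`, and so on) are assumptions made up for this example; this is not Hawkular's or TSC's actual on-disk format.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

// Hypothetical framing: [1 byte format][4 byte count][payload bytes].
final class FramedBlock {
    static final byte FORMAT_GORILLA = 1; // made-up format id for this sketch

    // Prepend the header so the block stays self-contained.
    static byte[] frame(byte format, int count, byte[] payload) {
        ByteBuffer buf = ByteBuffer.allocate(1 + Integer.BYTES + payload.length);
        buf.put(format);
        buf.putInt(count);
        buf.put(payload);
        return buf.array();
    }

    static byte formatOf(byte[] framed) {
        return framed[0];
    }

    static int countOf(byte[] framed) {
        // Absolute read at offset 1, right after the format byte.
        return ByteBuffer.wrap(framed).getInt(1);
    }

    static byte[] payloadOf(byte[] framed) {
        return Arrays.copyOfRange(framed, 1 + Integer.BYTES, framed.length);
    }

    public static void main(String[] args) {
        byte[] framed = frame(FORMAT_GORILLA, 3, new byte[] {9, 8, 7});
        System.out.println("format=" + formatOf(framed) + " count=" + countOf(framed));
    }
}
```

One caveat with this layout: the count sits at the front, so it can only be finalized once the block is complete (or it has to be rewritten in place as points arrive).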
That's correct, but since TSC is just a building block for a TS storage platform, it lets the user pick how they would like to store the metadata.
The issue with the current block implementation is that concurrent reads and writes can't be supported, since the reader has no idea how many Pairs have been written at any given point in time. The block stream must be closed before it can be read.
Please feel free to close the PR without merging.
I don't think you should necessarily use concurrent writers to the same stream with this approach; synchronizing multiple writers and readers will take more time than using a single writer. Not to mention, receiving points for the same series from multiple different writers is a pretty uncommon use case.
However, I agree that reading from an open stream while it is still being written should be handled. That part could be handled by keeping the "count" outside the stream. Say, for example, you have an object with the following properties:
- Count
- Decompressor (Stream)
- Compressor (Stream)
Encapsulate "addValue" in this object and have it update the Count; when decompressing, you would call readPair at most "count" times.
Thus, you could still store the "ready blocks" on slower media, such as S3, without the need for yet another metadata store.
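The wrapper object described above could be sketched roughly like this. To keep the example self-contained, the "compressor" is only a stand-in that writes raw (timestamp, value) pairs into a `ByteBuffer`; it does no Gorilla compression and does not use the TSC classes. `CountedBlock`, `Pair` and `readPairs` are hypothetical names, and the sketch assumes a single writer.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

// Hypothetical wrapper: tracks how many pairs are safely readable.
final class CountedBlock {
    static final class Pair {
        final long timestamp;
        final double value;
        Pair(long timestamp, double value) {
            this.timestamp = timestamp;
            this.value = value;
        }
    }

    // Stand-in for the compressor's output stream (uncompressed here).
    private final ByteBuffer out;
    // In a real bit-packed stream pairs have variable width, so the
    // count cannot be derived from the byte length and is kept here.
    private int count;

    CountedBlock(int capacityPairs) {
        out = ByteBuffer.allocate(capacityPairs * (Long.BYTES + Double.BYTES));
    }

    synchronized void addValue(long timestamp, double value) {
        out.putLong(timestamp);
        out.putDouble(value);
        count++; // incremented only after the pair is fully written
    }

    List<Pair> readPairs() {
        int n;
        ByteBuffer in;
        synchronized (this) {
            // Snapshot the count and a read-only view of the bytes so far.
            n = count;
            in = out.asReadOnlyBuffer();
            in.flip();
        }
        // Decode outside the lock; read at most n pairs.
        List<Pair> pairs = new ArrayList<>(n);
        for (int i = 0; i < n; i++) {
            pairs.add(new Pair(in.getLong(), in.getDouble()));
        }
        return pairs;
    }

    public static void main(String[] args) {
        CountedBlock block = new CountedBlock(8);
        block.addValue(1000L, 1.5);
        block.addValue(2000L, 2.5);
        System.out.println("readable pairs: " + block.readPairs().size());
    }
}
```

A reader that calls `readPairs()` while the block is still open only sees the pairs counted at the moment of the snapshot, so no end-of-stream marker is needed for the open block.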
I'd like to keep this PR open, however. I agree this isn't a perfect solution, and I want to think about other approaches as well (I'm currently doing the development in another branch to prototype certain performance scenarios).
@burmanm here's a patch that removes the end-of-stream marker by using a count instead. The number of messages written should be stored by the user somewhere it can be referenced when reading.
I have also modified the unit tests accordingly. This is for issue #5.