Aiven-Open / tiered-storage-for-apache-kafka

RemoteStorageManager for Apache Kafka® Tiered Storage
Apache License 2.0

fix: fully consume index when transforming #536

Status: Closed (jeqo closed 7 months ago)

jeqo commented 7 months ago

With the current validation, there is a chance that indexes are not fully consumed when transforming: the validation assumes the whole stream has been consumed, when it has not.

final var inputStream = transformFinisher.nextElement();
segmentIndexBuilder.add(indexType, singleChunk(transformFinisher.chunkIndex()).range().size());
return inputStream;

By introducing an output stream to hold the processed content and passing the resulting buffer to the storage layer, we guarantee that the validation correctly checks that only a single chunk was processed.
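To illustrate the idea, here is a minimal, self-contained sketch of the buffering approach. It is not the code from this PR: the class `IndexBufferingSketch` and the method `bufferSingleChunk` are hypothetical names, and a plain `Enumeration<InputStream>` stands in for the project's `transformFinisher`. The point is only that draining the first element into a `ByteArrayOutputStream` before checking for further elements makes the single-chunk check meaningful, and gives an exact size for the index.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Collections;
import java.util.Enumeration;
import java.util.List;

public class IndexBufferingSketch {

    /**
     * Hypothetical helper: fully drains the first transformed element into memory,
     * then verifies that no further chunk exists. The Enumeration is a stand-in
     * (assumption) for the project's transform finisher.
     */
    static byte[] bufferSingleChunk(final Enumeration<InputStream> transformed) {
        if (!transformed.hasMoreElements()) {
            throw new IllegalStateException("Expected one chunk, found none");
        }
        final ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (InputStream in = transformed.nextElement()) {
            // Consume the whole stream up front instead of handing it over lazily.
            in.transferTo(out);
        } catch (final IOException e) {
            throw new UncheckedIOException(e);
        }
        // Only after the first element is fully consumed is this check reliable.
        if (transformed.hasMoreElements()) {
            throw new IllegalStateException("Expected a single chunk, found more");
        }
        return out.toByteArray();
    }

    public static void main(final String[] args) {
        final byte[] index = "offset-index".getBytes(StandardCharsets.UTF_8);
        final Enumeration<InputStream> transformed =
            Collections.enumeration(List.<InputStream>of(new ByteArrayInputStream(index)));

        final byte[] buffered = bufferSingleChunk(transformed);
        // The exact size is now known, and the buffer can be passed to the storage layer.
        final InputStream forStorage = new ByteArrayInputStream(buffered);
        System.out.println("Buffered index size: " + buffered.length);
    }
}
```

The trade-off in this sketch is that the whole index is held in memory, which is only reasonable if indexes are small relative to segment data.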

ivanyu commented 7 months ago

Do I understand correctly that the problem is that the index may be longer than nextElement()? Or something else?

jeqo commented 7 months ago

Yes, that was my understanding. However, I may have rushed a bit here: it is actually working correctly (though not obviously):

I got confused by a mixture of outdated memories. I will create a quick PR to document these less obvious conditions so that future me doesn't get confused again 😅

jeqo commented 7 months ago

Added doc comments as part of https://github.com/Aiven-Open/tiered-storage-for-apache-kafka/pull/535