apache / pinot

Apache Pinot - A realtime distributed OLAP datastore
https://pinot.apache.org/
Apache License 2.0
5.27k stars 1.23k forks source link

Improving server segment download code #9152

Open jasperjiaguo opened 1 year ago

jasperjiaguo commented 1 year ago
  1. Currently if we use the default code path for server segment download we won't retry download on untar failure. In situations as downloading partial tar file, this would not be helpful. There are two solutions:

    • improve the default code path in BaseTableDataManager
    • use the new stream download untar code path pinot.server.instance.segment.stream.download.untar
  2. The new stream download untar code path currently doesn't support stream decryption, as should be theoretically possible with block cipher. This potential improvement to stream segment download can be explored.

User doc for stream download untar: https://docs.pinot.apache.org/operators/tutorials/performance-optimization-configurations PR: https://github.com/apache/pinot/pull/8753

jasperjiaguo commented 1 year ago

This can potentially be a beginner task

siddharthteotia commented 1 year ago

@jasperjiaguo - if not already, can you please add oss user docs for the new enhancement of stream untar + rate limiting. Include a link here in this issue as well. Might also come handy for whoever picks this up to understand background etc.

heatclub commented 1 year ago

@jasperjiaguo I can get started on this, if it's not yet been picked up.

jasperjiaguo commented 1 year ago

Hey @heatclub it's not yet picked up, do you want to do both? You can take a look at files changed in https://github.com/apache/pinot/pull/8753. 2) is more exploratory as you probably want to find the block cipher scheme that has streamed interface (e.g. https://docs.oracle.com/javase/7/docs/api/javax/crypto/CipherInputStream.html) in java and generalize them (PinotCrypter is a noop rn so we need to foresee what cyphers can be used).

heatclub commented 1 year ago

Yes I can pick up both the issues, will get back with queries if any

jugomezv commented 1 year ago

I wanted to better understand the goals of this issue which seem two in this case: add support for decryption (item 2) and I am not sure about item 1 above: are we suggesting we make the default code path retry and eliminate the need for configuration? if this is not the case, can we explain better what is what we want to achieve with that first item?.

jasperjiaguo commented 1 year ago

Thanks for the clarification question @jugomezv. For 1) I'm thinking to add retry downloading on untar failed for the old (not streamed) segment download code path. As currently the new code path retries everything while old doesn't. The option for switching two code paths will still be there. Is this what you are asking?

jugomezv commented 1 year ago

Oh I think I get it, your fix is was only for streaming, ok I get it. Yes for sure we need that: will it also be an option in that case?

jasperjiaguo commented 1 year ago

Oh I think I get it, your fix is was only for streaming, ok I get it. Yes for sure we need that: will it also be an option in that case?

Probably behind a flag... set it when the user deems deep store download could be unreliable?

jugomezv commented 1 year ago

@jasperjiaguo yes, makes sense.

heatclub commented 1 year ago

Hi @jasperjiaguo Sorry I was inactive here owing to some personal commitments.

I went through the code and have below findings for 1 Existing flow BaseTableDataManager.downloadSegment() -> downloadSegmentFromDeepStore() -> downloadAndDecrypt() -> downloadFromPeersWithoutStreaming calls these in SegmentFetcherFactory -> fetchAndDecryptSegmentToLocal() -> fetchAndDecryptSegmentToLocalInternal -> fetchSegmentToLocal -> fetchSegmentToLocalInternal -> fetchSegmentToLocal -> RetryPolicies.exponentialBackoffRetryPolicy with fetchSegmentToLocalWithoutRetry() -> **BaseTableDataManager.untarAndMoveSegment()** All download functions have retry policy

New proposed flow BaseTableDataManager.downloadSegment() -> downloadSegmentFromDeepStore() -> downloadAndDecrypt() -> { same function calls as above for SegmentFetcherFactory } -> **BaseTableDataManager.untarAndMoveSegmentWithRetries()**

heatclub commented 1 year ago

@jasperjiaguo can you please take a look at this PR for issue [1]: https://github.com/apache/pinot/pull/10411 I don't seem to have permission to link this issue to PR or add labels.

jasperjiaguo commented 1 year ago

Hey @heatclub sorry for the delayed reply, I was in smth. Yes your understanding is basically correct, we need to retry download upon download failures (which we already do), and retry download-untar upon untar failures. Will review the code this week.

heatclub commented 1 year ago

Thanks for the revert @jasperjiaguo , above PR has received some feedback comments from team members and I am working on the refactoring.