apache / druid

Apache Druid: a high performance real-time analytics database.
https://druid.apache.org/
Apache License 2.0

Interning in SQLMetadataSegmentManager may obliterate new segment metadata #6358

Open leventov opened 6 years ago

leventov commented 6 years ago

DataSegment interning in SQLMetadataSegmentManager.poll() is keyed by segment id: if a segment with a matching id already exists on the heap, that cached instance is used instead of the newly polled segment. This may obliterate new information about segments, such as changed loadSpecs.
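A minimal sketch of the failure mode, using stand-in types (these are not Druid's actual classes) to show how id-keyed interning can hand back stale metadata:

```java
import java.util.HashMap;
import java.util.Map;

public class InterningSketch {
    // Stand-in for DataSegment: identity is the id, but loadSpec may change.
    record Segment(String id, String loadSpec) {}

    public static void main(String[] args) {
        Map<String, Segment> internCache = new HashMap<>();

        Segment cached = new Segment("ds_2019-01-01_v1_0", "s3://old-bucket/index.zip");
        internCache.put(cached.id(), cached);

        // A later poll returns the same id with an updated loadSpec,
        // e.g. after a deep-storage migration.
        Segment polled = new Segment("ds_2019-01-01_v1_0", "hdfs://new-cluster/index.zip");

        // Interning keyed by id alone returns the cached object, silently
        // dropping the new loadSpec.
        Segment interned = internCache.getOrDefault(polled.id(), polled);
        System.out.println(interned.loadSpec()); // prints the stale s3 path
    }
}
```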

surekhasaharan commented 5 years ago

@leventov checking my understanding of this fix: should equals in DataSegment be either removed or modified to compare not just the SegmentId but the other fields of DataSegment as well? And replaceWithExistingSegmentIfPresent in SQLMetadataSegmentManager might then return something like

return alreadyExistingSegment != null
           ? (alreadyExistingSegment.equals(segment) ? alreadyExistingSegment : segment)
           : segment;
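Under that assumption (a full-field equals()), the interning helper could be sketched end-to-end like this, with stand-in types rather than Druid's real signatures:

```java
import java.util.HashMap;
import java.util.Map;

public class ReplaceSketch {
    // Stand-in for DataSegment; record equals() compares all fields.
    record Segment(String id, String loadSpec, long size) {}

    static final Map<String, Segment> CACHE = new HashMap<>();

    // Reuse the cached instance only when every field matches; otherwise
    // keep the freshly polled segment so updated loadSpecs survive.
    static Segment replaceWithExistingSegmentIfPresent(Segment polled) {
        Segment existing = CACHE.get(polled.id());
        return existing != null && existing.equals(polled) ? existing : polled;
    }

    public static void main(String[] args) {
        CACHE.put("seg1", new Segment("seg1", "s3://old/index.zip", 100L));

        Segment unchanged = new Segment("seg1", "s3://old/index.zip", 100L);
        Segment migrated = new Segment("seg1", "hdfs://new/index.zip", 100L);

        // Identical metadata: the cached object is reused (true interning).
        System.out.println(replaceWithExistingSegmentIfPresent(unchanged) == CACHE.get("seg1"));
        // Changed loadSpec: the new object wins, nothing is obliterated.
        System.out.println(replaceWithExistingSegmentIfPresent(migrated).loadSpec());
    }
}
```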
leventov commented 5 years ago

I'm not sure what would be the best way to handle this in terms of both usability and error-proneness. If you just make equals() more expensive and somebody uses DataSegments as keys in a HashSet, a lot of unnecessary work may be done as a result. On the other hand, there may be cases when we would still want to have DataSegment as keys in Maps to "pack" more data into a Map without creating extra wrapping objects, that is, having ObjToIntMap<DataSegment> instead of Map<SegmentId, SomeWrapperWithDataSegmentAndIntFields>.
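That trade-off can be illustrated with stand-in types (the names here are hypothetical): keying a map by the full segment packs data into the key and avoids wrapper values, but every hash lookup then pays for the wider equals()/hashCode():

```java
import java.util.HashMap;
import java.util.Map;

public class KeyTradeoffSketch {
    record SegmentId(String id) {}
    // Full segment: equals()/hashCode() over all fields is more expensive.
    record Segment(SegmentId id, String loadSpec, long size) {}
    // Wrapper value needed when the map is keyed by SegmentId only.
    record SegmentWithCount(Segment segment, int count) {}

    public static void main(String[] args) {
        Segment seg = new Segment(new SegmentId("seg1"), "s3://bucket/index.zip", 100L);

        // Option 1: cheap keys, but an extra wrapper object per entry.
        Map<SegmentId, SegmentWithCount> byId = new HashMap<>();
        byId.put(seg.id(), new SegmentWithCount(seg, 7));

        // Option 2: no wrapper (the key itself carries the segment), but
        // every get/put hashes and compares all of Segment's fields.
        Map<Segment, Integer> bySegment = new HashMap<>();
        bySegment.put(seg, 7);

        System.out.println(byId.get(seg.id()).count()); // 7
        System.out.println(bySegment.get(seg));         // 7
    }
}
```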

So what I would do is:

drcrallen commented 5 years ago

I thought a segment ID was supposed to be immutable. Why would a new load spec not cause a new segment ID?

gianm commented 5 years ago

If you are migrating your segments from one deep storage to another, you might want to adjust things so the same segment objects have new loadSpecs but retain their IDs. It's valuable because changing the IDs implies changing the version and/or partition numbers, which would cause a lot of churn in the cluster.

Segment metadata also evolves naturally over time as segments are pushed from realtime tasks to deep storage. While they're on realtime tasks, they don't yet have loadSpecs or sizes (maybe other stuff too, I don't remember).

surekhasaharan commented 5 years ago

> Add a Structural Search inspection in IntelliJ which inhibits using DataSegment as Map keys and in Sets, but this can still be done using @SuppressWarnings("SSBasedInspection") when somebody really needs that (see the example above).

Did you mean to add this structural search inspection as a warning? When I add it, it manifests as an error, and I'm not sure how to make it a warning while keeping the existing inspections as errors. See the last item in this screenshot: how can we set its severity to warning? (screenshot: Screen Shot 2019-05-06 at 5.19.57 PM)

leventov commented 5 years ago

@surekhasaharan I meant to add it as an error. So every time a developer wants to have Set<DataSegment> in their code, they must put @SuppressWarnings (with justification) to pass CI.
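What that looks like in practice, sketched with a stand-in type (DataSegmentStub is hypothetical; in Druid it would be DataSegment itself):

```java
import java.util.HashSet;
import java.util.Set;

public class SuppressSketch {
    record DataSegmentStub(String id) {} // stand-in for DataSegment

    // The structural-search inspection would flag this declaration as an
    // error in CI; the suppression plus a justification lets it through.
    @SuppressWarnings("SSBasedInspection") // justification: deliberate example
    static Set<DataSegmentStub> newSegmentSet() {
        return new HashSet<>();
    }

    public static void main(String[] args) {
        Set<DataSegmentStub> segments = newSegmentSet();
        segments.add(new DataSegmentStub("seg1"));
        System.out.println(segments.size()); // 1
    }
}
```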

github-actions[bot] commented 1 year ago

This issue has been marked as stale due to 280 days of inactivity. It will be closed in 4 weeks if no further activity occurs. If this issue is still relevant, please simply write any comment. Even if closed, you can still revive the issue at any time or discuss it on the dev@druid.apache.org list. Thank you for your contributions.