apache / druid

Apache Druid: a high performance real-time analytics database.
https://druid.apache.org/
Apache License 2.0
13.41k stars 3.68k forks source link

severe performance issue due to lock in StringDimensionIndexer.DimensionDictionary #6322

Open kaijianding opened 6 years ago

kaijianding commented 6 years ago

Update: the master branch code uses ReentrantReadWriteLock instead of synchronized, the r/w competitive issue should have been gone, but after another test run, I got extremely high CPU occupy and no noticeable query time reduction.

=====================> In my case, it has 20k/s inserts to each realtime node, and 200/s small groupby queries(only query the latest 2 minutes data), then I noticed severe performance issue:300~400ms per query on realtime node.

the jstack log shows it is waiting for lock

public String getValue(int id)
{
  synchronized (lock) {
    return Strings.emptyToNull(idToValue.get(id));
  }
}

java.lang.Thread.State: BLOCKED (on object monitor) at io.druid.segment.StringDimensionIndexer$DimensionDictionary.getValue(StringDimensionIndexer.java:88)

  • waiting to lock <0x0000000736400000> (a java.lang.Object) at io.druid.segment.StringDimensionIndexer.getActualValue(StringDimensionIndexer.java:669) at io.druid.segment.StringDimensionIndexer.access$000(StringDimensionIndexer.java:59) at io.druid.segment.StringDimensionIndexer$1IndexerDimensionSelector.lookupName(StringDimensionIndexer.java:506) at io.druid.segment.DimensionSelectorUtils.makePredicateMatchingSet(DimensionSelectorUtils.java:243) at io.druid.segment.StringDimensionIndexer$1IndexerDimensionSelector.makeValueMatcher(StringDimensionIndexer.java:461) at io.druid.query.filter.StringValueMatcherColumnSelectorStrategy.makeValueMatcher(StringValueMatcherColumnSelectorStrategy.java:61) at io.druid.query.filter.StringValueMatcherColumnSelectorStrategy.makeValueMatcher(StringValueMatcherColumnSelectorStrategy.java:28) at io.druid.segment.filter.Filters.makeValueMatcher(Filters.java:176) at io.druid.segment.filter.LikeFilter.makeMatcher(LikeFilter.java:73) at io.druid.segment.incremental.IncrementalIndexStorageAdapter.makeFilterMatcher(IncrementalIndexStorageAdapter.java:642) at io.druid.segment.incremental.IncrementalIndexStorageAdapter.access$000(IncrementalIndexStorageAdapter.java:68) at io.druid.segment.incremental.IncrementalIndexStorageAdapter$1$1.(IncrementalIndexStorageAdapter.java:244) at io.druid.segment.incremental.IncrementalIndexStorageAdapter$1.apply(IncrementalIndexStorageAdapter.java:242) at io.druid.segment.incremental.IncrementalIndexStorageAdapter$1.apply(IncrementalIndexStorageAdapter.java:234)

DimensionDictionary.getValue() is called in many places(compareUnsortedEncodedKeyComponents in IncrementalIndex, makeValueMatcher in StorageAdapter, etc), every call has to be queued due to this lock.

To my knowledge, I think this lock is not needed any more.

  1. the add() method in Dictionary is only called in single thread in IncrementalIndex
  2. read consistency is promised by #4049
  3. sortedLookup() should only be called when persist when the IncrementalIndex is stopped to accept new row which mean the Dictionary is readonly at that moment

if we can remove this lock in Dictionary, performance should be greatly improved. @gianm @leventov do you have any idea why we should keep this lock? Do you have any idea how we can improve this lock to avoid unnecessary queue up?

gianm commented 6 years ago

@kaijianding did you close this because it's not really an issue? Just wondering.

kaijianding commented 6 years ago

No, it is still an issue. I just checked the code in master branch, I found it uses the ReentrantReadWriteLock instead of synchronized. I thought the problem should have been gone, but it wasn't. I tested it in my scenario again, I got extremely high CPU usage. So I reopen this issue again with some modification to the description. @gianm

gianm commented 6 years ago

Thanks - let's figure out what we want to do about this.

Could you tell if, after upgrading, you were still seeing a lot of blocking on lock of the reentrant r/w lock? Maybe the high write rate is causing readers to block too much.

kaijianding commented 6 years ago

@gianm I saw lots of time eaten up by r/w lock itself, I doubt r/w lock is the root cost of high cpu occupy. To my knowledge the read lock is not needed, we can only keep the lock for add() and sort(), even the lock for sort() is also not needed due to sort() should only be called when persist at which moment the add() is not going to be called. but for safety, we can keep the lock for sort().

leventov commented 6 years ago

It is possible to improve concurrency here, I'm not sure that #4049 is relevant, but principally because indeed it's a case of single writer - multiple readers on an ArrayList + idForNull.

But just removing synchronization is not good because ArrayList data structure itself is racy. When a writer performs ArrayList.add() that reallocates underlying array a reader may intercept and read null for some old id, or get an IndexOutOfBoundsException. ArrayList should be replaced with a manually crafted ArrayList-like data structure, probably making use of CAS and CoW principles, that avoids this type of race.

kaijianding commented 6 years ago

@leventov I removed synchronized from read operation, and it worked fine. because there is only add operation to map and list, no delete operation, so IndexOutOfBoundsException is unlikely happen. And for getValue(id), the id is impossible be greater than ArrayList.size() -1, one should call add() first to get the maxId, also IndexOutOfBoundsException won't happen here.

On the other hand, current synchronized/rw lock can't actually do what it means to. like:

  1. threadA getMinValue(), threadB add(), threadA now holds the outdated minValue.

As you can see, the lock behavior doesn't help in overall perspective.

to remove synchronized to gain performance improve, I removed idForNull and fallback to 0.12.3 code.

private static class DimensionDictionary

{

private String minValue = null;

private String maxValue = null;

private final Object2IntMap<String> valueToId = new Object2IntOpenHashMap<>();

private final List<String> idToValue = Lists.newArrayList();
private final Object lock;

public DimensionDictionary()
{
  this.lock = new Object();
  valueToId.defaultReturnValue(-1);
}

public int getId(String value)
{
  return valueToId.getInt(Strings.nullToEmpty(value));
}

public String getValue(int id)
{
  return Strings.emptyToNull(idToValue.get(id));
}

public int size()
{
  return idToValue.size();
}

public int add(String originalValue)
{
  String value = Strings.nullToEmpty(originalValue);
  synchronized (lock) {
    int prev = valueToId.getInt(value);
    if (prev >= 0) {
      return prev;
    }
    final int index = size();
    valueToId.put(value, index);
    idToValue.add(value);
    minValue = minValue == null || minValue.compareTo(value) > 0 ? value : minValue;
    maxValue = maxValue == null || maxValue.compareTo(value) < 0 ? value : maxValue;
    return index;
  }
}

public String getMinValue()
{
  return minValue;
}

public String getMaxValue()
{
  return maxValue;
}

public SortedDimensionDictionary sort()
{
  synchronized (lock) {
    return new SortedDimensionDictionary(idToValue, size());
  }
}

}

leventov commented 6 years ago

Ok, now I see how #4049 is relevant.

Two things need to be done: 1) Highlight, by adding comments, that StringDimensionIndexers are updated before indexIncrement (which is better renamed to something like lastRowIndex) is incremented. (To ensure this is not violated during some refactoring.) 2) Simple ArrayList is still not enough. This line in the ArrayList's grow() method:

elementData = Arrays.copyOf(elementData, newCapacity);

Lacks a write barrier between Arrays.copyOf() and the assignment. It means that a parallel thread may read the new array from the elementData field before seeing effects of Arrays.copyOf(). This could be fixed by creating a minimal implementation of ArrayList with volatile elementData (without any CAS or CoW logic that I mentioned earlier).

kaijianding commented 6 years ago

@leventov for the ArrayList part, I don't think it is a problem, because size++ happens after grow(). In another word, the size() is always smaller than or equal to elementData size. other methods should only use index which is smaller than size(), the growing part is beyond current size() and should not be visible to other method calls.

leventov commented 6 years ago

@kaijianding this is not a matter of the size variable increment, this is a matter of elementData array update. See this thread: http://cs.oswego.edu/pipermail/concurrency-interest/2018-September/016526.html (ArrayList could have easily avoided this race, but currently it doesn't.)

stale[bot] commented 5 years ago

This issue has been marked as stale due to 280 days of inactivity. It will be closed in 2 weeks if no further activity occurs. If this issue is still relevant, please simply write any comment. Even if closed, you can still revive the issue at any time or discuss it on the dev@druid.apache.org list. Thank you for your contributions.

jasonk000 commented 2 years ago

https://github.com/apache/druid/pull/12109 and #12105 (.. should improve this)

github-actions[bot] commented 1 year ago

This issue has been marked as stale due to 280 days of inactivity. It will be closed in 4 weeks if no further activity occurs. If this issue is still relevant, please simply write any comment. Even if closed, you can still revive the issue at any time or discuss it on the dev@druid.apache.org list. Thank you for your contributions.