apache / drill

Apache Drill is a distributed MPP query layer for self describing data
https://drill.apache.org/
Apache License 2.0

DRILL-8495: Tried to remove unmanaged buffer #2913

Closed rymarm closed 6 months ago

rymarm commented 6 months ago


The root cause is that multiple HiveWriters share the same DrillBuf, and during execution any of them may reallocate it when the buffer is too small for a value (256 bytes or more). Because drillBuf.reallocIfNeeded(int size) returns a new DrillBuf instance when it reallocates, all the other writers keep a reference to the old buffer, which is no longer managed after the call.
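The failure mode described above can be reproduced with a toy model. This is only a sketch: ToyBuf and ToyWriter are hypothetical stand-ins, not Drill classes, and the assumption that reallocating an already replaced buffer is what raises the error is a simplification of how the buffer manager behaves.

```java
// Hypothetical stand-in for DrillBuf; it models only the realloc contract.
final class ToyBuf {
    private final int capacity;
    private boolean managed = true;

    ToyBuf(int capacity) {
        this.capacity = capacity;
    }

    boolean isManaged() {
        return managed;
    }

    // Returns this buffer when it is large enough. Otherwise the buffer is
    // released (becomes unmanaged) and a new, larger buffer is returned.
    ToyBuf reallocIfNeeded(int size) {
        if (size <= capacity) {
            return this;
        }
        if (!managed) {
            // Assumption: replacing a buffer that was already replaced fails.
            throw new IllegalStateException("Tried to remove unmanaged buffer");
        }
        managed = false;
        return new ToyBuf(size);
    }
}

// Hypothetical writer that keeps its own reference to a buffer.
final class ToyWriter {
    private ToyBuf buf;

    ToyWriter(ToyBuf buf) {
        this.buf = buf;
    }

    void write(byte[] value) {
        // Only this writer's reference is updated; other writers that were
        // given the same buffer still point at the old instance.
        buf = buf.reallocIfNeeded(value.length);
    }

    ToyBuf buffer() {
        return buf;
    }
}

class SharedBufferDemo {
    public static void main(String[] args) {
        ToyBuf shared = new ToyBuf(256);
        ToyWriter first = new ToyWriter(shared);
        ToyWriter second = new ToyWriter(shared);

        first.write(new byte[300]); // replaces the buffer for "first" only
        try {
            second.write(new byte[300]); // "second" still holds the stale buffer
        } catch (IllegalStateException e) {
            System.out.println("second writer failed: " + e.getMessage());
        }
    }
}
```

Values that fit in the initial buffer never trigger the problem in this model, which matches the observation that only larger values are affected.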

Description

HiveValueWriterFactory now creates a unique DrillBuf for each writer.

HiveWriters are actually used one at a time, so a single buffer could serve all of them. To do that, I could introduce a holder class for DrillBuf: every writer would reference the same holder, which would store the new buffer returned by each drillBuf.reallocIfNeeded(int size) call. I thought that logic looked slightly confusing, though, so I decided to let each HiveWriter use its own buffer.
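The two options can be contrasted in a small sketch. All names here (GrowableBuf, BufHolder, HolderWriter, OwningWriter) are hypothetical and chosen for illustration; none of them is taken from the Drill code base, and the real writers do more than resize a buffer.

```java
// Hypothetical stand-in for DrillBuf; only the realloc contract is modeled.
final class GrowableBuf {
    private final int capacity;

    GrowableBuf(int capacity) {
        this.capacity = capacity;
    }

    int capacity() {
        return capacity;
    }

    // Returns this buffer when it is large enough, otherwise a new, larger one.
    GrowableBuf reallocIfNeeded(int size) {
        return size <= capacity ? this : new GrowableBuf(size);
    }
}

// Option A (considered but not chosen): writers share one holder, and the
// holder always stores the buffer returned by the latest reallocation.
final class BufHolder {
    private GrowableBuf buf;

    BufHolder(GrowableBuf initial) {
        this.buf = initial;
    }

    GrowableBuf ensureCapacity(int size) {
        buf = buf.reallocIfNeeded(size);
        return buf;
    }

    GrowableBuf current() {
        return buf;
    }
}

final class HolderWriter {
    private final BufHolder holder;

    HolderWriter(BufHolder holder) {
        this.holder = holder;
    }

    void write(byte[] value) {
        holder.ensureCapacity(value.length);
    }

    GrowableBuf buffer() {
        return holder.current();
    }
}

// Option B (the approach described above): each writer owns its own buffer.
final class OwningWriter {
    private GrowableBuf buf;

    OwningWriter(int initialCapacity) {
        this.buf = new GrowableBuf(initialCapacity);
    }

    void write(byte[] value) {
        buf = buf.reallocIfNeeded(value.length);
    }

    GrowableBuf buffer() {
        return buf;
    }
}

class BufferOwnershipDemo {
    public static void main(String[] args) {
        BufHolder holder = new BufHolder(new GrowableBuf(256));
        HolderWriter h1 = new HolderWriter(holder);
        HolderWriter h2 = new HolderWriter(holder);
        h1.write(new byte[300]);
        // Both writers observe the replacement through the shared holder.
        System.out.println(h1.buffer() == h2.buffer());

        OwningWriter o1 = new OwningWriter(256);
        OwningWriter o2 = new OwningWriter(256);
        o1.write(new byte[300]);
        // Only the writer that needed more room gets a larger buffer.
        System.out.println(o1.buffer().capacity() + " " + o2.buffer().capacity());
    }
}
```

Option A saves memory at the cost of an extra level of indirection; Option B uses one buffer per writer but keeps each writer's state independent, so no writer can be left with a stale reference.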

Documentation

-

Testing

Added a new unit test that queries a Hive table with variable-length values of the Binary, VarChar, Char and String types.

jnturton commented 6 months ago

P.S. I see that checkstyle is still upset.

rymarm commented 6 months ago

@jnturton I addressed the checkstyle issues and the failing Java tests. It should be fine now.