apache / arrow-cookbook

Apache Arrow Cookbook
https://arrow.apache.org/
Apache License 2.0

[Java] Document how to convert JDBC Adapter result into a Parquet file #316

Open davisusanibar opened 1 year ago

davisusanibar commented 1 year ago

In this example, we have the JDBC adapter result and are trying to write it into a Parquet file.

Current workaround:

davisusanibar commented 1 year ago

To close https://github.com/apache/arrow-cookbook/issues/315

danepitkin commented 1 year ago

Ah I see the related issue now. I think it would be best if we had "read/write parquet" examples in dataset.rst and then added a very minimal example of why/how to extend the ArrowReader class for JDBC. What do you think?

davisusanibar commented 1 year ago

That makes sense; let me split that out as well.

davisusanibar commented 1 year ago

Hi @lidavidm, do you have any recommendations on where I could look to investigate this issue?

Did you see this error, or any related errors, when you were working with DatasetFileWriter.write?

Current error messages:

07:52:55.995 [main] INFO org.apache.arrow.memory.BaseAllocator - Debug mode enabled.
07:52:55.999 [main] INFO org.apache.arrow.memory.DefaultAllocationManagerOption - allocation manager type not specified, using netty as the default type
07:52:56.001 [main] INFO org.apache.arrow.memory.CheckAllocator - Using DefaultAllocationManager at memory-netty/13.0.0-SNAPSHOT/arrow-memory-netty-13.0.0-SNAPSHOT.jar!/org/apache/arrow/memory/DefaultAllocationManagerFactory.class
07:52:56.020 [main] DEBUG io.netty.util.internal.logging.InternalLoggerFactory - Using SLF4J as the default logging framework
07:52:56.020 [main] DEBUG io.netty.util.ResourceLeakDetector - -Dio.netty.leakDetection.level: simple
07:52:56.020 [main] DEBUG io.netty.util.ResourceLeakDetector - -Dio.netty.leakDetection.targetRecords: 4
07:52:56.039 [main] DEBUG io.netty.util.internal.PlatformDependent0 - -Dio.netty.noUnsafe: false
07:52:56.039 [main] DEBUG io.netty.util.internal.PlatformDependent0 - Java version: 11
07:52:56.041 [main] DEBUG io.netty.util.internal.PlatformDependent0 - sun.misc.Unsafe.theUnsafe: available
07:52:56.041 [main] DEBUG io.netty.util.internal.PlatformDependent0 - sun.misc.Unsafe.copyMemory: available
07:52:56.042 [main] DEBUG io.netty.util.internal.PlatformDependent0 - sun.misc.Unsafe.storeFence: available
07:52:56.042 [main] DEBUG io.netty.util.internal.PlatformDependent0 - java.nio.Buffer.address: available
07:52:56.042 [main] DEBUG io.netty.util.internal.PlatformDependent0 - direct buffer constructor: unavailable: Reflective setAccessible(true) disabled
07:52:56.043 [main] DEBUG io.netty.util.internal.PlatformDependent0 - java.nio.Bits.unaligned: available, true
07:52:56.043 [main] DEBUG io.netty.util.internal.PlatformDependent0 - jdk.internal.misc.Unsafe.allocateUninitializedArray(int): unavailable: class io.netty.util.internal.PlatformDependent0$7 cannot access class jdk.internal.misc.Unsafe (in module java.base) because module java.base does not export jdk.internal.misc to unnamed module @d4342c2
07:52:56.044 [main] DEBUG io.netty.util.internal.PlatformDependent0 - java.nio.DirectByteBuffer.<init>(long, {int,long}): unavailable
07:52:56.044 [main] DEBUG io.netty.util.internal.PlatformDependent - sun.misc.Unsafe: available
07:52:56.060 [main] DEBUG io.netty.util.internal.PlatformDependent - maxDirectMemory: 8589934592 bytes (maybe)
07:52:56.060 [main] DEBUG io.netty.util.internal.PlatformDependent - -Dio.netty.tmpdir: /var/folders/d6/cz55k4qj52b40dmdvfjc_stm0000gn/T (java.io.tmpdir)
07:52:56.060 [main] DEBUG io.netty.util.internal.PlatformDependent - -Dio.netty.bitMode: 64 (sun.arch.data.model)
07:52:56.061 [main] DEBUG io.netty.util.internal.PlatformDependent - Platform: MacOS
07:52:56.063 [main] DEBUG io.netty.util.internal.PlatformDependent - -Dio.netty.maxDirectMemory: -1 bytes
07:52:56.063 [main] DEBUG io.netty.util.internal.PlatformDependent - -Dio.netty.uninitializedArrayAllocationThreshold: -1
07:52:56.063 [main] DEBUG io.netty.util.internal.CleanerJava9 - java.nio.ByteBuffer.cleaner(): available
07:52:56.063 [main] DEBUG io.netty.util.internal.PlatformDependent - -Dio.netty.noPreferDirect: false
07:52:56.064 [main] DEBUG io.netty.buffer.PooledByteBufAllocator - -Dio.netty.allocator.numHeapArenas: 32
07:52:56.064 [main] DEBUG io.netty.buffer.PooledByteBufAllocator - -Dio.netty.allocator.numDirectArenas: 32
07:52:56.064 [main] DEBUG io.netty.buffer.PooledByteBufAllocator - -Dio.netty.allocator.pageSize: 8192
07:52:56.064 [main] DEBUG io.netty.buffer.PooledByteBufAllocator - -Dio.netty.allocator.maxOrder: 9
07:52:56.064 [main] DEBUG io.netty.buffer.PooledByteBufAllocator - -Dio.netty.allocator.chunkSize: 4194304
07:52:56.064 [main] DEBUG io.netty.buffer.PooledByteBufAllocator - -Dio.netty.allocator.smallCacheSize: 256
07:52:56.064 [main] DEBUG io.netty.buffer.PooledByteBufAllocator - -Dio.netty.allocator.normalCacheSize: 64
07:52:56.064 [main] DEBUG io.netty.buffer.PooledByteBufAllocator - -Dio.netty.allocator.maxCachedBufferCapacity: 32768
07:52:56.064 [main] DEBUG io.netty.buffer.PooledByteBufAllocator - -Dio.netty.allocator.cacheTrimInterval: 8192
07:52:56.064 [main] DEBUG io.netty.buffer.PooledByteBufAllocator - -Dio.netty.allocator.cacheTrimIntervalMillis: 0
07:52:56.064 [main] DEBUG io.netty.buffer.PooledByteBufAllocator - -Dio.netty.allocator.useCacheForAllThreads: false
07:52:56.064 [main] DEBUG io.netty.buffer.PooledByteBufAllocator - -Dio.netty.allocator.maxCachedByteBuffersPerChunk: 1023
07:52:56.069 [main] DEBUG io.netty.util.internal.InternalThreadLocalMap - -Dio.netty.threadLocalMap.stringBuilder.initialSize: 1024
07:52:56.069 [main] DEBUG io.netty.util.internal.InternalThreadLocalMap - -Dio.netty.threadLocalMap.stringBuilder.maxSize: 4096
07:52:56.086 [main] DEBUG io.netty.buffer.AbstractByteBuf - -Dio.netty.buffer.checkAccessible: true
07:52:56.086 [main] DEBUG io.netty.buffer.AbstractByteBuf - -Dio.netty.buffer.checkBounds: true
07:52:56.087 [main] DEBUG io.netty.util.ResourceLeakDetectorFactory - Loaded default ResourceLeakDetector: io.netty.util.ResourceLeakDetector@72057ecf
07:52:56.105 [main] DEBUG org.apache.arrow.memory.rounding.DefaultRoundingPolicy - -Dorg.apache.memory.allocator.pageSize: 8192
07:52:56.105 [main] DEBUG org.apache.arrow.memory.rounding.DefaultRoundingPolicy - -Dorg.apache.memory.allocator.maxOrder: 11
07:52:56.625 [main] DEBUG io.netty.util.Recycler - -Dio.netty.recycler.maxCapacityPerThread: 4096
07:52:56.625 [main] DEBUG io.netty.util.Recycler - -Dio.netty.recycler.ratio: 8
07:52:56.625 [main] DEBUG io.netty.util.Recycler - -Dio.netty.recycler.chunkSize: 32
07:52:56.625 [main] DEBUG io.netty.util.Recycler - -Dio.netty.recycler.blocking: false
07:52:56.625 [main] DEBUG io.netty.util.Recycler - -Dio.netty.recycler.batchFastThreadLocalOnly: true
07:52:56.630 [main] DEBUG io.netty.util.internal.PlatformDependent - org.jctools-core.MpscChunkedArrayQueue: available
07:52:56.637 [main] DEBUG org.apache.arrow.memory.util.MemoryUtil - Constructor for direct buffer found and made accessible
07:52:56.637 [main] DEBUG org.apache.arrow.memory.util.MemoryUtil - direct buffer constructor: available
WARNING: An illegal reflective access operation has occurred
WARNING: Illegal reflective access by org.apache.arrow.memory.util.MemoryUtil (file:/Users/dsusanibar/.m2/repository/org/apache/arrow/arrow-memory-core/13.0.0-SNAPSHOT/arrow-memory-core-13.0.0-SNAPSHOT.jar) to field java.nio.Buffer.address
WARNING: Please consider reporting this to the maintainers of org.apache.arrow.memory.util.MemoryUtil
WARNING: Use --illegal-access=warn to enable warnings of further illegal reflective access operations
WARNING: All illegal access operations will be denied in a future release
07:52:57.641 [Thread-1] DEBUG org.apache.arrow.vector.ipc.message.ArrowRecordBatch - Buffer in RecordBatch at 0, length: 1
07:52:57.641 [Thread-1] DEBUG org.apache.arrow.vector.ipc.message.ArrowRecordBatch - Buffer in RecordBatch at 8, length: 8
07:52:57.641 [Thread-1] DEBUG org.apache.arrow.vector.ipc.message.ArrowRecordBatch - Buffer in RecordBatch at 16, length: 1
07:52:57.641 [Thread-1] DEBUG org.apache.arrow.vector.ipc.message.ArrowRecordBatch - Buffer in RecordBatch at 24, length: 1
07:52:57.641 [Thread-1] DEBUG org.apache.arrow.vector.ipc.message.ArrowRecordBatch - Buffer in RecordBatch at 32, length: 1
07:52:57.641 [Thread-1] DEBUG org.apache.arrow.vector.ipc.message.ArrowRecordBatch - Buffer in RecordBatch at 40, length: 16
07:52:57.641 [Thread-1] DEBUG org.apache.arrow.vector.ipc.message.ArrowRecordBatch - Buffer in RecordBatch at 56, length: 1
07:52:57.641 [Thread-1] DEBUG org.apache.arrow.vector.ipc.message.ArrowRecordBatch - Buffer in RecordBatch at 64, length: 12
07:52:57.641 [Thread-1] DEBUG org.apache.arrow.vector.ipc.message.ArrowRecordBatch - Buffer in RecordBatch at 80, length: 32
07:52:57.641 [Thread-1] DEBUG org.apache.arrow.vector.ipc.message.ArrowRecordBatch - Buffer in RecordBatch at 112, length: 1
07:52:57.641 [Thread-1] DEBUG org.apache.arrow.vector.ipc.message.ArrowRecordBatch - Buffer in RecordBatch at 120, length: 12
07:52:57.642 [Thread-1] DEBUG org.apache.arrow.vector.ipc.message.ArrowRecordBatch - Buffer in RecordBatch at 136, length: 1
07:52:57.642 [Thread-1] DEBUG org.apache.arrow.vector.ipc.message.ArrowRecordBatch - Buffer in RecordBatch at 144, length: 20
07:52:57.646 [Thread-1] DEBUG org.apache.arrow.vector.ipc.message.ArrowRecordBatch - Buffer in RecordBatch at 0, length: 1
07:52:57.646 [Thread-1] DEBUG org.apache.arrow.vector.ipc.message.ArrowRecordBatch - Buffer in RecordBatch at 8, length: 8
07:52:57.646 [Thread-1] DEBUG org.apache.arrow.vector.ipc.message.ArrowRecordBatch - Buffer in RecordBatch at 16, length: 1
07:52:57.646 [Thread-1] DEBUG org.apache.arrow.vector.ipc.message.ArrowRecordBatch - Buffer in RecordBatch at 24, length: 1
07:52:57.646 [Thread-1] DEBUG org.apache.arrow.vector.ipc.message.ArrowRecordBatch - Buffer in RecordBatch at 32, length: 1
07:52:57.646 [Thread-1] DEBUG org.apache.arrow.vector.ipc.message.ArrowRecordBatch - Buffer in RecordBatch at 40, length: 16
07:52:57.646 [Thread-1] DEBUG org.apache.arrow.vector.ipc.message.ArrowRecordBatch - Buffer in RecordBatch at 56, length: 1
07:52:57.646 [Thread-1] DEBUG org.apache.arrow.vector.ipc.message.ArrowRecordBatch - Buffer in RecordBatch at 64, length: 12
07:52:57.646 [Thread-1] DEBUG org.apache.arrow.vector.ipc.message.ArrowRecordBatch - Buffer in RecordBatch at 80, length: 32
07:52:57.646 [Thread-1] DEBUG org.apache.arrow.vector.ipc.message.ArrowRecordBatch - Buffer in RecordBatch at 112, length: 1
07:52:57.646 [Thread-1] DEBUG org.apache.arrow.vector.ipc.message.ArrowRecordBatch - Buffer in RecordBatch at 120, length: 12
07:52:57.646 [Thread-1] DEBUG org.apache.arrow.vector.ipc.message.ArrowRecordBatch - Buffer in RecordBatch at 136, length: 1
07:52:57.646 [Thread-1] DEBUG org.apache.arrow.vector.ipc.message.ArrowRecordBatch - Buffer in RecordBatch at 144, length: 20
07:52:57.676 [Thread-2] DEBUG org.apache.arrow.vector.ipc.message.ArrowRecordBatch - Buffer in RecordBatch at 0, length: 1
07:52:57.676 [Thread-2] DEBUG org.apache.arrow.vector.ipc.message.ArrowRecordBatch - Buffer in RecordBatch at 8, length: 4
07:52:57.676 [Thread-2] DEBUG org.apache.arrow.vector.ipc.message.ArrowRecordBatch - Buffer in RecordBatch at 16, length: 1
07:52:57.676 [Thread-2] DEBUG org.apache.arrow.vector.ipc.message.ArrowRecordBatch - Buffer in RecordBatch at 24, length: 1
07:52:57.676 [Thread-2] DEBUG org.apache.arrow.vector.ipc.message.ArrowRecordBatch - Buffer in RecordBatch at 32, length: 1
07:52:57.676 [Thread-2] DEBUG org.apache.arrow.vector.ipc.message.ArrowRecordBatch - Buffer in RecordBatch at 40, length: 8
07:52:57.676 [Thread-2] DEBUG org.apache.arrow.vector.ipc.message.ArrowRecordBatch - Buffer in RecordBatch at 48, length: 1
07:52:57.676 [Thread-2] DEBUG org.apache.arrow.vector.ipc.message.ArrowRecordBatch - Buffer in RecordBatch at 56, length: 8
07:52:57.676 [Thread-2] DEBUG org.apache.arrow.vector.ipc.message.ArrowRecordBatch - Buffer in RecordBatch at 64, length: 16
07:52:57.676 [Thread-2] DEBUG org.apache.arrow.vector.ipc.message.ArrowRecordBatch - Buffer in RecordBatch at 80, length: 1
07:52:57.676 [Thread-2] DEBUG org.apache.arrow.vector.ipc.message.ArrowRecordBatch - Buffer in RecordBatch at 88, length: 8
07:52:57.676 [Thread-2] DEBUG org.apache.arrow.vector.ipc.message.ArrowRecordBatch - Buffer in RecordBatch at 96, length: 1
07:52:57.676 [Thread-2] DEBUG org.apache.arrow.vector.ipc.message.ArrowRecordBatch - Buffer in RecordBatch at 104, length: 4
07:52:57.683 [Thread-2] DEBUG org.apache.arrow.vector.ipc.message.ArrowRecordBatch - Buffer in RecordBatch at 0, length: 1
07:52:57.683 [Thread-2] DEBUG org.apache.arrow.vector.ipc.message.ArrowRecordBatch - Buffer in RecordBatch at 8, length: 4
07:52:57.683 [Thread-2] DEBUG org.apache.arrow.vector.ipc.message.ArrowRecordBatch - Buffer in RecordBatch at 16, length: 1
07:52:57.683 [Thread-2] DEBUG org.apache.arrow.vector.ipc.message.ArrowRecordBatch - Buffer in RecordBatch at 24, length: 1
07:52:57.683 [Thread-2] DEBUG org.apache.arrow.vector.ipc.message.ArrowRecordBatch - Buffer in RecordBatch at 32, length: 1
07:52:57.683 [Thread-2] DEBUG org.apache.arrow.vector.ipc.message.ArrowRecordBatch - Buffer in RecordBatch at 40, length: 8
07:52:57.683 [Thread-2] DEBUG org.apache.arrow.vector.ipc.message.ArrowRecordBatch - Buffer in RecordBatch at 48, length: 1
07:52:57.683 [Thread-2] DEBUG org.apache.arrow.vector.ipc.message.ArrowRecordBatch - Buffer in RecordBatch at 56, length: 8
07:52:57.683 [Thread-2] DEBUG org.apache.arrow.vector.ipc.message.ArrowRecordBatch - Buffer in RecordBatch at 64, length: 16
07:52:57.683 [Thread-2] DEBUG org.apache.arrow.vector.ipc.message.ArrowRecordBatch - Buffer in RecordBatch at 80, length: 1
07:52:57.683 [Thread-2] DEBUG org.apache.arrow.vector.ipc.message.ArrowRecordBatch - Buffer in RecordBatch at 88, length: 8
07:52:57.683 [Thread-2] DEBUG org.apache.arrow.vector.ipc.message.ArrowRecordBatch - Buffer in RecordBatch at 96, length: 1
07:52:57.683 [Thread-2] DEBUG org.apache.arrow.vector.ipc.message.ArrowRecordBatch - Buffer in RecordBatch at 104, length: 4
Exception in thread "Thread-8" java.lang.IllegalStateException: RefCnt has gone negative
    at org.apache.arrow.util.Preconditions.checkState(Preconditions.java:458)
    at org.apache.arrow.memory.BufferLedger.release(BufferLedger.java:130)
    at org.apache.arrow.memory.BufferLedger.release(BufferLedger.java:104)
    at org.apache.arrow.vector.BaseValueVector.releaseBuffer(BaseValueVector.java:117)
    at org.apache.arrow.vector.BaseFixedWidthVector.clear(BaseFixedWidthVector.java:248)
    at org.apache.arrow.vector.BaseFixedWidthVector.close(BaseFixedWidthVector.java:238)
    at org.apache.arrow.util.AutoCloseables.close(AutoCloseables.java:97)
    at org.apache.arrow.vector.VectorSchemaRoot.close(VectorSchemaRoot.java:247)
    at org.apache.arrow.vector.ipc.ArrowReader.close(ArrowReader.java:143)
    at org.apache.arrow.vector.ipc.ArrowReader.close(ArrowReader.java:131)
    at org.apache.arrow.c.ArrayStreamExporter$ExportedArrayStreamPrivateData.close(ArrayStreamExporter.java:97)
    Suppressed: java.lang.IllegalStateException: RefCnt has gone negative
        ... 11 more
    Suppressed: java.lang.IllegalStateException: RefCnt has gone negative
        at org.apache.arrow.util.Preconditions.checkState(Preconditions.java:458)
        at org.apache.arrow.memory.BufferLedger.release(BufferLedger.java:130)
        at org.apache.arrow.memory.BufferLedger.release(BufferLedger.java:104)
        at org.apache.arrow.vector.BaseValueVector.releaseBuffer(BaseValueVector.java:117)
        at org.apache.arrow.vector.complex.BaseRepeatedValueVector.clear(BaseRepeatedValueVector.java:247)
        at org.apache.arrow.vector.complex.ListVector.clear(ListVector.java:624)
        at org.apache.arrow.vector.BaseValueVector.close(BaseValueVector.java:77)
        ... 5 more
Exception in thread "main" java.lang.IllegalStateException: RefCnt has gone negative
    at org.apache.arrow.util.Preconditions.checkState(Preconditions.java:458)
    at org.apache.arrow.memory.BufferLedger.release(BufferLedger.java:130)
    at org.apache.arrow.memory.BufferLedger.release(BufferLedger.java:104)
    at org.apache.arrow.vector.BaseValueVector.releaseBuffer(BaseValueVector.java:117)
    at org.apache.arrow.vector.BaseFixedWidthVector.clear(BaseFixedWidthVector.java:248)
    at org.apache.arrow.vector.BaseFixedWidthVector.close(BaseFixedWidthVector.java:238)
    at org.apache.arrow.util.AutoCloseables.close(AutoCloseables.java:97)
    at org.apache.arrow.vector.VectorSchemaRoot.close(VectorSchemaRoot.java:247)
    at org.apache.arrow.vector.ipc.ArrowReader.close(ArrowReader.java:143)
    at org.apache.arrow.vector.ipc.ArrowReader.close(ArrowReader.java:131)
    at dataset.domingo.WriteArrowObjectsToParquet.main(WriteArrowObjectsToParquet.java:70)
    Suppressed: java.lang.IllegalStateException: RefCnt has gone negative
        ... 11 more
    Suppressed: java.lang.IllegalStateException: RefCnt has gone negative
        ... 11 more
    Suppressed: java.lang.IllegalStateException: RefCnt has gone negative
        at org.apache.arrow.util.Preconditions.checkState(Preconditions.java:458)
        at org.apache.arrow.memory.BufferLedger.release(BufferLedger.java:130)
        at org.apache.arrow.memory.BufferLedger.release(BufferLedger.java:104)
        at org.apache.arrow.vector.BaseValueVector.releaseBuffer(BaseValueVector.java:117)
        at org.apache.arrow.vector.BaseVariableWidthVector.clear(BaseVariableWidthVector.java:270)
        at org.apache.arrow.vector.BaseVariableWidthVector.close(BaseVariableWidthVector.java:261)
        ... 5 more
    Suppressed: java.lang.IllegalStateException: Allocator[allocatorParquetWrite] closed with outstanding buffers allocated (12).
Allocator(allocatorParquetWrite) 0/17746/51748/9223372036854775807 (res/actual/peak/limit)
  child allocators: 0
  ledgers: 12
    ledger[101] allocator: allocatorParquetWrite), isOwning: , size: , references: 1, life: 183383362784444..0, allocatorManager: [, life: ] holds 1 buffers. 
        ArrowBuf[155], address:140388662068096, capacity:128
    ledger[111] allocator: allocatorParquetWrite), isOwning: , size: , references: 1, life: 183383364210168..0, allocatorManager: [, life: ] holds 1 buffers. 
        ArrowBuf[165], address:140388662068608, capacity:128
    ledger[98] allocator: allocatorParquetWrite), isOwning: , size: , references: 1, life: 183383362357658..0, allocatorManager: [, life: ] holds 1 buffers. 
        ArrowBuf[152], address:140388661985520, capacity:2
    ledger[95] allocator: allocatorParquetWrite), isOwning: , size: , references: 0, life: 183383362099977..183383372675946, allocatorManager: [, life: ] holds 1 buffers. 
        ArrowBuf[148], address:140388662035456, capacity:512
    ledger[103] allocator: allocatorParquetWrite), isOwning: , size: , references: 1, life: 183383363021510..0, allocatorManager: [, life: ] holds 1 buffers. 
        ArrowBuf[157], address:140388662068352, capacity:128
    ledger[102] allocator: allocatorParquetWrite), isOwning: , size: , references: 1, life: 183383362877116..0, allocatorManager: [, life: ] holds 1 buffers. 
        ArrowBuf[156], address:140388662068224, capacity:128
    ledger[100] allocator: allocatorParquetWrite), isOwning: , size: , references: 1, life: 183383362665191..0, allocatorManager: [, life: ] holds 1 buffers. 
        ArrowBuf[154], address:140388662067968, capacity:128
    ledger[104] allocator: allocatorParquetWrite), isOwning: , size: , references: 1, life: 183383363121777..0, allocatorManager: [, life: ] holds 1 buffers. 
        ArrowBuf[158], address:140388662068480, capacity:128
    ledger[105] allocator: allocatorParquetWrite), isOwning: , size: , references: 1, life: 183383363250885..0, allocatorManager: [, life: ] holds 1 buffers. 
        ArrowBuf[159], address:140388661985536, capacity:8
    ledger[110] allocator: allocatorParquetWrite), isOwning: , size: , references: 1, life: 183383364066585..0, allocatorManager: [, life: ] holds 1 buffers. 
        ArrowBuf[164], address:140388661985600, capacity:8
    ledger[99] allocator: allocatorParquetWrite), isOwning: , size: , references: 1, life: 183383362510994..0, allocatorManager: [, life: ] holds 1 buffers. 
        ArrowBuf[153], address:140388662059136, capacity:64
    ledger[96] allocator: allocatorParquetWrite), isOwning: , size: , references: 1, life: 183383362167890..0, allocatorManager: [, life: ] holds 1 buffers. 
        ArrowBuf[149], address:140388662140928, capacity:16384
  reservations: 0

        at org.apache.arrow.memory.BaseAllocator.close(BaseAllocator.java:445)
        at dataset.domingo.WriteArrowObjectsToParquet.main(WriteArrowObjectsToParquet.java:34)
    Suppressed: java.lang.IllegalStateException: Allocator[allocatorJDBC] closed with outstanding buffers allocated (8).
Allocator(allocatorJDBC) 0/49760/99536/9223372036854775807 (res/actual/peak/limit)
  child allocators: 0
  ledgers: 8
    ledger[4] allocator: allocatorJDBC), isOwning: , size: , references: 2, life: 183382320693061..0, allocatorManager: [, life: ] holds 3 buffers. 
        ArrowBuf[13], address:140388662017544, capacity:504
        ArrowBuf[11], address:140388662001664, capacity:16384
        ArrowBuf[12], address:140388662001664, capacity:15880
    ledger[3] allocator: allocatorJDBC), isOwning: , size: , references: 2, life: 183382316692606..0, allocatorManager: [, life: ] holds 3 buffers. 
        ArrowBuf[8], address:140388661993472, capacity:32
        ArrowBuf[9], address:140388661993472, capacity:24
        ArrowBuf[10], address:140388661993496, capacity:8
    ledger[1] allocator: allocatorJDBC), isOwning: , size: , references: 2, life: 183382299608346..0, allocatorManager: [, life: ] holds 3 buffers. 
        ArrowBuf[2], address:140388661985280, capacity:16
        ArrowBuf[3], address:140388661985280, capacity:8
        ArrowBuf[4], address:140388661985288, capacity:8
    ledger[2] allocator: allocatorJDBC), isOwning: , size: , references: 2, life: 183382315070903..0, allocatorManager: [, life: ] holds 3 buffers. 
        ArrowBuf[7], address:140388661985304, capacity:8
        ArrowBuf[5], address:140388661985296, capacity:16
        ArrowBuf[6], address:140388661985296, capacity:8
    ledger[8] allocator: allocatorJDBC), isOwning: , size: , references: 2, life: 183382324122274..0, allocatorManager: [, life: ] holds 3 buffers. 
        ArrowBuf[19], address:140388662058504, capacity:504
        ArrowBuf[18], address:140388662042624, capacity:15880
        ArrowBuf[17], address:140388662042624, capacity:16384
    ledger[9] allocator: allocatorJDBC), isOwning: , size: , references: 1, life: 183382325184402..0, allocatorManager: [, life: ] holds 1 buffers. 
        ArrowBuf[20], address:140388661993504, capacity:32
    ledger[6] allocator: allocatorJDBC), isOwning: , size: , references: 1, life: 183382323054940..0, allocatorManager: [, life: ] holds 1 buffers. 
        ArrowBuf[15], address:140388662018048, capacity:16384
    ledger[7] allocator: allocatorJDBC), isOwning: , size: , references: 1, life: 183382323374059..0, allocatorManager: [, life: ] holds 1 buffers. 
        ArrowBuf[16], address:140388662034432, capacity:512
  reservations: 0

        at org.apache.arrow.memory.BaseAllocator.close(BaseAllocator.java:445)
        at dataset.domingo.WriteArrowObjectsToParquet.main(WriteArrowObjectsToParquet.java:34)
    Suppressed: java.util.ConcurrentModificationException
        at java.base/java.util.IdentityHashMap$IdentityHashMapIterator.nextIndex(IdentityHashMap.java:737)
        at java.base/java.util.IdentityHashMap$KeyIterator.next(IdentityHashMap.java:828)
        at org.apache.arrow.memory.BaseAllocator.print(BaseAllocator.java:693)
        at org.apache.arrow.memory.BaseAllocator.print(BaseAllocator.java:689)
        at org.apache.arrow.memory.BaseAllocator.toString(BaseAllocator.java:501)
        at org.apache.arrow.memory.RootAllocator.toString(RootAllocator.java:29)
        at org.apache.arrow.memory.BaseAllocator.close(BaseAllocator.java:432)
        at org.apache.arrow.memory.RootAllocator.close(RootAllocator.java:29)
        at dataset.domingo.WriteArrowObjectsToParquet.main(WriteArrowObjectsToParquet.java:34)
07:52:57.697 [main] DEBUG org.apache.arrow.memory.BaseAllocator - closed allocator[allocatorReader].

lidavidm commented 1 year ago

Have you isolated the problem? Looked at a debugger? Enabled allocation tracing?
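For readers unfamiliar with the failure in the trace above: "RefCnt has gone negative" is raised when a reference-counted buffer is released more times than it was retained. The trace shows ArrowReader.close being reached both from ArrayStreamExporter (Thread-8) and from main, so one thing worth checking is whether the same reader is closed by two owners. The sketch below is a dependency-free model of that situation, not a diagnosis of this bug; CountedBuffer and its methods are hypothetical names and do not mirror Arrow's BufferLedger API.

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical, simplified model of a reference-counted buffer. */
final class CountedBuffer {
  // A freshly allocated buffer starts with a single owner.
  private final AtomicInteger refCnt = new AtomicInteger(1);

  /** Registers one more owner of this buffer. */
  void retain() {
    refCnt.incrementAndGet();
  }

  /** Drops one owner; fails if more owners release than ever retained. */
  void release() {
    int remaining = refCnt.decrementAndGet();
    if (remaining < 0) {
      throw new IllegalStateException("RefCnt has gone negative");
    }
  }

  int refCnt() {
    return refCnt.get();
  }
}

final class DoubleReleaseDemo {
  public static void main(String[] args) {
    // Two owners close the same buffer, but only one reference exists.
    CountedBuffer unshared = new CountedBuffer();
    unshared.release();
    try {
      unshared.release();
    } catch (IllegalStateException e) {
      System.out.println("second close failed: " + e.getMessage());
    }

    // Retaining before handing the buffer to a second owner balances the count.
    CountedBuffer shared = new CountedBuffer();
    shared.retain();
    shared.release();
    shared.release();
    System.out.println("balanced count: " + shared.refCnt());
  }
}
```

In the model, the fix is to make ownership explicit: either exactly one party closes the resource, or every additional owner retains it first.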

davisusanibar commented 1 year ago

Hi @danepitkin, the changes have been added as requested.

danepitkin commented 1 year ago

Nice work! I left a couple more comments. Let me know what you think.

davisusanibar commented 1 year ago

What are those comments?

danepitkin commented 1 year ago

I forgot to hit "Submit Review" 😅 sorry!

davisusanibar commented 1 year ago

I would appreciate your help with a new code review, @danepitkin.

pronzato commented 11 months ago

Hi David,

When I try to run JDBCReader I get URI has empty scheme

java.lang.RuntimeException: URI has empty scheme: '/tmp
    at org.apache.arrow.dataset.file.JniWrapper.writeFromScannerToFile(Native Method)
    at org.apache.arrow.dataset.file.DatasetFileWriter.write(DatasetFileWriter.java:46)
    at org.apache.arrow.dataset.file.DatasetFileWriter.write(DatasetFileWriter.java:59)

Any idea what could be causing this?

Regards

GP
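The message above suggests that the output location was passed as a bare path ('/tmp) rather than as a URI. Assuming the location handed to DatasetFileWriter.write is expected to be a URI string with a scheme, such as a file: URI, a minimal standard-library sketch for checking and converting a local path could look like this. OutputUriExample, hasScheme, and toFileUri are hypothetical helper names, not part of Arrow.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.nio.file.Paths;

final class OutputUriExample {
  /** Returns true when the string parses as a URI that carries a scheme such as "file". */
  static boolean hasScheme(String location) {
    try {
      return new URI(location).getScheme() != null;
    } catch (URISyntaxException e) {
      return false;
    }
  }

  /** Converts a local filesystem path into an absolute file: URI string. */
  static String toFileUri(String localPath) {
    return Paths.get(localPath).toAbsolutePath().toUri().toString();
  }

  public static void main(String[] args) {
    String bare = "/tmp";
    // A bare path parses as a URI, but its scheme component is empty.
    System.out.println(bare + " has scheme: " + hasScheme(bare));

    String uri = toFileUri(bare);
    // After conversion the string starts with "file:".
    System.out.println(uri + " has scheme: " + hasScheme(uri));
  }
}
```

If this assumption holds, passing the converted string instead of the bare path would be the first thing to try.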

On Fri, Sep 15, 2023, 5:05 PM David Li @.***> wrote:

@.**** commented on this pull request.

In java/source/jdbc.rst https://github.com/apache/arrow-cookbook/pull/316#discussion_r1327786069 :

+
+  @Override
+  protected Schema readSchema() throws IOException {
+    return null;
+  }
+  @Override
+  public VectorSchemaRoot getVectorSchemaRoot() throws IOException {
+    if (root == null) {
+      root = iter.next();
+    }
+    return root;
+  }
+}
+((Logger) LoggerFactory.getLogger("org.apache.arrow")).setLevel(Level.TRACE);

Why are we fiddling with loggers and adding logback to the example? I don't think we need any of that?

In java/source/jdbc.rst https://github.com/apache/arrow-cookbook/pull/316#discussion_r1327786570 :

+import org.apache.arrow.dataset.scanner.ScanOptions;
+import org.apache.arrow.dataset.scanner.Scanner;
+import org.apache.arrow.dataset.source.Dataset;
+import org.apache.arrow.dataset.source.DatasetFactory;
+import org.apache.arrow.memory.BufferAllocator;
+import org.apache.arrow.memory.RootAllocator;
+import org.apache.arrow.vector.VectorSchemaRoot;
+import org.apache.arrow.vector.ipc.ArrowReader;
+import org.apache.arrow.vector.types.pojo.Schema;
+import org.apache.ibatis.jdbc.ScriptRunner;
+import org.slf4j.LoggerFactory;
+import ch.qos.logback.classic.Level;
+import ch.qos.logback.classic.Logger;
+class JDBCReader extends ArrowReader {

Explain that we need this because writing a dataset takes an ArrowReader, so we have to adapt the JDBC ArrowVectorIterator to the ArrowReader interface
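The adapter idea in this comment can be illustrated without any Arrow dependency. The sketch below is a simplified model, not the cookbook's JDBCReader: BatchReader is a hypothetical stand-in for ArrowReader, a plain Iterator of string lists stands in for ArrowVectorIterator, and currentBatch plays the role of getVectorSchemaRoot. As in the quoted snippet, the first batch may be pulled lazily before any load call, so the first loadNextBatch must not skip it.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

/** Hypothetical stand-in for the pull-style reader a dataset writer consumes. */
abstract class BatchReader {
  /** Makes the next batch current; returns false when the source is exhausted. */
  abstract boolean loadNextBatch();

  /** Returns the current batch, pulling the first one lazily if needed. */
  abstract List<String> currentBatch();
}

/** Adapts a plain iterator of batches to the BatchReader contract. */
final class IteratorBatchReader extends BatchReader {
  private final Iterator<List<String>> iter;
  private List<String> current;
  private boolean primed; // true when currentBatch() already pulled an unconsumed batch

  IteratorBatchReader(Iterator<List<String>> iter) {
    this.iter = iter;
  }

  @Override
  boolean loadNextBatch() {
    if (primed) {
      primed = false; // the first batch was pulled early; expose it instead of skipping it
      return true;
    }
    if (!iter.hasNext()) {
      return false;
    }
    current = iter.next();
    return true;
  }

  @Override
  List<String> currentBatch() {
    if (current == null && iter.hasNext()) {
      current = iter.next(); // lazy first pull, e.g. so a consumer can inspect the shape
      primed = true;
    }
    return current;
  }
}

final class ReaderAdapterDemo {
  /** Drains a reader the way a writer would: load a batch, then read it. */
  static List<String> drain(BatchReader reader) {
    List<String> rows = new ArrayList<>();
    while (reader.loadNextBatch()) {
      rows.addAll(reader.currentBatch());
    }
    return rows;
  }

  public static void main(String[] args) {
    Iterator<List<String>> batches =
        Arrays.asList(Arrays.asList("a", "b"), Arrays.asList("c")).iterator();
    BatchReader reader = new IteratorBatchReader(batches);
    reader.currentBatch(); // consumer peeks before loading, as a schema lookup might
    System.out.println(drain(reader));
  }
}
```

The primed flag is the design point: without it, a consumer that inspects the current batch before the first load would lose the first batch.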

In java/source/jdbc.rst https://github.com/apache/arrow-cookbook/pull/316#discussion_r1327787518 :

+    final BufferAllocator allocatorParquetWrite = allocator.newChildAllocator("allocatorParquetWrite", 0,
+        Long.MAX_VALUE);
+    final Connection connection = DriverManager.getConnection(
+        "jdbc:h2:mem:h2-jdbc-adapter")
+) {
+  ScriptRunner runnerDDLDML = new ScriptRunner(connection);
+  runnerDDLDML.setLogWriter(null);
+  runnerDDLDML.runScript(new BufferedReader(
+      new FileReader("./thirdpartydeps/jdbc/h2-ddl.sql")));
+  runnerDDLDML.runScript(new BufferedReader(
+      new FileReader("./thirdpartydeps/jdbc/h2-dml.sql")));
+  JdbcToArrowConfig config = new JdbcToArrowConfigBuilder(allocatorJDBC,
+      JdbcToArrowUtils.getUtcCalendar())
+      .setTargetBatchSize(2)
+      .setReuseVectorSchemaRoot(true)
+      .setArraySubTypeByColumnNameMap(

In the interest of keeping examples concise, let's use sample data that doesn't require us to deal with all of this in the first place.

davisusanibar commented 11 months ago

Hi @pronzato, this project also uses the JDBC reader: https://github.com/davisusanibar/java-python-by-cdata.git

Could you please try using that and confirm if it is also failing?