delta-io / delta

An open-source storage framework that enables building a Lakehouse architecture with compute engines including Spark, PrestoDB, Flink, Trino, and Hive and APIs
https://delta.io
Apache License 2.0

[BUG] Can not instantiate `LogStore` class: io.delta.storage.HDFSLogStore #3299

Closed abhishekrb19 closed 5 months ago

abhishekrb19 commented 5 months ago

Bug

Which Delta project/connector is this regarding?

Describe the problem

Steps to reproduce

This is easy to reproduce with a simple query that ingests a Delta table through the Druid-Delta connector. After upgrading the Delta Kernel dependency from 3.1.0 to 3.2.0, we noticed that ingestion from existing Delta tables set up locally stopped working. The stack trace is:

java.lang.IllegalArgumentException: Can not instantiate `LogStore` class: io.delta.storage.HDFSLogStore
    at io.delta.kernel.defaults.internal.DefaultEngineErrors.canNotInstantiateLogStore(DefaultEngineErrors.java:26)
    at io.delta.kernel.defaults.internal.logstore.LogStoreProvider.getLogStore(LogStoreProvider.java:88)
    at io.delta.kernel.defaults.engine.DefaultFileSystemClient.listFrom(DefaultFileSystemClient.java:76)
    at io.delta.kernel.internal.snapshot.SnapshotManager.listFrom(SnapshotManager.java:228)
    at io.delta.kernel.internal.snapshot.SnapshotManager.listFromOrNone(SnapshotManager.java:253)
    at io.delta.kernel.internal.snapshot.SnapshotManager.listDeltaAndCheckpointFiles(SnapshotManager.java:294)
    at io.delta.kernel.internal.snapshot.SnapshotManager.getLogSegmentForVersion(SnapshotManager.java:466)
    at io.delta.kernel.internal.snapshot.SnapshotManager.getLogSegmentFrom(SnapshotManager.java:415)
    at io.delta.kernel.internal.snapshot.SnapshotManager.getSnapshotAtInit(SnapshotManager.java:352)
    at io.delta.kernel.internal.snapshot.SnapshotManager.buildLatestSnapshot(SnapshotManager.java:113)
    at io.delta.kernel.internal.TableImpl.getLatestSnapshot(TableImpl.java:55)
    at org.apache.druid.delta.input.DeltaInputSource.createSplits(DeltaInputSource.java:210)
    at org.apache.druid.msq.input.external.ExternalInputSpecSlicer.sliceSplittableInputSource(ExternalInputSpecSlicer.java:119)

The underlying issue seems to be that the Kernel is unable to instantiate LogStore; the failure comes from a new code path added in https://github.com/delta-io/delta/pull/2770/files.

After adding the following dependency to the Druid-Delta connector, I still saw the same error:

    <dependency>
      <groupId>io.delta</groupId>
      <artifactId>delta-storage</artifactId>
      <version>${delta-kernel.version}</version>
    </dependency>

Thanks to @vkorukanti, who swiftly provided a workaround: add the following line before making calls to the Kernel APIs: Thread.currentThread().setContextClassLoader(LogStore.class.getClassLoader());

It's unclear whether this workaround is needed for all Kernel APIs or only when getting the snapshot information, i.e., table.getLatestSnapshot(engine).
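For anyone applying the workaround above, here is a minimal sketch of one way to scope it, so that the thread's original context class loader is restored after the Kernel call. The class and method names (ContextClassLoaderScope, callWithLoaderOf) are hypothetical helpers, not part of Delta Kernel or Druid.

```java
import java.util.function.Supplier;

// Hypothetical helper (not part of Delta Kernel): temporarily swaps the
// thread context class loader and restores the previous one afterwards.
final class ContextClassLoaderScope {
    private ContextClassLoaderScope() {}

    /**
     * Runs the action with the context class loader set to the loader that
     * defined the anchor class, then restores the previous loader even if
     * the action throws.
     */
    static <T> T callWithLoaderOf(Class<?> anchor, Supplier<T> action) {
        Thread thread = Thread.currentThread();
        ClassLoader previous = thread.getContextClassLoader();
        thread.setContextClassLoader(anchor.getClassLoader());
        try {
            return action.get();
        } finally {
            thread.setContextClassLoader(previous);
        }
    }
}
```

In the connector, the snapshot lookup could then be wrapped as ContextClassLoaderScope.callWithLoaderOf(LogStore.class, () -> table.getLatestSnapshot(engine)). If the wrapped call declares a checked exception, java.util.concurrent.Callable could be used in place of Supplier.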

Observed results

Ingestion fails.

Further details

The slack thread has some more information: https://delta-users.slack.com/archives/CJ70UCSHM/p1719190251166299

Environment information

Willingness to contribute

The Delta Lake Community encourages bug fix contributions. Would you or another member of your organization be willing to contribute a fix for this bug to the Delta Lake code base?

vkorukanti commented 5 months ago

Hi @abhishekrb19, I think we can just use Class.forName(className) here instead of the Class.forName variant that takes the class loader from the current thread.

Let me know if you want to make a PR to remove those.
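To illustrate the difference between the two lookups, here is a small standalone sketch. It does not use Delta Kernel code; LogStoreLookupDemo and its methods are hypothetical, and the isolated loader merely stands in for a thread context class loader that cannot see the connector's jars. The single-argument Class.forName resolves the name through the loader that defined the calling class, while the three-argument form uses whichever loader it is given.

```java
import java.net.URL;
import java.net.URLClassLoader;

// Hypothetical demo class; none of these names come from Delta Kernel.
final class LogStoreLookupDemo {
    private LogStoreLookupDemo() {}

    /** True if className can be loaded through the explicitly supplied loader. */
    static boolean loadableVia(String className, ClassLoader loader) {
        try {
            Class.forName(className, true, loader);
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        }
    }

    /** True if className resolves through the loader that defined this class. */
    static boolean loadableViaCaller(String className) {
        try {
            Class.forName(className);
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        }
    }

    /** A loader with no URLs and a bootstrap parent: it sees only core JDK classes. */
    static ClassLoader isolatedLoader() {
        return new URLClassLoader(new URL[0], null);
    }
}
```

With this sketch, loadableViaCaller(LogStoreLookupDemo.class.getName()) succeeds, while loadableVia with the isolated loader fails for the same name. That pattern is consistent with the error reported in this issue, assuming the context class loader in the ingestion task cannot see the delta-storage classes.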