Alluxio / alluxio

Alluxio, data orchestration for analytics and machine learning in the cloud
https://www.alluxio.io
Apache License 2.0

DistributedLoadCommand Java API has a gRPC channel type compatibility issue #16860

Open coff33Overflow opened 1 year ago

coff33Overflow commented 1 year ago

Alluxio Version: 2.8.1

Describe the bug

The DistributedLoadCommand Java API throws a bad operand type error because of incompatible gRPC channel types (screenshot: Capture1).

Exception in thread "main" java.lang.VerifyError: Bad type on operand stack
Exception Details:
  Location:
    alluxio/client/job/RetryHandlingJobMasterClient.afterConnect()V @5: invokestatic
  Reason:
    Type 'alluxio/grpc/GrpcChannel' (current frame, stack[1]) is not assignable to 'io/grpc/Channel'
  Current Frame:
    bci: @5
    flags: { }
    locals: { 'alluxio/client/job/RetryHandlingJobMasterClient' }
    stack: { 'alluxio/client/job/RetryHandlingJobMasterClient', 'alluxio/grpc/GrpcChannel' }
  Bytecode:
    0x0000000: 2a2a b400 06b8 0007 b500 02b1 
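The VerifyError means that, in the classes the JVM actually loaded, alluxio.grpc.GrpcChannel is not a subtype of io.grpc.Channel, which is what the bytecode of RetryHandlingJobMasterClient.afterConnect() expects. One plausible cause (an assumption, not confirmed in this thread) is mixing Alluxio artifacts with different gRPC packaging on one classpath, for example a shaded client jar in which gRPC classes may be relocated, alongside unshaded Alluxio modules. The hypothetical ClasspathCheck class below is a minimal diagnostic sketch, not part of Alluxio: run it with the same classpath as the failing program and it reports whether the two types are assignable and prints GrpcChannel's superclass chain together with the jar each class came from.

```java
import java.security.CodeSource;

/**
 * Hypothetical diagnostic (not part of Alluxio): checks whether one class is
 * assignable to another on the current classpath and shows where classes come from.
 */
public class ClasspathCheck {

    /** Returns the jar/directory a class was loaded from, or a marker for JDK classes. */
    static String locationOf(Class<?> c) {
        CodeSource src = c.getProtectionDomain().getCodeSource();
        if (src == null || src.getLocation() == null) {
            return "<bootstrap/JDK>";
        }
        return src.getLocation().toString();
    }

    /** Loads a class without running static initializers; returns null if it cannot be loaded. */
    static Class<?> tryLoad(String name) {
        try {
            return Class.forName(name, false, ClasspathCheck.class.getClassLoader());
        } catch (ClassNotFoundException | LinkageError e) {
            return null;
        }
    }

    /** Returns "ASSIGNABLE", "NOT_ASSIGNABLE", or "MISSING:<name>" if a class cannot be loaded. */
    static String check(String fromName, String toName) {
        Class<?> from = tryLoad(fromName);
        if (from == null) {
            return "MISSING:" + fromName;
        }
        Class<?> to = tryLoad(toName);
        if (to == null) {
            return "MISSING:" + toName;
        }
        return to.isAssignableFrom(from) ? "ASSIGNABLE" : "NOT_ASSIGNABLE";
    }

    public static void main(String[] args) {
        // Class names taken from the VerifyError message above.
        String from = "alluxio.grpc.GrpcChannel";
        String to = "io.grpc.Channel";
        System.out.println(from + " -> " + to + ": " + check(from, to));

        // Walk the superclass chain; a relocated (shaded) Channel superclass
        // would show up here under a different package name.
        Class<?> c = tryLoad(from);
        for (; c != null; c = c.getSuperclass()) {
            System.out.println("  " + c.getName() + "  from " + locationOf(c));
        }
    }
}
```

If the chain shows a Channel class under a package other than io.grpc, or GrpcChannel and the job client come from different Alluxio jars or versions, aligning the Alluxio dependencies in pom.xml is a reasonable next step to try.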

To Reproduce

Integrate HDFS with Alluxio in a local cluster.

I'm sharing the alluxio-site.properties file and the Java code, which tries to load the mounted UFS (HDFS) data into Alluxio memory using DistributedLoadCommand. It throws an error because of the gRPC channel compatibility issue. Attachments: FSOperations.zip, alluxio-site.properties

Expected behavior

The mounted HDFS data should be loaded into Alluxio memory through this Java API, since we are using the global configuration, which is picked up from alluxio-site.properties.
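Because the report relies on the global configuration being read from alluxio-site.properties, it may help to confirm which copy of that file the Java client actually sees when it targets a remote cluster. The hypothetical SitePropertiesCheck class below is a minimal sketch, not part of Alluxio: it looks for the file only as a classpath resource (an assumption; Alluxio may also read it from other configured directories) and prints the alluxio.master.hostname value the file sets.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.util.Optional;
import java.util.Properties;

/**
 * Hypothetical helper (not part of Alluxio): finds alluxio-site.properties on the
 * classpath and reports the master hostname it configures. Only the classpath
 * location is checked here.
 */
public class SitePropertiesCheck {

    static final String FILE = "alluxio-site.properties";
    static final String MASTER_HOST_KEY = "alluxio.master.hostname";

    /** Parses properties text and returns the master hostname, if set and non-blank. */
    static Optional<String> masterHostname(Reader reader) {
        Properties props = new Properties();
        try {
            props.load(reader);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        String host = props.getProperty(MASTER_HOST_KEY);
        if (host == null || host.trim().isEmpty()) {
            return Optional.empty();
        }
        return Optional.of(host.trim());
    }

    public static void main(String[] args) throws IOException {
        URL url = SitePropertiesCheck.class.getClassLoader().getResource(FILE);
        if (url == null) {
            System.out.println(FILE + " not found on the classpath");
            return;
        }
        System.out.println("Found " + FILE + " at " + url);
        try (InputStream in = url.openStream();
             Reader reader = new InputStreamReader(in, StandardCharsets.UTF_8)) {
            System.out.println(MASTER_HOST_KEY + " = "
                    + masterHostname(reader).orElse("<not set>"));
        }
    }
}
```

If the printed path or hostname is not the one expected for the remote cluster, the client is likely reading a different configuration than the CLI does.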

The CLI command distributedLoad works fine, whereas the Java API does not. Sharing the screenshot for your reference (Capture).

Urgency

This bug is a major blocker for loading large amounts of data from any external UFS into Alluxio. It can be done through the CLI, but we are building something that needs to connect to a remote Alluxio cluster. Instead of SSHing into the remote Alluxio master node and running the CLI command, we found the Alluxio Java API a friendlier way to operate on remote Alluxio file systems.

Are you planning to fix it

Please indicate if you are already working on a PR.

Additional context

We also tried integrating Alluxio with Spark, which can load the data from the external UFS into Alluxio, but Spark has the limitation of not preserving the original file names when dealing with Parquet files.

For more information, see this Slack thread: https://alluxio-community.slack.com/archives/C03RDNW962C/p1675157956837039

LuQQiu commented 1 year ago

@coff33Overflow What are the dependencies of your Java API?

coff33Overflow commented 1 year ago

@LuQQiu

I'm not sure what you mean. Could you please check the pom.xml of the project I shared in the description? It lists all the dependencies the project requires.

coff33Overflow commented 1 year ago

@LuQQiu What more details do you need?

github-actions[bot] commented 1 year ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed in two weeks if no further activity occurs. Thank you for your contributions.