apache / pinot

Apache Pinot - A realtime distributed OLAP datastore
https://pinot.apache.org/
Apache License 2.0

Support for S3A Connector #14312

Open chrajeshbabu opened 3 weeks ago

chrajeshbabu commented 3 weeks ago

Currently the controller and servers are able to start with an s3a path, but segment creation during ingestion fails with the error below. The reason is that, while preparing file names, we prefix them with the s3 scheme instead of s3a, so the input file URI no longer matches the base input path.

Fixing this will make it possible to use S3-compatible storage as a deep store.

Working on it.

Caused by: java.lang.IllegalStateException: Unable to extract out the relative path for input file 's3://testhadoop/pinot/batch/airlineStats/rawdata/2014/01/28/airlineStats_data_2014-01-28.avro', based on base input path: s3a://testhadoop/pinot/batch/airlineStats/rawdata/
    at org.apache.pinot.shaded.com.google.common.base.Preconditions.checkState(Preconditions.java:515) ~[pinot-all-1.2.0-jar-with-dependencies.jar:1.2.0-cc33ac502a02e2fe830fe21e556234ee99351a7a]
    at org.apache.pinot.common.segment.generation.SegmentGenerationUtils.getRelativeOutputPath(SegmentGenerationUtils.java:162) ~[pinot-all-1.2.0-jar-with-dependencies.jar:1.2.0-cc33ac502a02e2fe830fe21e556234ee99351a7a]
    at org.apache.pinot.plugin.ingestion.batch.standalone.SegmentGenerationJobRunner.lambda$submitSegmentGenTask$1(SegmentGenerationJobRunner.java:278) ~[pinot-batch-ingestion-standalone-1.2.0-shaded.jar:1.2.0-cc33ac502a02e2fe830fe21e556234ee99351a7a]
    at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:539) ~[?:?]
    at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264) ~[?:?]
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136) ~[?:?]
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635) ~[?:?]
    at java.base/java.lang.Thread.run(Thread.java:840) ~[?:?]

java.lang.RuntimeException: Caught exception during running - org.apache.pinot.plugin.ingestion.batch.standalone.SegmentGenerationJobRunner
    at org.apache.pinot.spi.ingestion.batch.IngestionJobLauncher.kickoffIngestionJob(IngestionJobLauncher.java:152)
    at org.apache.pinot.spi.ingestion.batch.IngestionJobLauncher.runIngestionJob(IngestionJobLauncher.java:125)
    at org.apache.pinot.tools.admin.command.LaunchDataIngestionJobCommand.execute(LaunchDataIngestionJobCommand.java:132)
    at org.apache.pinot.tools.Command.call(Command.java:33)
    at org.apache.pinot.tools.Command.call(Command.java:29)
    at picocli.CommandLine.executeUserObject(CommandLine.java:2045)
    at picocli.CommandLine.access$1500(CommandLine.java:148)
    at picocli.CommandLine$RunLast.executeUserObjectOfLastSubcommandWithSameParent(CommandLine.java:2465)
    at picocli.CommandLine$RunLast.handle(CommandLine.java:2457)
    at picocli.CommandLine$RunLast.handle(CommandLine.java:2419)
    at picocli.CommandLine$AbstractParseResultHandler.execute(CommandLine.java:2277)
    at picocli.CommandLine$RunLast.execute(CommandLine.java:2421)
    at picocli.CommandLine.execute(CommandLine.java:2174)
    at org.apache.pinot.tools.admin.PinotAdministrator.execute(PinotAdministrator.java:173)
    at org.apache.pinot.tools.admin.PinotAdministrator.main(PinotAdministrator.java:204)
Caused by: java.lang.RuntimeException: Failed to generate Pinot segment for file - s3://testhadoop/pinot/batch/airlineStats/rawdata/2014/01/28/airlineStats_data_2014-01-28.avro
    at org.apache.pinot.plugin.ingestion.batch.standalone.SegmentGenerationJobRunner.lambda$submitSegmentGenTask$1(SegmentGenerationJobRunner.java:287)
    at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:539)
    at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
    at java.base/java.lang.Thread.run(Thread.java:840)
Caused by: java.lang.IllegalStateException: Unable to extract out the relative path for input file 's3://testhadoop/pinot/batch/airlineStats/rawdata/2014/01/28/airlineStats_data_2014-01-28.avro', based on base input path: s3a://testhadoop/pinot/batch/airlineStats/rawdata/
    at org.apache.pinot.shaded.com.google.common.base.Preconditions.checkState(Preconditions.java:515)
    at org.apache.pinot.common.segment.generation.SegmentGenerationUtils.getRelativeOutputPath(SegmentGenerationUtils.java:162)
    at org.apache.pinot.plugin.ingestion.batch.standalone.SegmentGenerationJobRunner.lambda$submitSegmentGenTask$1(SegmentGenerationJobRunner.java:278)
    ... 5 more
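
The scheme mismatch behind this error can be reproduced with plain `java.net.URI`, independent of Pinot. The sketch below is only an illustration and not Pinot's actual code: it assumes the relative path is derived with something like `URI.relativize`, and the class `SchemeMismatchSketch` and helper `withSchemeOf` are hypothetical names. When the schemes differ (`s3` vs `s3a`), `relativize` returns the input URI unchanged, so no relative path can be extracted. Rebuilding the listed file URI with the scheme of the base input path is one possible way to make the two line up.

```java
import java.net.URI;
import java.net.URISyntaxException;

public class SchemeMismatchSketch {

  // Hypothetical helper (not part of Pinot): rebuild the listed file URI
  // using the scheme of the base input URI, keeping everything else as-is.
  static URI withSchemeOf(URI base, URI file) {
    try {
      return new URI(base.getScheme(), file.getAuthority(), file.getPath(),
          file.getQuery(), file.getFragment());
    } catch (URISyntaxException e) {
      throw new IllegalArgumentException("Cannot rewrite scheme of " + file, e);
    }
  }

  public static void main(String[] args) {
    // Values taken from the error message in this issue.
    URI base = URI.create("s3a://testhadoop/pinot/batch/airlineStats/rawdata/");
    URI listed = URI.create(
        "s3://testhadoop/pinot/batch/airlineStats/rawdata/2014/01/28/airlineStats_data_2014-01-28.avro");

    // Schemes differ, so relativize() hands back the input URI untouched.
    URI relative = base.relativize(listed);
    System.out.println("unchanged by relativize: " + relative.equals(listed));

    // With matching schemes, the relative path can be extracted.
    URI fixed = withSchemeOf(base, listed);
    System.out.println("relative path: " + base.relativize(fixed));
  }
}
```

Whether the actual fix should rewrite the scheme like this, or instead keep the original scheme when the file names are prepared, is a design choice for the patch; the sketch only shows why the current check fails.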

alguiguilo098 commented 3 weeks ago

@chrajeshbabu Hello! I’m currently studying Computer Science, and I’m very interested in contributing to open-source projects. If there are any tasks I could get started on, please let me know. Thank you.

chrajeshbabu commented 3 weeks ago

Hi @alguiguilo098, @Jackie-Jiang and @mayankshriv are the right people to guide you and help you contribute meaningful work to this community. Thanks.

alguiguilo098 commented 3 weeks ago

@chrajeshbabu Thanks

alguiguilo098 commented 3 weeks ago

@Jackie-Jiang @mayankshriv Could you please help me get started contributing to this community?