apache / beam

Apache Beam is a unified programming model for Batch and Streaming data processing.
https://beam.apache.org/
Apache License 2.0
7.9k stars 4.27k forks source link

Kinesis x-lang support depends on deprecated Kinesis IO (Aws Sdk v1) #23570

Open mosche opened 2 years ago

mosche commented 2 years ago

What would you like to happen?

X-lang support for Kinesis should be migrated to use the v2 module.

Respective urn's are:

The v1 modules (kinesis, aws) will be eventually removed.

Issue Priority

Priority: 2

Issue Component

Component: io-java-kinesis

xinbinhuang commented 2 years ago

.take-issue

xinbinhuang commented 2 years ago

@mosche .. seems like I can't take the issue. Is there an alternative way to assign the issue to me?

mosche commented 2 years ago

👍 Looks like it worked

mosche commented 1 year ago

@xinbinhuang Kind ping, are you still working on this?

aromanenko-dev commented 1 year ago

Ping @chamikaramj Do you know if someone is working on this?

chamikaramj commented 1 year ago

I'm not aware of any. Are Java the APIs of v1 and v2 similar ? If so this should be a relatively straightforward update.

cc: @johnjcasey

mosche commented 1 year ago
ReadDataFromKinesis KinesisIO.Read (v1) KinesisIO.Read (v2) Comments
stream_name withStreamName withStreamName
aws_access_key /
aws_secret_key
withAWSClientsProvider withClientConfiguration Use of static credentials is a bad practice, use the default credentials provider chain instead
region withAWSClientsProvider withClientConfiguration
service_endpoint withAWSClientsProvider withClientConfiguration
verify_certificate withAWSClientsProvider can't be disabled
initial_position_in_stream withInitialPositionInStream withInitialPositionInStream
initial_timestamp_in_stream withInitialTimestampInStream withInitialTimestampInStream
max_capacity_per_shard withMaxCapacityPerShard withMaxCapacityPerShard
max_num_records withMaxNumRecords withMaxNumRecords
max_read_time withMaxReadTime withMaxReadTime
rate_limit withFixedDelayRateLimitPolicy withFixedDelayRateLimitPolicy
request_records_limit withRequestRecordsLimit withRequestRecordsLimit
up_to_date_threshold withUpToDateThreshold withUpToDateThreshold
watermark_idle_duration_threshold withArrivalTimeWatermarkPolicy withArrivalTimeWatermarkPolicy
watermark_policy withArrivalTimeWatermarkPolicy / withProcessingTimeWatermarkPolicy withArrivalTimeWatermarkPolicy / withProcessingTimeWatermarkPolicy
mosche commented 1 year ago

The x-lang writer, unfortunately, is rather broken from a design perspective and will be barely usable for anything beyond testing. It cannot use more than a single Kinesis shard.

WriteToKinesis KinesisIO.Write (v1) KinesisIO.Write (v2) Comments
stream_name withStreamName withStreamName
aws_access_key / aws_secret_key withAWSClientsProvider withClientConfiguration Use of static credentials is a bad practice, use the default credentials provider chain instead
region withAWSClientsProvider withClientConfiguration
partition_key withPartitionKey Configure a random partitioner, e.g. KinesisPartitioner
.explicitRandomPartitioner(shards)
Configuring a single hardcoded partition key is totally broken. That way records will only ever be published to a single shard, no matter how many shards exist.
service_endpoint withAWSClientsProvider withClientConfiguration
verify_certificate withAWSClientsProvider can't be disabled
producer_properties withProducerProperties not using KPL, related config options are withBatchMaxRecords, withBatchMaxBytes, withConcurrentRequests, withRecordAggregationDisabled
johnjcasey commented 1 year ago

I'm not aware of anyone either

xinbinhuang commented 1 year ago

hey, sorry. I totally forgot this while working on other things. I'll work on it this weekend, and give an update!

aromanenko-dev commented 1 year ago

@xinbinhuang Thanks! Please, let us know if you need any help