bluehalo / nifi-nar-bundles

Apache License 2.0
ListS3Batch Attempt #2 #98

Closed Dreichard38 closed 2 years ago

Dreichard38 commented 2 years ago

I recently switched teams and found that we want to use this processor there as well, so I thought I'd give porting it to the shared NARs another try.

This processor lists an S3 bucket using a configurable batch size, starting prefix, starting key, and end key. Each execution of the processor fetches only one batch of keys, which lets you stop and restart the listing whenever you want. The values used to track progress are stored in the processor state itself, so you can keep tabs on what the processor is doing. This is the main advantage over the built-in ListS3 processor, which keeps listing on one thread until there's nothing left to list in the bucket.
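To make the one-batch-per-execution idea concrete, here is a minimal sketch of how it could work. This is not the processor's code: it uses an in-memory sorted set as a stand-in for the bucket and a plain `Map` as a stand-in for NiFi processor state. The class name `BatchListSketch`, the state entry `last.key`, and the choice of an exclusive start key and inclusive end key are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.Map;
import java.util.NavigableSet;
import java.util.TreeSet;

// Hypothetical sketch: an in-memory stand-in for batched S3 listing.
// The real processor talks to S3; this only models the resumable-batch logic.
class BatchListSketch {
    // Assumed name of the state entry holding the last key listed so far.
    static final String LAST_KEY = "last.key";

    private final NavigableSet<String> bucket; // sorted keys, standing in for the bucket
    private final String prefix;               // only keys with this prefix are listed
    private final String startKey;             // assumed exclusive; may be null
    private final String endKey;               // assumed inclusive; may be null
    private final int batchSize;

    BatchListSketch(Collection<String> keys, String prefix,
                    String startKey, String endKey, int batchSize) {
        this.bucket = new TreeSet<>(keys);
        this.prefix = prefix;
        this.startKey = startKey;
        this.endKey = endKey;
        this.batchSize = batchSize;
    }

    // One "execution": list at most batchSize keys after the saved position,
    // then record the new position in state so the next call resumes there.
    List<String> listNextBatch(Map<String, String> state) {
        String marker = state.containsKey(LAST_KEY) ? state.get(LAST_KEY) : startKey;
        Iterable<String> candidates =
                (marker == null) ? bucket : bucket.tailSet(marker, false);

        List<String> batch = new ArrayList<>();
        for (String key : candidates) {
            if (batch.size() == batchSize) {
                break; // batch is full; the rest waits for the next execution
            }
            if (endKey != null && key.compareTo(endKey) > 0) {
                break; // past the configured end key
            }
            if (key.startsWith(prefix)) {
                batch.add(key);
            }
        }
        if (!batch.isEmpty()) {
            state.put(LAST_KEY, batch.get(batch.size() - 1));
        }
        return batch;
    }
}
```

Because the only thing carried between calls is the last key in the state map, the listing can be stopped after any batch and picked up later from the same position, and inspecting the state shows how far it has progressed.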

Here's a list of changes I made to the processor since the last PR, to hopefully make it work as requested 2 years ago:

  1. Changed the processor to use the AWS SDK V1 instead of V2, to fit better with the existing NiFi logic
  2. Added support for using an AWS credentials service to manage AWS permissions
  3. Added support for an SSL context
  4. Updated Mockito, since the version in use was 3 major versions behind and we needed some of the newer features in the unit tests