I recently switched teams, and found that we wanted to be able to use this processor there as well, so I thought I'd give porting this to the shared NARs another try.
This processor lists an S3 bucket using a configurable batch size, starting prefix, starting key, and end key. Each execution of the processor fetches only one batch of keys, which allows you to stop and restart the listing whenever you want. The values used to maintain this position are stored in the processor state itself, so you can keep tabs on what the processor is doing. This is the main advantage over the built-in ListS3 processor, which will keep listing on one thread until there's nothing else to list in the bucket.
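To illustrate the one-batch-per-execution idea, here is a minimal sketch in plain Java. It is not the processor's code: the class name, field names, and the exact semantics of the starting key (exclusive) and end key (inclusive) are assumptions made for the example, and an in-memory sorted set stands in for the bucket so that no AWS SDK calls are needed.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

// Hypothetical sketch: each call to listNextBatch() returns at most one batch
// and remembers where it stopped, so listing can be paused and resumed.
class BatchedListingSketch {
    // Stand-in for the bucket. S3 lists keys in sorted order; a TreeSet of
    // Strings approximates that ordering for simple ASCII keys.
    private final TreeSet<String> bucket;
    private final String prefix;   // only keys starting with this are listed
    private final String endKey;   // assumed inclusive upper bound; null = none
    private final int batchSize;
    private String lastKey;        // the "state": last key emitted so far

    BatchedListingSketch(List<String> keys, String prefix,
                         String startAfter, String endKey, int batchSize) {
        this.bucket = new TreeSet<>(keys);
        this.prefix = prefix;
        this.endKey = endKey;
        this.batchSize = batchSize;
        this.lastKey = startAfter; // assumed exclusive starting key; null = start
    }

    // One "execution": fetch a single batch and update the stored position.
    List<String> listNextBatch() {
        List<String> batch = new ArrayList<>();
        Iterable<String> candidates =
                (lastKey == null) ? bucket : bucket.tailSet(lastKey, false);
        for (String key : candidates) {
            if (batch.size() == batchSize) {
                break;             // batch is full; the rest waits for the next run
            }
            if (endKey != null && key.compareTo(endKey) > 0) {
                break;             // past the configured end key
            }
            if (!key.startsWith(prefix)) {
                continue;          // outside the configured prefix
            }
            batch.add(key);
        }
        if (!batch.isEmpty()) {
            lastKey = batch.get(batch.size() - 1);
        }
        return batch;
    }

    // Exposes the stored position, analogous to inspecting processor state.
    String getLastKey() {
        return lastKey;
    }
}
```

Because the only thing carried between calls is `lastKey`, stopping after any batch and resuming later continues from the same position, which is the behavior described above.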
Here's a list of changes I made to the processor since the last PR, to hopefully make it work as requested 2 years ago:
Changed the processor to use the AWS SDK V1 instead of V2, to fit better with the existing NiFi logic
Added support for using an AWS credentials service to manage AWS permissions
Added support for an SSL context
Updated Mockito, since the version being used was 3 major versions behind, and we needed some of the newer features in the unit tests