Currently, BadgerDoc executes pipelines with the files_data parameter under the assumption that the pipeline engine can access the S3 files without any extra configuration (that is, the engine stores credentials on its own side). However, we need to enable pipeline engines to download S3 files without storing authentication credentials on their side. Therefore, this task involves adding the capability to generate S3 signed URLs and pass them as arguments to the pipeline manager.
The s3_signed_url needs to be filled with the generated signed URL if BadgerDoc is configured with the parameter JOBS_RUN_PIPELINES_WITH_SIGNED_URL=True. This value can only be set to True if S3_PROVIDER is configured as aws_iam. By default, JOBS_RUN_PIPELINES_WITH_SIGNED_URL is set to False.
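The rule above could be sketched roughly as follows. This is only an illustration under assumptions, not the actual jobs implementation: the helper names signed_url_enabled, make_signed_url and fill_s3_signed_url, the 3600-second expiry, and the choice to raise ValueError on a mismatched configuration are all hypothetical. The only library call relied on is the S3 client's generate_presigned_url from boto3; the client is passed in so the sketch does not assume how BadgerDoc builds it.

```python
import os
from typing import Any, Mapping, Optional


def signed_url_enabled(env: Mapping[str, str] = os.environ) -> bool:
    """Return True when pipelines should receive S3 signed URLs.

    Hypothetical helper: an unset or empty value counts as the default (False).
    """
    raw = env.get("JOBS_RUN_PIPELINES_WITH_SIGNED_URL", "False")
    enabled = raw.strip().lower() == "true"
    # The flag may only be True when S3_PROVIDER is aws_iam.
    if enabled and env.get("S3_PROVIDER") != "aws_iam":
        raise ValueError(
            "JOBS_RUN_PIPELINES_WITH_SIGNED_URL=True requires S3_PROVIDER=aws_iam"
        )
    return enabled


def make_signed_url(
    s3_client: Any, bucket: str, key: str, expires_in: int = 3600
) -> str:
    """Presign a GET for one object; expires_in is an assumed default."""
    return s3_client.generate_presigned_url(
        "get_object",
        Params={"Bucket": bucket, "Key": key},
        ExpiresIn=expires_in,
    )


def fill_s3_signed_url(
    s3_client: Any, bucket: str, key: str, env: Mapping[str, str] = os.environ
) -> Optional[str]:
    """Return the value for s3_signed_url, or None when the feature is off."""
    if not signed_url_enabled(env):
        return None
    return make_signed_url(s3_client, bucket, key)
```

With a real client this might be called as fill_s3_signed_url(boto3.client("s3"), bucket, key); whether a bad configuration should fail at startup or be silently ignored is a design decision left open here.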
What needs to be changed
Add JOBS_RUN_PIPELINES_WITH_SIGNED_URL= to .env.example
What needs to be additionally checked
Given that we're expecting a huge (almost unlimited) number of documents to be passed as S3 signed URLs, integrating the aioboto3 library into the jobs microservice could be considered to speed up the process of generating signed URLs.
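One possible shape for such an integration is sketched below, as an assumption rather than a proposal for the final design. The function name presign_many, the max_concurrency limit and the expiry default are hypothetical. The client is expected to expose an awaitable generate_presigned_url, as an aioboto3 S3 client does; it is passed in so the sketch stays independent of how the jobs microservice creates its session.

```python
import asyncio
from typing import Any, Dict, Iterable


async def presign_many(
    s3_client: Any,
    bucket: str,
    keys: Iterable[str],
    expires_in: int = 3600,
    max_concurrency: int = 100,
) -> Dict[str, str]:
    """Generate signed GET URLs for many keys concurrently.

    Returns a mapping of object key to signed URL.
    """
    # Bound the number of in-flight presign calls (assumed limit).
    semaphore = asyncio.Semaphore(max_concurrency)

    async def presign(key: str) -> str:
        async with semaphore:
            return await s3_client.generate_presigned_url(
                "get_object",
                Params={"Bucket": bucket, "Key": key},
                ExpiresIn=expires_in,
            )

    key_list = list(keys)
    # gather preserves input order, so URLs line up with key_list.
    urls = await asyncio.gather(*(presign(key) for key in key_list))
    return dict(zip(key_list, urls))


# Possible usage with aioboto3 (not executed here):
#
#   session = aioboto3.Session()
#   async with session.client("s3") as s3:
#       urls = await presign_many(s3, "my-bucket", ["a.pdf", "b.pdf"])
```

Whether this is actually faster than the synchronous client for the expected document volumes would need to be measured as part of the check.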
The PR https://github.com/epam/badgerdoc/pull/839 adds functionality to execute pipelines with additional arguments using a dataclass, of which s3_signed_url is a field.