User Story:
As a data engineer, I want to set up an internal batch scoring MBD API endpoint, so that I can process large datasets efficiently for the data team and provide results in a downloadable CSV file.
Acceptance Criteria:
GIVEN the internal API endpoint,
WHEN the data team submits a list of addresses with their API key,
THEN the API should provide an estimated processing time and a job ID, allow status checks via a separate endpoint, and return an S3 bucket link to download the CSV file with the results when the job is completed.
Tech Details:
Implement the internal batch scoring API endpoint.
Ensure the endpoint can handle large datasets efficiently.
Restrict access to the endpoint to the data team only.
Authenticate requests with API keys issued by the scorer app.
Run the compute on a dedicated EC2 instance, triggered automatically by uploads to the input bucket.
Create an endpoint to submit a list of addresses, which replies with an estimated time and a job ID.
Implement a separate endpoint to check the status of the job using the job ID.
Once the job is completed, provide an S3 link to download the CSV file containing the addresses and their scores.
Hard-code conservative processing-time estimates derived from historical data.
Test the endpoint for performance and accuracy.
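The submit and status steps above could be sketched as follows. This is a minimal in-memory illustration, not the actual implementation: the names `JobStore`, `estimate_seconds`, the response field names, and the three timing constants are all hypothetical, and the constants are placeholders rather than values taken from historical data. A real version would persist jobs and return a link to the results CSV in S3.

```python
import math
import uuid
from dataclasses import dataclass
from typing import Dict, Iterable, List, Optional

# Hypothetical placeholders; real values would come from historical job timings.
SECONDS_PER_ADDRESS = 0.5
FIXED_OVERHEAD_SECONDS = 60
SAFETY_FACTOR = 1.5  # pads the estimate so it stays conservative


def estimate_seconds(n_addresses: int) -> int:
    """Return a padded, hard-coded time estimate for a batch of n addresses."""
    raw = FIXED_OVERHEAD_SECONDS + n_addresses * SECONDS_PER_ADDRESS
    return math.ceil(raw * SAFETY_FACTOR)


@dataclass
class Job:
    job_id: str
    n_addresses: int
    estimated_seconds: int
    status: str = "queued"
    result_url: Optional[str] = None


class JobStore:
    """In-memory stand-in for the submit and status endpoints."""

    def __init__(self, valid_api_keys: Iterable[str]) -> None:
        self._valid_keys = set(valid_api_keys)
        self._jobs: Dict[str, Job] = {}

    def _check_key(self, api_key: str) -> None:
        if api_key not in self._valid_keys:
            raise PermissionError("invalid API key")

    def submit(self, api_key: str, addresses: List[str]) -> dict:
        """Register a job and reply with its ID and estimated duration."""
        self._check_key(api_key)
        if not addresses:
            raise ValueError("address list is empty")
        job = Job(uuid.uuid4().hex, len(addresses), estimate_seconds(len(addresses)))
        self._jobs[job.job_id] = job
        return {"job_id": job.job_id, "estimated_seconds": job.estimated_seconds}

    def complete(self, job_id: str, result_url: str) -> None:
        """Called by the worker once the results CSV has been written."""
        job = self._jobs[job_id]
        job.status = "completed"
        job.result_url = result_url

    def status(self, api_key: str, job_id: str) -> dict:
        """Report job status; include the download link only when completed."""
        self._check_key(api_key)
        job = self._jobs.get(job_id)
        if job is None:
            raise KeyError(job_id)
        body = {"job_id": job.job_id, "status": job.status}
        if job.status == "completed":
            body["result_url"] = job.result_url
        return body
```

Keeping the estimate in a single pure function makes it easy to replace the hard-coded constants later without touching the endpoint logic.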
Open Questions:
[ ] Are there any specific performance metrics to consider?
[ ] What specific format should the CSV file follow?
Notes/Assumptions:
Assume the API infrastructure is already in place.