Background:
Since the SNAP dataset is approximately 1 GB, we will need to be mindful of the time it takes to load and process the dataset when importing it into the database.
Problem:
We need a way to track ETL performance so that we can identify inefficiencies early and avoid spending excess CPU cycles or human time.
Success Criteria:
Note: Design and engineering input is needed on the log output format.
The ETL contains methods for measuring the time spent in each core stage of data processing (read, transform, and insert); additional stages can be defined as needed after team discussion.
Measurements are summarized at the end of ETL processing and written to a log file.
Each stage's measurements are summarized as the cumulative time spent in that stage and the average time per item in that stage.
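A minimal sketch of how these criteria could be met, assuming the ETL is written in Python (this ticket does not say). `StageTimer`, `measure`, `log_summary`, and the `etl_timing.log` file name are hypothetical, and the key=value log line is only a placeholder until the log output format is agreed.

```python
import logging
import time
from collections import defaultdict
from contextlib import contextmanager


class StageTimer:
    """Hypothetical helper: accumulates wall-clock time and item counts per ETL stage."""

    def __init__(self):
        self.total_seconds = defaultdict(float)  # cumulative time per stage
        self.item_counts = defaultdict(int)      # items processed per stage

    @contextmanager
    def measure(self, stage, items=1):
        # perf_counter is monotonic and intended for measuring elapsed intervals
        start = time.perf_counter()
        try:
            yield
        finally:
            # Record the time even if the stage raised, so failures still show up
            self.total_seconds[stage] += time.perf_counter() - start
            self.item_counts[stage] += items

    def summary(self):
        """Return {stage: (cumulative_seconds, average_seconds_per_item)}."""
        result = {}
        for stage, total in self.total_seconds.items():
            count = self.item_counts[stage]
            result[stage] = (total, total / count if count else 0.0)
        return result

    def log_summary(self, logger):
        # Placeholder format; the real log output format is still to be decided
        for stage, (total, avg) in self.summary().items():
            logger.info(
                "stage=%s cumulative_s=%.3f avg_per_item_s=%.6f items=%d",
                stage, total, avg, self.item_counts[stage],
            )


if __name__ == "__main__":
    # Hypothetical log file name
    logging.basicConfig(filename="etl_timing.log", level=logging.INFO)
    timer = StageTimer()

    # A batch read is timed once but counted as many items
    with timer.measure("read", items=1000):
        rows = [{"id": i} for i in range(1000)]

    # Per-row transform: each call counts as one item
    for row in rows:
        with timer.measure("transform"):
            row["id_squared"] = row["id"] ** 2

    with timer.measure("insert", items=len(rows)):
        pass  # stand-in for a batch database insert

    timer.log_summary(logging.getLogger("etl"))
```

Wrapping each stage in a context manager keeps the timing code out of the transformation logic itself, and passing `items` explicitly lets batch stages (such as a bulk insert) report a meaningful per-item average.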