Background:
Before we can run queries and generate summaries, we need a way to import data from the SNAP co-purchasing dataset into our chosen database (#2). To meet the MS02 requirements, a Minimum Viable Product would be an end-to-end extract, transform, and load (ETL) implementation for some portion of the dataset.
Problem:
We need a consistent means of setting up the database for downstream data processing.
Success Criteria:
Tasks #2 and #3 have been completed.
Data can be read from the SNAP dataset file into memory.
Any transformations needed to load the data into the database are implemented.
Data can be inserted into the database.
Extract and transform actions preserve data associations/relations, so that the database contents match the SNAP dataset.
Stretch: automate retrieval and decompression of the dataset directly from SNAP into a predefined location in the project directory structure, to keep the repo size down and further ensure consistency. (Should this be a separate task/improvement?)
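
To make the criteria above concrete, here is a minimal sketch of what an end-to-end pass might look like. It rests on several assumptions that are not decided in this issue: that the input is a whitespace-separated edge list with `#` comment lines (one of the formats SNAP uses for co-purchasing graphs), that SQLite stands in for whichever database is chosen in #2, and that the names `extract`, `load`, `run_etl`, and the `co_purchase` table are hypothetical placeholders.

```python
import sqlite3
from typing import Iterable, Iterator, List, Tuple

Edge = Tuple[int, int]


def extract(lines: Iterable[str]) -> Iterator[Edge]:
    """Yield (from_id, to_id) pairs, skipping blank lines and '#' comments.

    Assumes an edge-list layout; the real dataset file may differ.
    """
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        src, dst = line.split()[:2]
        # Transform step: convert the text fields to integers.
        yield int(src), int(dst)


def load(conn: sqlite3.Connection, edges: List[Edge]) -> None:
    """Create the (hypothetical) co_purchase table and insert the edges."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS co_purchase ("
        "from_id INTEGER NOT NULL, "
        "to_id INTEGER NOT NULL, "
        "PRIMARY KEY (from_id, to_id))"
    )
    with conn:  # commit on success, roll back on error
        conn.executemany(
            "INSERT OR IGNORE INTO co_purchase (from_id, to_id) VALUES (?, ?)",
            edges,
        )


def run_etl(lines: Iterable[str], conn: sqlite3.Connection) -> int:
    """Run extract/transform/load, then check parity with the source.

    The parity check assumes the table was empty before this run.
    Returns the number of distinct edges stored.
    """
    edges = list(extract(lines))
    load(conn, edges)
    stored = set(conn.execute("SELECT from_id, to_id FROM co_purchase"))
    if stored != set(edges):
        raise ValueError("database contents do not match the source data")
    return len(stored)


if __name__ == "__main__":
    # Tiny inline sample in place of the real dataset file.
    sample = ["# FromNodeId\tToNodeId", "0\t1", "0\t2", "1\t0"]
    print(run_etl(sample, sqlite3.connect(":memory:")))
```

Comparing the stored rows against the extracted set is only one possible reading of "parity"; on the full dataset a row count or checksum comparison would probably be cheaper.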