UoB-DSMP-2023-24 / dsmp-2024-group-ol2

dsmp-2024-group-ol2 created by GitHub Classroom

Run Initial processing on full LOB dataset locally #17

Closed JackDI1 closed 8 months ago

JackDI1 commented 8 months ago

The LOB dataset is approximately 6.5 GB. Our initial processing only ran on 2 of the files, so the full dataset needs to be processed before we can carry out the initial EDA.

JackDI1 commented 8 months ago

I have downloaded the data locally. However, because the data is so large, I don't think we should store it on GitHub. Could we store the raw data in S3 and the processed data in a DB, as Adam suggested?

JackDI1 commented 8 months ago

I tried running this locally. Unfortunately, my laptop crashed 3 times.
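
Crashes like this usually come from loading too much of the dataset into memory at once. A possible workaround, as a minimal sketch only: process one file at a time, stream it line by line, and write the result for each file to a local SQLite DB before moving on. Everything named here is an assumption for illustration: the `*.txt` pattern, the `file_summary` table, and the helper names are hypothetical, and the per-file line and byte count is just a placeholder for the real processing step.

```python
import sqlite3
from pathlib import Path


def summarise_file(path):
    """Stream one file line by line so only one line is held in memory.

    Placeholder for the real processing: it only counts lines and bytes.
    """
    n_lines = 0
    n_bytes = 0
    with open(path, "rb") as fh:
        for line in fh:
            n_lines += 1
            n_bytes += len(line)
    return n_lines, n_bytes


def process_directory(data_dir, db_path, pattern="*.txt"):
    """Process each raw file in turn and store one summary row per file.

    The file pattern and table layout are assumptions, not the project's
    actual schema. Returns the number of rows in the summary table.
    """
    conn = sqlite3.connect(db_path)
    try:
        conn.execute(
            "CREATE TABLE IF NOT EXISTS file_summary ("
            "filename TEXT PRIMARY KEY, n_lines INTEGER, n_bytes INTEGER)"
        )
        for path in sorted(Path(data_dir).glob(pattern)):
            n_lines, n_bytes = summarise_file(path)
            conn.execute(
                "INSERT OR REPLACE INTO file_summary VALUES (?, ?, ?)",
                (path.name, n_lines, n_bytes),
            )
            # Commit after every file so a crash loses at most one file's work.
            conn.commit()
        return conn.execute("SELECT COUNT(*) FROM file_summary").fetchone()[0]
    finally:
        conn.close()
```

Because each file is committed separately and rows are keyed by filename, the script can be re-run after a crash without duplicating rows for files that already finished.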