Watts-Lab / daphme

Data Access Platform for Human Mobility in Epidemiology

Pipeline step 0: Read partitioned files from S3 in .csv or .parquet #15

Open GolanTrev opened 3 weeks ago

GolanTrev commented 3 weeks ago

Like in these:

We want to pass tests for functions like those in daphme/io.py.
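
A minimal sketch of what such a function in daphme/io.py could look like, assuming pandas (plus s3fs for S3 globbing); the name `read_partitioned` and its signature are placeholders, not the actual API:

```python
# Hypothetical sketch for daphme/io.py -- names are assumptions, not the real API.
from pathlib import Path

import pandas as pd


def read_partitioned(path: str, fmt: str = "csv") -> pd.DataFrame:
    """Read a folder of partitioned .csv or .parquet files into one DataFrame.

    `path` may be a local directory or an s3:// URI (pandas delegates
    S3 access to s3fs when it is installed).
    """
    if fmt == "parquet":
        # pandas/pyarrow can read a partitioned parquet dataset directly.
        return pd.read_parquet(path)
    if fmt == "csv":
        if path.startswith("s3://"):
            import s3fs  # optional dependency, needed for S3 globbing

            fs = s3fs.S3FileSystem()
            files = [f"s3://{p}" for p in fs.glob(f"{path.rstrip('/')}/*.csv")]
        else:
            files = sorted(Path(path).glob("*.csv"))
        return pd.concat((pd.read_csv(f) for f in files), ignore_index=True)
    raise ValueError(f"Unsupported format: {fmt}")
```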

thom-li commented 3 weeks ago

Sample data is located in arn:aws:s3:::synthetic-raw-data. There are 10-, 100-, and 1000-user options. Each contains dataframes for ground-truth trajectories, sparse sampled trajectories, diaries, and agent homes/workplaces. Each agent's ground-truth trajectory is two weeks long at 1-minute intervals. The sparse trajectories are sampled at either a low or a high frequency.
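
For reference, one way to browse the bucket with boto3; the bucket name comes from the ARN above, but the `100-user/` prefix is a guess at the key layout, not confirmed:

```python
# Bucket name from the ARN above; the key prefix is a hypothetical example.
import boto3

s3 = boto3.client("s3")
resp = s3.list_objects_v2(Bucket="synthetic-raw-data", Prefix="100-user/")
for obj in resp.get("Contents", []):
    print(obj["Key"])
```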

GolanTrev commented 2 weeks ago

A Reader class, instantiated with a dictionary mapping file column names to internal column names (references?). We test reading a folder of partitioned data split across multiple .csv files (in the future, multiple parquet files across multiple folders). A rough sketch follows below.
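
A possible shape for that interface, assuming pandas; the class and method names here are placeholders, not the agreed design:

```python
# Hypothetical Reader sketch -- class and method names are assumptions.
import glob
import os

import pandas as pd


class Reader:
    """Loads partitioned files and renames raw columns to internal names."""

    def __init__(self, column_map: dict[str, str]):
        # e.g. {"latitude": "lat", "longitude": "lon", "unix_ts": "time"}
        self.column_map = column_map

    def read_csv_folder(self, folder: str) -> pd.DataFrame:
        # Concatenate every partition file in the folder, then rename
        # raw file columns to the internal schema.
        files = sorted(glob.glob(os.path.join(folder, "*.csv")))
        df = pd.concat((pd.read_csv(f) for f in files), ignore_index=True)
        return df.rename(columns=self.column_map)
```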

Test: assert that the loaded object is a pandas DataFrame and has the right columns (lat or x, lon or y, time, ha, possibly more).
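
A pytest sketch of that test, assuming the hypothetical Reader above; the raw file column names are invented for illustration:

```python
# Hypothetical pytest sketch; raw column names are assumptions.
import pandas as pd

from daphme.io import Reader  # assuming the Reader sketch above lives here


def test_reader_returns_dataframe_with_expected_columns(tmp_path):
    # Write two small partition files into a temporary folder.
    cols = ["latitude", "longitude", "unix_ts", "horizontal_accuracy"]
    for i in range(2):
        part = pd.DataFrame([[40.0 + i, -75.0, 1700000000 + i, 10.0]], columns=cols)
        part.to_csv(tmp_path / f"part-{i}.csv", index=False)

    reader = Reader({"latitude": "lat", "longitude": "lon",
                     "unix_ts": "time", "horizontal_accuracy": "ha"})
    df = reader.read_csv_folder(str(tmp_path))

    assert isinstance(df, pd.DataFrame)
    assert {"lat", "lon", "time", "ha"}.issubset(df.columns)
    assert len(df) == 2  # one row per partition file
```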