GolanTrev opened this issue 3 weeks ago
Sample data is located in arn:aws:s3:::synthetic-raw-data. There are 10-, 100-, and 1000-user options. Each contains dataframes for ground-truth trajectories, sparse sampled trajectories, diaries, and agent homes/workplaces. The trajectories for each agent are 2 weeks long at 1-minute intervals. The sparse trajectories are sampled at either a low or a high frequency.
Reader class, instantiated with a dictionary mapping file column names to internal column names (references?). We test reading from a folder of partitioned data split across multiple .csv files (in the future, multiple parquet files across multiple folders).
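A minimal sketch of what that Reader class could look like, assuming a plain pandas backend; the class name, `column_map` parameter, and `read` method are placeholders, not the actual daphme API:

```python
import glob
import os

import pandas as pd


class Reader:
    """Hypothetical sketch: load partitioned CSV data from a folder and
    rename file columns to internal names via a user-supplied mapping."""

    def __init__(self, column_map):
        # e.g. {"latitude": "lat", "longitude": "lon", "timestamp": "time"}
        self.column_map = column_map

    def read(self, folder):
        # Concatenate every .csv partition in the folder into one DataFrame,
        # then rename to the internal schema.
        paths = sorted(glob.glob(os.path.join(folder, "*.csv")))
        frames = [pd.read_csv(p) for p in paths]
        df = pd.concat(frames, ignore_index=True)
        return df.rename(columns=self.column_map)
```

Supporting parquet later could be a `glob` pattern switch (`**/*.parquet` with `pd.read_parquet`) rather than a new class.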
Test: assert that the loaded object is a pandas DataFrame and has the right columns (lat or x, lon or y, time, ha, and possibly more).
Like in these:
https://github.com/Watts-Lab/covid_gps/blob/main/covid-clustering/%5BF%5Dsubset_phl.ipynb
https://github.com/Watts-Lab/cf-networks/blob/master/%5BF%5Dmake_data.ipynb
https://github.com/mindearth/mobilkit/blob/main/examples/01_mobilkit_example.ipynb
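The test above could be sketched in pytest roughly as follows; `read_folder`, the internal column names, and the file-column names are illustrative stand-ins for whatever daphme/io.py ends up exposing:

```python
from pathlib import Path

import pandas as pd

REQUIRED = {"lat", "lon", "time", "ha"}  # hypothetical internal column names


def read_folder(folder, column_map):
    # Stand-in for the eventual daphme/io.py reader: concat all .csv
    # partitions in the folder and rename columns to the internal schema.
    frames = [pd.read_csv(p) for p in sorted(Path(folder).glob("*.csv"))]
    return pd.concat(frames, ignore_index=True).rename(columns=column_map)


def test_read_folder(tmp_path):
    # Build a tiny partitioned sample: two single-row CSV files.
    for i in range(2):
        pd.DataFrame(
            {"latitude": [40.0 + i], "longitude": [-75.0],
             "unix_time": [i * 60], "ha": [10.0]}
        ).to_csv(tmp_path / f"part{i}.csv", index=False)

    df = read_folder(tmp_path, {"latitude": "lat", "longitude": "lon",
                                "unix_time": "time"})

    assert isinstance(df, pd.DataFrame)
    assert REQUIRED <= set(df.columns)
    assert len(df) == 2
```

With pytest, `tmp_path` is a built-in fixture providing a fresh temporary directory, so the test does not touch the internal sample data and can run in CI.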
- [x] Conceive a unit test that makes sense with internal sample data
- [x] (Thomas) provide the sample data
- [ ] Code and pass a test that imports and formats
We want to pass tests for functions like those in daphme/io.py.