RSGInc / bca4abm

Benefit Cost Analysis for Travel Demand Models
http://rsginc.github.io/bca4abm/

Chunking causes incorrect expression assignment for certain calculations #104

Closed blakerosenthal closed 4 years ago

blakerosenthal commented 4 years ago

Some expressions in link_daily.csv produce different results depending on which dataframe chunk they are evaluated on. These expressions perform a calculation that needs the entire links df, but under chunking each evaluation only sees a portion of it.
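A minimal sketch of the failure mode, using plain Python lists as a stand-in for the links dataframe so it runs without dependencies. The `share_of_total` expression, the `evaluate_in_chunks` helper, and the volumes are hypothetical and are not taken from link_daily.csv or the bca4abm code. The point is only that an expression containing a whole-table aggregate gives different answers once the rows are split.

```python
def share_of_total(volumes):
    """Hypothetical expression: each link's share of total volume.

    The sum is taken over whatever rows are passed in, so the result
    depends on whether it sees the whole table or just one chunk.
    """
    total = sum(volumes)
    return [v / total for v in volumes]


def evaluate_in_chunks(volumes, rows_per_chunk):
    """Apply the expression chunk by chunk and concatenate the results."""
    results = []
    for start in range(0, len(volumes), rows_per_chunk):
        results.extend(share_of_total(volumes[start:start + rows_per_chunk]))
    return results


link_volumes = [100.0, 300.0, 200.0, 400.0]  # made-up link volumes

whole = share_of_total(link_volumes)
chunked = evaluate_in_chunks(link_volumes, rows_per_chunk=2)

print(whole)    # shares add up to 1 across all links
print(chunked)  # shares add up to 1 within each chunk instead
```

Row-wise expressions (for example, one link's volume times its own length) would be unaffected, since they never look beyond the current row.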

One thing that influences chunk size is the number of reported target variables in link_daily.csv. Adding new reporting rows changes the chunk size calculation, which puts different links in different chunks and changes the result of any expression that requires knowledge of every link. Any such expression evaluated on an incomplete chunk will therefore produce an incorrect result.

The temporary workaround is to not use chunking, which is only feasible for small datasets on reasonably powerful hardware. This can be done by increasing chunk_size in settings.yaml.
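Why a large enough chunk avoids the problem can be sketched as follows, again with plain Python lists standing in for the links dataframe. The `deviation_from_mean` expression, the `evaluate_in_chunks` helper, and the `rows_per_chunk` parameter are hypothetical. In particular, this does not reproduce how bca4abm derives the number of rows per chunk from chunk_size; it only assumes that a larger chunk_size eventually lets all links fit in one chunk.

```python
def deviation_from_mean(values):
    """Hypothetical expression that needs the mean over every link."""
    mean = sum(values) / len(values)
    return [v - mean for v in values]


def evaluate_in_chunks(values, rows_per_chunk):
    """Apply the expression chunk by chunk and concatenate the results."""
    results = []
    for start in range(0, len(values), rows_per_chunk):
        results.extend(deviation_from_mean(values[start:start + rows_per_chunk]))
    return results


link_speeds = [30.0, 50.0, 40.0, 60.0]  # made-up link speeds

expected = deviation_from_mean(link_speeds)

# Two rows per chunk: each chunk uses its own mean, so results are wrong.
small_chunks = evaluate_in_chunks(link_speeds, rows_per_chunk=2)

# Chunk covers the whole table: a single chunk, same result as no chunking.
one_chunk = evaluate_in_chunks(link_speeds, rows_per_chunk=len(link_speeds))

print(small_chunks == expected)  # False
print(one_chunk == expected)     # True
```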