Some expressions in link_daily.csv produce different results depending on which chunk of the dataframe they are given. They attempt a calculation over the entire links df but only see the portion of it in the current chunk.
One thing that influences chunk size is the number of reported target variables in link_daily.csv: adding new reporting rows changes the chunk size calculation, which puts different links in different chunks and changes the result of any expression that requires knowledge of every link. Any such expression operating on an incomplete chunk will produce an incorrect result.
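The effect can be reproduced outside the model with a minimal pandas sketch. Everything in it is hypothetical and for illustration only: the `links` table, the `volume` column, and the `volume_share` and `chunked` helpers are not taken from link_daily.csv or from the actual chunking code. It only shows that an expression depending on a whole-table aggregate changes its output when the same rows are split into chunks of different sizes.

```python
import pandas as pd

# Hypothetical links table; the column names are illustrative only.
links = pd.DataFrame(
    {"link_id": [1, 2, 3, 4], "volume": [10.0, 30.0, 20.0, 40.0]}
)


def volume_share(df: pd.DataFrame) -> pd.Series:
    """Each link's share of total volume; only correct if df holds every link."""
    return df["volume"] / df["volume"].sum()


def chunked(df: pd.DataFrame, rows_per_chunk: int) -> pd.Series:
    """Evaluate volume_share chunk by chunk and concatenate the results."""
    parts = [
        volume_share(df.iloc[start : start + rows_per_chunk])
        for start in range(0, len(df), rows_per_chunk)
    ]
    return pd.concat(parts)


full = volume_share(links)   # evaluated once on the whole table
by_two = chunked(links, 2)   # same rows, chunks of 2
by_three = chunked(links, 3) # same rows, chunks of 3

print(pd.DataFrame({"full": full, "by_two": by_two, "by_three": by_three}))
```

In this sketch the three columns disagree, and only `full` sums to 1. Row-wise expressions (for example, scaling `volume` by a constant) would be unaffected by the split; only expressions that aggregate across links are chunk-sensitive.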
The temporary workaround is to avoid chunking by increasing chunk_size in settings.yaml, which is only feasible for small datasets on reasonably powerful hardware.