ChenglimEar closed this 10 months ago.
This can be a good approach. You talk about timestamps changing, but I don't see where you handle this. There are also changes in the ordering of records within the generated files. Does this mask that problem as well? While I'm not opposed to fixing the rounding here, wouldn't it be better to fix it in the code? I made some fixes recently in code I was updating, but we could make a complete pass and do it everywhere.
@mikeubell This is currently a work in progress. The rounding is a placeholder for masking anything that differs even when the data hasn't changed. Once the rounding is fixed in the code, we can remove this. I wanted to get Travis CI working and automatic checking in place before handling the other kinds of differences.
While we compare the build directory to make sure that we continue to generate the correct data, the comparison is noisy: there are rounding differences, differences in the ordering of elements in arrays, differences due to a timestamp, and differences due to the undefined sort order of top contributors and spenders that share the same total amounts. To address this, we create digests for comparison. We look through the JSON files in the build directory, clean their contents, generate hashes for the cleaned data, and save those hashes to a file, `build/digests.json`. If the cleaned data in the build directory hasn't changed, this file should remain unchanged when it is regenerated.
Before approving this, we have to make sure that the checked-in digests are for the build from the master branch and that they don't change when we rebuild this branch (since we didn't change anything that would affect the build directory). After this branch is merged into master with the digests, we will be able to see with future merges whether there were changes to the build directory that aren't noise.
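A minimal Python sketch of the digest step, assuming the cleaning rules described above: floats rounded to two places, timestamp fields dropped, and array elements sorted into a canonical order. The key names in `TIMESTAMP_KEYS` and the helpers `clean`, `digest`, `build_digests`, and `write_digests` are hypothetical; this is not the PR's actual code, and the redaction of tied names is left out.

```python
import hashlib
import json
from pathlib import Path

# Hypothetical names of fields holding build timestamps; the real files may differ.
TIMESTAMP_KEYS = {"date_processed", "timestamp"}


def clean(value, places=2):
    """Recursively normalize JSON data so cosmetic differences don't change the hash."""
    if isinstance(value, dict):
        # Drop timestamp keys at any nesting level.
        return {k: clean(v, places) for k, v in value.items() if k not in TIMESTAMP_KEYS}
    if isinstance(value, list):
        cleaned = [clean(v, places) for v in value]
        # Sort elements by their canonical JSON text so ordering noise disappears.
        return sorted(cleaned, key=lambda v: json.dumps(v, sort_keys=True))
    if isinstance(value, float):
        # Round away machine-dependent floating-point noise.
        return round(value, places)
    return value


def digest(data):
    """SHA-256 of the cleaned data serialized with sorted keys and fixed separators."""
    canonical = json.dumps(clean(data), sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def build_digests(build_dir):
    """Map each JSON file's path (relative to build_dir) to the digest of its data."""
    build = Path(build_dir)
    digests = {}
    for path in sorted(build.rglob("*.json")):
        if path.name == "digests.json":
            continue  # never hash the output file itself
        with path.open(encoding="utf-8") as f:
            digests[path.relative_to(build).as_posix()] = digest(json.load(f))
    return digests


def write_digests(build_dir):
    """Write the digests to <build_dir>/digests.json in a stable, diff-friendly form."""
    digests = build_digests(build_dir)
    out = Path(build_dir) / "digests.json"
    out.write_text(json.dumps(digests, indent=2, sort_keys=True) + "\n", encoding="utf-8")
    return digests
```

Sorting every array is a blunt choice: it also hides genuine reordering, so it only makes sense for arrays whose order carries no meaning. In CI, a check along these lines could regenerate the file and fail if it differs from the checked-in copy, for example with `git diff --exit-code build/digests.json`.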
There is a parallel effort to fix the rounding differences that occur on different machines. Once that is fixed, we can use this mechanism to verify the fix by removing the rounding logic from the data-cleaning portion of the code.
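The thread doesn't say what causes the machine-dependent rounding, but one plausible source is that floating-point addition isn't associative, so summing the same amounts in a different order can change the last bits. The sketch below (the helper `sha` is hypothetical) shows how such noise changes a hash, how rounding masks it, and how an order-independent sum such as Python's `math.fsum` is one possible fix in the code itself.

```python
import hashlib
import json
from math import fsum


def sha(value):
    """Hash the JSON text of a value, as a digest step would."""
    return hashlib.sha256(json.dumps(value).encode("utf-8")).hexdigest()


# The same three amounts summed in two different orders, e.g. because records
# were read in a different order on another machine.
forward = 0.1 + 0.2 + 0.3
backward = 0.3 + 0.2 + 0.1

print(forward, backward)  # 0.6000000000000001 0.6
print(sha(forward) == sha(backward))  # False: the raw floats hash differently
print(sha(round(forward, 2)) == sha(round(backward, 2)))  # True: rounding masks the noise

# math.fsum returns a correctly rounded sum, so the result no longer depends
# on the order of the inputs.
print(fsum([0.1, 0.2, 0.3]) == fsum([0.3, 0.2, 0.1]))  # True
```

If a fix like this lands upstream, dropping the `round` call from the cleaning step and confirming that the digests stay the same is exactly the verification described above.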
We can also start a parallel effort to ensure that the top contributors and top spenders lists have a defined sort order when contribution or spending totals are tied. Once that is fixed, we can use this mechanism to verify it by removing the logic that redacts names in cases where the sort order is undefined.
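A defined sort order only needs a secondary key that breaks ties. This sketch assumes a hypothetical `Contributor` record with `name` and `total` fields and uses the name as the tie-breaker. `redact_ties` is a simplified stand-in for the interim redaction; the PR's actual rule may be narrower (for example, only redacting ties that affect the cutoff).

```python
from collections import Counter
from typing import NamedTuple


class Contributor(NamedTuple):
    # Hypothetical record shape; the real build files may use different fields.
    name: str
    total: float


def top_contributors(rows, limit):
    """Highest totals first; ties broken by name so input order no longer matters."""
    return sorted(rows, key=lambda r: (-r.total, r.name))[:limit]


def redact_ties(rows):
    """Interim approach: hide the name of any row whose total is shared with another row."""
    counts = Counter(r.total for r in rows)
    return [r._replace(name="<tied>") if counts[r.total] > 1 else r for r in rows]


rows = [Contributor("Bob", 100.0), Contributor("Alice", 100.0), Contributor("Carol", 250.0)]
print(top_contributors(rows, 2))
# prints [Contributor(name='Carol', total=250.0), Contributor(name='Alice', total=100.0)]
print(top_contributors(list(reversed(rows)), 2) == top_contributors(rows, 2))  # True
```

With the tie-breaker in place, the same input produces the same list regardless of record order, so the redaction step can be removed and the digests should stay stable.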