This should fix total_edits ending up null whenever one of the operands is null during a row write operation.
Notes
Both coalesces are required because either operand can be null: a new changeset could have no interesting counts, and the same is true of the old summed changesets.
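To illustrate why both operands need a coalesce, here is a minimal sketch using an in-memory SQLite database. The table layout, the `add_edits` helper, and the update statements are assumptions made for illustration only; they are not the actual statement changed in this PR, and the real job runs against PostgreSQL.

```python
import sqlite3

# Hypothetical table for illustration; the real schema may differ.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE changesets (id INTEGER PRIMARY KEY, total_edits INTEGER)")
conn.execute("INSERT INTO changesets VALUES (1, NULL), (2, 5), (3, NULL), (4, 5)")


def add_edits(conn, changeset_id, new_edits, use_coalesce):
    """Add new_edits (possibly None) to the stored total and return the result."""
    if use_coalesce:
        # Treat a null on either side of the sum as zero.
        sql = (
            "UPDATE changesets "
            "SET total_edits = coalesce(total_edits, 0) + coalesce(?, 0) "
            "WHERE id = ?"
        )
    else:
        # Naive sum: a null on either side makes the whole result null.
        sql = "UPDATE changesets SET total_edits = total_edits + ? WHERE id = ?"
    conn.execute(sql, (new_edits, changeset_id))
    row = conn.execute(
        "SELECT total_edits FROM changesets WHERE id = ?", (changeset_id,)
    ).fetchone()
    return row[0]


# Naive sum: null stored total, then null incoming value.
naive_old_null = add_edits(conn, 1, 3, use_coalesce=False)
naive_new_null = add_edits(conn, 2, None, use_coalesce=False)

# Coalesced sum for the same two cases.
fixed_old_null = add_edits(conn, 3, 3, use_coalesce=True)
fixed_new_null = add_edits(conn, 4, None, use_coalesce=True)

print(naive_old_null, naive_new_null, fixed_old_null, fixed_new_null)
```

With only one coalesce, one of the two naive cases would still produce null, which is why the fix wraps both sides of the sum.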
I updated the batch-process.sh script to read its Spark conf from a JSON configuration file, which should make it easier to adjust in the future. I also swapped to static instance types and EBS configuration, since this job needs a particular combination of CPU, memory, and disk resources to run successfully.
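For readers unfamiliar with this kind of file, here is a rough sketch of what a JSON Spark configuration can look like. It assumes the job runs on Amazon EMR, where `aws emr create-cluster --configurations` accepts a JSON list of classification objects. The property values and the `render_conf` helper are hypothetical and are not the settings used by batch-process.sh.

```python
import json

# Hypothetical values for illustration; the actual properties in this PR may differ.
spark_conf = [
    {
        "Classification": "spark-defaults",
        "Properties": {
            "spark.executor.memory": "8g",
            "spark.executor.cores": "4",
            "spark.driver.memory": "8g",
        },
    }
]


def render_conf(conf):
    """Serialize the configuration as stable, human-readable JSON."""
    return json.dumps(conf, indent=2, sort_keys=True)


rendered = render_conf(spark_conf)
print(rendered)
```

Keeping these settings in a file rather than inline in the script means a resource change is a one-line JSON edit instead of a change to shell quoting.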
Count discrepancies remain between the production and staging databases. I opened a separate epic (#186) to continue investigating them, as those incorrect values are outside the scope of this specific fix.
Testing
To verify this fix, I compared the total_edits in user_statistics with the counts currently on production. While they remain low compared to production, they are much closer than they were before. In addition, I ran a query to determine if there are any changesets that have values in the jsonb counts field but where total_edits remains null. There are none:
osmesa_stats_staging=> select count(*) from changesets where total_edits is null and counts is not null;
 count
-------
     0
(1 row)
@jpolchlo I made some changes around whitespace and the env var handling for instance type. Just bumping this in case you want to take one more look, even though it was approved earlier.