Closed mjumbewu closed 3 years ago
An additional piece of information about the device_transaction
duplicates: as is mentioned in this slack thread, as of right now (Sep 15, 10:10 AM) there are 813 duplicate littlepay_transaction_id
values in device_transactions
, and for all but 22 of those, at least one of the duplicate records has a route_id
value of 'Route Z'
, which is apparently an "unidentified" route.
See this gist from late September:
https://gist.github.com/mjumbewu/ac1e2e56bac5eb6a6e2ae2568303a8a8
Closing with follow-up work to be done in https://github.com/cal-itp/data-infra/issues/596
Describe the bug
In both the
device_transactions
andmicropayments
tables we are seeing multiple records that have the same unique key (or what we thought were unique keys). We need to know:As an example, the micropayment with
micropayment_id = '000317e7-220f-4c09-817c-b32a7b308812'
occurs in the _gs://gtfs-data/mst/processed/micropayments/2021-09-13_202107190532micropayments.psv file twice:transaction_time
of 2021-07-17 22:29:19 UTCtransaction_time
of 2021-07-17 22:36:18 UTCSo, in that case, even if we deleted all records before inserting, we'd still end up with a duplicate
micropayments_id
.For the
device_transactions
data, there aren't any duplicatelittlepay_transaction_id
values within the same file. However there are still plenty of duplicates across different files. These duplicates appear to have differentlocation_id
values, so the records aren't complete duplicates (have they been changed/corrected between files?).To Reproduce
micropayments
table, we can surface plenty of duplicatemicropayment_id
values within the same source file with the query https://console.cloud.google.com/bigquery?sq=1005246706141:7022f959b96f40589d153709a833c52adevice_transactions
table, you can find duplicates with the query https://console.cloud.google.com/bigquery?sq=1005246706141:f2557f32e8fe4ddba04d06992ff21464Expected behavior We expected the
micropayments.micropayment_id
anddevice_transactions.littlepay_transaction_id
fields to be unique.