Open david-allison opened 2 weeks ago
Priority high as this is a privacy issue. We should not be sending email addresses to ACRA
It would be ideal to get this into the 24.11
backend
For now, is it possible to add some middleware which filters that particular error? Other alternative is to modify (worst case drop) them periodically from the db.. Not nice that it's sending PII like that..
We need both:
As the message is translated, this makes it somewhat difficult.
We can likely filter these messages by checking whether the first line contains: [BackendSyncException
, ankiweb.net
, @
]
We'd also want client-side code after this is patched to ensure that this server-side filter isn't scrubbing unknown/unhandled error messages
Diagnostics:
error.kind = 8
=> SYNC_OTHER_ERROR
error.message = "Syncing failed, because your email address needs to be (re)confirmed. Please visit ankiweb.net, and log in as d****@gmail.com to proceed."
Anki Backend:
logcat
anki::syn...lectio.. D redirect to new location endpoint="https://sync10.ankiweb.net/"
D meta remote=SyncMeta { modified: TimestampMillis(1728774137848), schema: TimestampMillis(1718379983437), usn: Usn(6173), current_time: TimestampSecs(1731066728), server_message: "Syncing failed, because your email address needs to be (re)confirmed. Please visit ankiweb.net, and log in as d***@gmail.com to proceed.", should_continue: false, host_number: 10, empty: false, media_usn: Usn(21646), v2_scheduler_or_later: false, v2_timezone: false }
D meta local=SyncMeta { modified: TimestampMillis(0), schema: TimestampMillis(1731062594502), usn: Usn(0), current_time: TimestampSecs(1731066725), server_message: "", should_continue: true, host_number: 0, empty: true, media_usn: Usn(0), v2_scheduler_or_later: true, v2_timezone: true }
D server says abort remote.server_message="Syncing failed, because your email address needs to be (re)confirmed. Please visit ankiweb.net, and log in as d***@gmail.com to proceed."
CollectionManager W blocked main thread for 960ms:
com.ichi2.anki.DeckPicker.updateDeckList(DeckPicker.kt:2059)
DeckPicker D updateDeckList
DeckPicke...DeckList D Refreshing deck list
Choreographer I Skipped 57 frames! The application may be doing too much work on its main thread.
DeckPicker D onCreateOptionsMenu()
BadgeDrawableBuilder D Adding badge
DeckPicker I Updating deck list UI
D Not rendering deck list as there are no cards
D Startup - Deck List UI Completed
BadgeDrawableBuilder D Adding badge
Note that while in there special casing this sync exception from "other", the "device time out of sync" error should also be special cased (yes this increases scope slightly here but it is a trivial amount of extra work to do both at once) - this would allow us to provide user feedback well for that specific error - see comment on PR here for motivation / details: https://github.com/ankidroid/Anki-Android/pull/17017#pullrequestreview-2306605582
For Acrarium, I'm trying to conjure the correct SQL query to select only those reports that have this exception and no other but it is a little difficult since the exception is localized. Here is my progress so far:
1- log in, if you have access - ssh ankidroid@ankidroid.org
2- enter database, if you have access mysql -u acrarium -p acrarium
3- query for the 3 areas where the PII landed:
select * from stacktrace where stacktrace like '%BackendSyncException%' and stacktrace like '%net.ankiweb.rsdroid.exceptions.BackendSyncException: %' and regexp_like(stacktrace, ' [^[:blank:]]+@[^[:blank:]^\.]+\.[[:alnum:]]+ ');
from report where content like '%BackendSyncException%' and content like '% "STACK_TRACE": "net.ankiweb.rsdroid.exceptions.BackendSyncException: %' and regexp_like(content, ' [^[:blank:]]+@[^[:blank:]^\.]+\.[[:alnum:]]+ ');
select title from bug where title like '%BackendSyncException%' and regexp_like(title, ' [^[:blank:]]+@[^[:blank:]^\.]+\.[[:alnum:]]+ ');
4- purge these entries (we're not interested in them anyway) - it turns out that if you purge the bug entry (and you can find them all by finding all the report table entries that match then following bug -> stacktrace.bug_id -> report.stacktrace_id) it will trigger deletes cascading from bug to stacktrace to report and then searching for email addresses will yield empty sets everywhereThis query took approximately 20 secs to run, so perhaps running it once an hour or so is about right?
delete from acrarium.bug
where id in
(select bug_id from acrarium.stacktrace
where id in
(select stacktrace_id from acrarium.report
where content like '%BackendSyncException%' and
content like '%
net.ankiweb.rsdroid.exceptions.BackendSyncException: %' and
regexp_like(content, ' [^[:blank:]]+@[^[:blank:]^\.][\.[:alpha:]+]+ ')));
Note that there are multiple messages that have the email, not just (re)confirm email
also terms and conditions
:
net.ankiweb.rsdroid.exceptions.BackendSyncException: Syncing failed. Please visit ankiweb.net and log in as redacted@example.com to confirm the updated terms and conditions.
I've done the Acrarium part.
I can't really stop the reports from hitting the DB without hacking on acrarium which is problematic for a number of reasons - not least of which being that current main doesn't even build over there and is a big upgrade code-wise from what we're running now including a database migration
but!
I'm very comfortable that I can locate these entries and delete them (as documented above) and I've set up an hourly in-mysql event to purge them so they will never stay for long
mysql> select event_name,event_body,event_type,interval_value,interval_field,event_definition from information_schema.events;
+---------------------+------------+------------+----------------+----------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| EVENT_NAME | EVENT_BODY | EVENT_TYPE | INTERVAL_VALUE | INTERVAL_FIELD | EVENT_DEFINITION |
+---------------------+------------+------------+----------------+----------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| purge_email_reports | SQL | RECURRING | 1 | HOUR | delete from bug
where id in
(select bug_id from stacktrace
where id in
(select stacktrace_id from report
where content like '%BackendSyncException%'
and content like '% "STACK_TRACE": "net.ankiweb.rsdroid.exceptions.BackendSyncExc
eption: %'
and regexp_like(content, ' [^[:blank:]]+@[^[:blank:]]+ '))) |
+---------------------+------------+------------+----------------+----------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set (0.00 sec)
Out of distrust I checked in on my periodic acrarium purge this morning and it was of course not executing correctly. I needed to prefix table names with schemas in the event scheduler context, and that fixed it. Took the opportunity to clean up the matching regex a little as well. Appears to be executing correctly from event scheduler now and verified acrarium is free of PII at the moment
Considering that user privacy was compromised by this incident, someone should do the same work (or at the very least thoroughly review everything you've set up so far) and make sure that they arrive at the same outcome as you did (i.e. acrarium is PII free at this moment, and in the future). It's not generally good to do the same work over again, but this should be a one-in-a-lifetime thing, and if the result is that "ok, we've now really made sure PII isn't stored" then I don't see how it's a waste of time.
Good point @voczi - I've done my best but still made an error first time around. Happy to help anyone that wants to get access to the mysql instance on the ankidroid.org server, especially since it really shouldn't have PII on there. Or if you want to do it I'd welcome the extra look and I know you've got access.
Taking a look now.
Can confirm we still have issues. (report from 2024-11-05, task last executed 1 hr ago) Gonna brainstorm with Mike.
I've wrote the following constraints (and also purged rows matching these beforehand):
(not((`content` like _utf8mb4\'%and log in as%\')))
(report table)
(not((`title` like _utf8mb4\'%and log in as%\')))
(bug table)
(not((`stacktrace` like _utf8mb4\'%and log in as%\')))
(stacktrace table)
Now, all of these messages should be gone. Tested with two reports on the latest build, one crash report with a test exception from the dev menu and the other with this message where it says my email has to be verified. And, also, I don't think we need the event anymore?
I hadn't thought of adding a constraint, that's way better, obvious in hindsight, big improvement
The only issue with a constraint is that it means the report will get stuck on the client and be retried until deleted I think All reports are cleared on version upgrades, so that is self-healing I think - and a reasonable tradeoff vs having PII flying around
I double-checked and I don't see any more PII anywhere, so with the constraints in place yes I can disable the event - I have done so
moving this to 2.20 milestone as the constraints are blocking things now, and we have an exception subclass but it's for next version of anki upstream
https://ankidroid.org/acra/app/1/bug/252627/report/27d23a2f-90f0-4c10-a5fa-2e3ccf1ac093
Task List
BackendSyncException
from being sent to ACRABackendSyncException
Related: