HabitRPG / habitica

A habit tracker app which treats your goals like a Role Playing Game.
https://habitica.com
Other
12k stars 4.09k forks source link

corrupt notifications array causes "User validation failed" errors #9923

Closed Alys closed 6 years ago

Alys commented 6 years ago

It's possible for users to have just this as their notifications array:

    "notifications": [
        {
            "seen": true,
        }
    ]

When that happens, they get repeated user validation errors like this when completing tasks, buying items, etc, etc:

{ "success": false, "error": "BadRequest", 
  "message": "User validation failed",
  "errors": [ { "message": "Path `type` is required.", "path": "type" } ] }

screenshot_2018-02-02_18-44-24

I've searched the users who have logged in in the past two days and found it in 13 accounts. I've removed it from those accounts now. I'll search older accounts too today and repeat the search tomorrow.

I don't know yet if this is something that happened just because of today's migration from the old notification structure to the new one, or if it can be an ongoing problem. If the former, we can ignore this issue - it will stop being a problem after I've cleared all the affected accounts.

However if it's an ongoing bug, it should be dealt with as a high priority because getting those errors after nearly every action is stressful and confusing. There's also indications in the two reports I've seen that it can stop your actions from taking effect. If we can't work out what's causing it rapidly, I suggest we modify our code temporarily so that it doesn't treat this data as a serious error (we'd revert that change once a proper fix is in place).

For reference, here's what the notifications array should look like when all notifications have been dismissed:

    "notifications": [ ]

and here's what it should look like with correctly-formatted notifications:

    "notifications": [
        {
            "id": "c4523248-52f7-4bde-b990-7bf05aedbf0b",
            "data": {
                "group": {
                    "name": "Chaos Guild",
                    "id": "85dd1be6-336b-4e53-9949-2c09150b4035"
                }
            },
            "seen": false,
            "type": "NEW_CHAT_MESSAGE"
        }
    ]
paglias commented 6 years ago

Keep me updated about this, I can't find anything that would cause this so hopefully it's due to some people having corrupt notification data from before

Alys commented 6 years ago

A similar problem from a user who hasn't logged in today and so they weren't included in my initial search:

"notifications":[{"updatedAt":"2016-12-07T17:48:18.877Z"}]

That's likely to be from old corrupt data though.

Alys commented 6 years ago

It's more widespread because the "seen":true data can occur with other non-corrupt notifications. I still have no evidence about whether it's a once-off or ongoing problem but I expect to have a better idea when I've done some more searching and fixes.

    "notifications": [
        {
            "type": "UNALLOCATED_STATS_POINTS",
            "seen": true,
            "data": {
                "points": 13
            },
            "id": "a0ccabcd-b539-4a65-8270-f5cef4fb5d6c"
        },
        {
            "seen": true
        },
        {
            "seen": true
        }
    ],
Alys commented 6 years ago

And null notifications:

"notifications":[
null,
{"seen":true},
{"id":"54d849a1-d89b-423a-9b1f-9d44482b995e","data":{"group":{"name":"guild name","id":"92900a1d-2d0f-4e19-81c3e2f2362a4660"}},"seen":false,"type":"NEW_CHAT_MESSAGE"}
]}
paglias commented 6 years ago

The fact is notifications have been around for months now, the only thing that changed is that a new field was added. And validation didn't change. So there's something that causes them to get corrupted / skip validation

Il 02 Feb 2018 10:46, "Alys" notifications@github.com ha scritto:

And null notifications:

"notifications":[ null, {"seen":true}, {"id":"54d849a1-d89b-423a-9b1f-9d44482b995e","data":{"group":{"name":"guild name","id":"92900a1d-2d0f-4e19-81c3e2f2362a4660"}},"seen":false,"type":"NEW_CHAT_MESSAGE"} ]}

— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/HabitRPG/habitica/issues/9923#issuecomment-362537538, or mute the thread https://github.com/notifications/unsubscribe-auth/AAkAVw5GrJTaqwmWGer_319ugvBtIqQwks5tQtlmgaJpZM4R22uP .

Alys commented 6 years ago

Proof that it's an ongoing problem:

data captured at 18:58 :

{"_id":"0c850897-e6bb-4645-8457-2f00ccef7cc3","notifications":[
    {"type":"STREAK_ACHIEVEMENT","data":{},"id":"da429bc1-b84e-4571-9f9b-29f54ae94d94"},
    {"type":"CRON","data":{"mp":-2.700000000000003,"hp":0},"id":"3a1b943d-199c-41c6-85eb-1b223c00d880"},
    {"type":"NEW_CHAT_MESSAGE","seen":false,"data":{"group":{"name":"Guild Name","id":"869fda7a-0b84-4807-8306-52ab8ed787f2"}},"id":"78c07c67-8e52-4352-8cdf-8beac247cf86"}
]}

data captured at 19:13 for the same user:

{"_id":"0c850897-e6bb-4645-8457-2f00ccef7cc3","notifications":[
    {"type":"STREAK_ACHIEVEMENT","data":{},"id":"da429bc1-b84e-4571-9f9b-29f54ae94d94"},
    {"type":"CRON","data":{"mp":-2.700000000000003,"hp":0},"id":"3a1b943d-199c-41c6-85eb-1b223c00d880"},
    {"type":"NEW_CHAT_MESSAGE","seen":false,"data":{"group":{"name":"Guild Name","id":"869fda7a-0b84-4807-8306-52ab8ed787f2"}},"id":"78c07c67-8e52-4352-8cdf-8beac247cf86"},
    {"id":"ea026b1c-93a9-4997-9cb9-e6e29f7ab0a8","data":{"group":{"name":"Party Name","id":"80340ebc-e12a-468b-9923-b31c16ca217c"}},"seen":false,"type":"NEW_CHAT_MESSAGE"}
]}

data captured at 20:12 for the same user:

{"_id":"0c850897-e6bb-4645-8457-2f00ccef7cc3","notifications":[
    {"seen":true}
]}
Alys commented 6 years ago

So we need to urgently either find what's causing this and push a fix, or push a temporary code change that prevents notifications data from throwing validation errors.

Alys commented 6 years ago

I've searched through all our users and found less than two dozen affected. All had auth.timestamps.loggedin dates from the past couple of days.

I've fixed them all but I'll periodically re-check users who've logged in in the past day to pick up new occurrences.

Alys commented 6 years ago

One more has appeared (now fixed). Loggly shows that user had two "NotFound: Notification not found" errors around 2018-02-02 10:29 then "ValidationError: User validation failed" at 2018-02-02 10:31

Alys commented 6 years ago

A new example:

at 20:53:

{"_id":"e66a1ed6-f67e-4dc2-96fd-a85604d13877","notifications":[
    {"type":"CARD_RECEIVED","seen":false,"data":{"from":{"name":"A Profile Name"},"card":"nye"},"id":"12345678-1234-1234-1234-123451234567"},
    {"type":"CARD_RECEIVED","seen":false,"data":{"from":{"name":"A Profile Name"},"card":"congrats"},"id":"12345678-1234-1234-1234-123411234567"},
    {"type":"CRON","seen":false,"data":{"mp":11.400000000000006,"hp":-1},"id":"b63861ed-1b19-4bd3-999f-18a1fa469e76"}
]}

at 21:14:

{"_id":"e66a1ed6-f67e-4dc2-96fd-a85604d13877","notifications":[
    {"seen":true}
]}

The only Loggly errors for today are three ValidationError: User validation failed errors for these all around 2018-02-02 11:00:

I'm hoping these invalid {"seen":true} notifications aren't caused by the CRON notification being cleared. :( That could cause an increasing number of errors as the day goes on.

I'm about to try to replicate that.

Alys commented 6 years ago

After several attempts, I can't replicate the bug by triggering then clearing cron notifications, or clearing other notifications while a cron notification exists, so that's good. :)

Alys commented 6 years ago

This PR: https://github.com/HabitRPG/habitica/pull/9929 has temporarily adjusted the notification schema so that the missing fields in {"seen":true} notifications are no longer mandatory and hence validation errors are not thrown. This has removed the damaging effect on the users so I'm moving this down to a medium priority. We should still find and fix the root cause of course but I no longer see it as urgent. When it's fixed, the temporary changes in that PR can be reverted.

I'm marking this as "help wanted" in addition to "needs investigation" in case any contributors have ideas, but if the staff want to handle this themselves or disagree with my change to the priority, I have no objection if they adjust the labels again. :)

Alys commented 6 years ago

This commit was made about a day ago to prevent the invalid {"seen":true} notifications from causing user-visible error messages: https://github.com/HabitRPG/habitica/commit/4e01b14874c43496296da4d8beaba181e7c62ec5

Alys commented 6 years ago

In the past 21 hours, there have been 78 new instances of users having just {"seen":true} or null as a notification (that's found in database data, not reported by users).

@paglias has said there's a new variant too: "with {id, seen, data} but not type"

Alys commented 6 years ago

The null notifications are a critical error. They prevent you being able to use Habitica. The console log says Impossible to fetch user. Clean up localStorage and refresh. Loggly says:

httpCode:500
isHandledError:false
level:error
message:TypeError: Cannot read property 'toJSON' of null

I'm raising this to an important priority again until we have a fix or workaround for that. I'm going to check for the null notifications in all user accounts repeatedly today and fix them when I find them (just searching those who've logged in recently since it doesn't happen for inactive users).

Alys commented 6 years ago

Four users found with null notifications and fixed (one of them reported this in an email which is how I discovered this bug).

Alys commented 6 years ago

Another 6 users with null notifications.

Alys commented 6 years ago

I'm pretty sure the invalid {"seen":true} notifications are preventing the RYA screen from appearing in the Android app (and therefore preventing cron from running). When I set my lastCron date to the past and add one of those invalid notifications to my account, and then sync the Android app many times, RYA never appears. As soon as I remove the invalid notification and sync, RYA appears. I've repeated this twice with the same results. There's no errors in Loggly for my account from the time of that testing.

Because of this, I think we should start removing the invalid notifications from all affected user accounts a few times each day even though they're no longer causing visible errors in the website. I'll start doing that later today unless I hear otherwise from staff in the next hour or so (I need a break for breakfast anyway).

FYI @paglias if you're still around.

paglias commented 6 years ago

Writing a fix for this

paglias commented 6 years ago

Invalid notifications should be handled correctly after https://github.com/HabitRPG/habitica/commit/4efbbd7bac9e44a0792e50a7b0f92a882575df41 (and following commits to fix tests and typos)

Alys commented 6 years ago

I've fixed the notifications for about 130 users today. I've just done a final sweep now for today, and I'll do another one tomorrow morning (in about 8 hours) if the fix isn't live by then.

paglias commented 6 years ago

@Alys if we are not receiving reports anymore I'd like to lower the priority here. Will then check the numbers again at the end of next week to see if upgrading mongoose had any positive effect

Alys commented 6 years ago

Fine with me. I've changed it to medium.

Alys commented 6 years ago

@paglias Do you still want this marked as "help wanted" or do you want to keep handling it? Or is it on hold until we upgrade libraries?

paglias commented 6 years ago

I'm making it help wanted but at this point I'm pretty sure it's caused in a bug in a library, not in our code

paglias commented 6 years ago

The webhooks PR has a fix to make sure invalid notifications are removed from the user object, so I'm going to close this

fentie commented 6 years ago

I see this is closed... should this code now be removed?

/**
 * Convert notifications to JSON making sure to return only valid data.
 * Fix for https://github.com/HabitRPG/habitica/issues/9923#issuecomment-362869881
 * @TODO Remove once https://github.com/HabitRPG/habitica/issues/9923
 * is fixed
 */
schema.statics.convertNotificationsToSafeJson = function convertNotificationsToSafeJson (notifications) {
  if (!notifications) return notifications;

  return notifications.filter(n => {
    // Exclude notifications with a nullish value
    if (!n) return false;
    // Exclude notifications without an id or a type
    if (!n.id || !n.type) return false;
    return true;
  }).map(n => {
    return n.toJSON();
  });
};
paglias commented 6 years ago

The issue is closed as it doesn't result in errors anymore but it still happens to have corrupt notifications so that code should stay