Open dsotirho-ucsc opened 1 year ago
Spike to find a log message indicative of a successful migration. The most recent failed migration occurred before we started exporting GitLab logs to CloudWatch, so we won't find an example of one there. What we need here is supporting evidence that someone can use to craft a filter. We either need to be able to predict what the message for a failed migration would look like, or use a combination of filters. For example, if we knew the message for a successful migration was "Migration status: ok", then the filter would look for messages where the string "Migration status:" is not followed by the string "ok".
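To make the "not followed by" idea concrete, here is a minimal Python sketch using a negative lookahead. The message "Migration status: ok" is only the hypothetical example from this issue, not something GitLab is known to log, and `looks_failed` is an illustrative name.

```python
import re

# Hypothetical marker: "Migration status: ok" is the illustrative message
# from this issue, not a string GitLab is known to emit.
# The negative lookahead rejects matches where "ok" follows the marker.
FAILED_STATUS = re.compile(r"Migration status:(?!\s*ok\b)")


def looks_failed(message: str) -> bool:
    """Return True if the status marker is present but not followed by 'ok'."""
    return FAILED_STATUS.search(message) is not None
```

Messages without the marker at all are not flagged, so a filter like this only helps if every migration is guaranteed to log a status line.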
This might help. It is from a recent update to GitLab prod.
filter @logStream = '/mnt/gitlab/logs/gitlab-rails/migrations.log'
| sort time desc
| limit 100
| fields time, message, class, current_iteration, severity
[
{
"time": "2023-06-21T21:28:40.710Z",
"message": "Migration finished",
"class": "AddTextLimitOnOrganizationName",
"current_iteration": "1",
"severity": "INFO"
},
{
"time": "2023-06-21T21:28:40.694Z",
"message": "Lock timeout is set",
"class": "AddTextLimitOnOrganizationName",
"current_iteration": "1",
"severity": "INFO"
},
{
"time": "2023-06-21T21:28:40.127Z",
"message": "Migration finished",
"class": "RemoveCiTriggersRefColumn",
"current_iteration": "1",
"severity": "INFO"
},
{
"time": "2023-06-21T21:28:40.115Z",
"message": "Lock timeout is set",
"class": "RemoveCiTriggersRefColumn",
"current_iteration": "1",
"severity": "INFO"
},
{
"time": "2023-06-21T21:28:39.984Z",
"message": "Migration finished",
"class": "DropUnusedSequenceByRecreatingVsaTable",
"current_iteration": "1",
"severity": "INFO"
},
{
"time": "2023-06-21T21:28:39.963Z",
"message": "Lock timeout is set",
"class": "DropUnusedSequenceByRecreatingVsaTable",
"current_iteration": "1",
"severity": "INFO"
}
]
… there were 70 records matching this Insights query.
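One possible "combination of filters" over records shaped like the ones above is to group messages by migration class and flag any class that never logged "Migration finished". This is only a sketch: that a failed migration would omit "Migration finished" is an assumption, since there is no failure sample to confirm it, and `unfinished_migrations` is a hypothetical helper name.

```python
from collections import defaultdict


def unfinished_migrations(records):
    """Return migration classes that have no 'Migration finished' record.

    ASSUMPTION: a failed migration logs other messages (for example
    'Lock timeout is set') but never 'Migration finished'. This has not
    been verified against a real failure.
    """
    messages_by_class = defaultdict(set)
    for record in records:
        messages_by_class[record["class"]].add(record["message"])
    return sorted(
        cls
        for cls, messages in messages_by_class.items()
        if "Migration finished" not in messages
    )
```

Run against the six records shown above, this returns an empty list, because every class has a matching "Migration finished" entry.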
@hannes-ucsc: "We might be able to find the exact message that indicates failure by looking at what other people have posted on the Interwebs about the failed migrations on GitLab. I'll do that."
Could not find any examples of log entries for failed migrations in migrations.log. We'll have to wait for a migration to actually fail to see what we could search for. For now, we'll stick with the current manual process of looking for failed migrations.
@hannes-ucsc: "I'll be continuing to look for failed migrations as part of the GitLab update PRs. When I find a failure, I'll put this issue back in Triage for further investigation."
Based on Abraham's idea of using the CloudWatch logs to identify GitLab migration errors.