change-metrics / monocle

Monocle helps teams and individuals better organize daily duties and detect anomalies in the way changes are produced and reviewed.
https://demo.changemetrics.io/
GNU Affero General Public License v3.0

Crawler stopped importing data #1112

Open leonid-deriv opened 4 months ago

leonid-deriv commented 4 months ago

I have noticed that a crawler stopped importing data. I see the following errors in the log:

2024-01-31 14:30:00 INFO    Macroscope.Worker:183: Looking for oldest entity {"index":"demo","crawler":"xxxx-monocle-demo","stream":"TaskDatas","offset":0}
2024-01-31 14:30:00 INFO    Macroscope.Worker:199: Processing {"index":"demo","crawler":"xxxx-monocle-demo","stream":"Changes","entity":{"contents":"xxxx/yyyy","tag":"Project"},"age":"2023-11-29T01:41:16Z"}
2024-01-31 14:30:00 WARNING Lentille.GitHub.RateLimit:66: Repository not found. Will not retry. {"index":"demo","crawler":"xxxx-monocle-demo","stream":"Changes"}
2024-01-31 14:30:00 INFO    Lentille.GraphQL:232: Fetched from current page {"index":"demo","crawler":"xxxx-monocle-demo","stream":"Changes","count":0,"total":0,"pageInfo":{"endCursor":null,"hasNextPage":false,"totalCount":null},"ratelimit":null}
2024-01-31 14:30:00 WARNING Lentille.GraphQL:276: Fetched partial result {"index":"demo","crawler":"xxxx-monocle-demo","stream":"Changes","err":[{"locations":[{"column":7,"line":8}],"message":"Could not resolve to a Repository with the name 'xxxx/yyyy'.","path":["repository"],"type":"NOT_FOUND"}]}
2024-01-31 14:30:00 INFO    Macroscope.Worker:204: Posting documents {"index":"demo","crawler":"xxxx-monocle-demo","stream":"Changes","count":2}
2024-01-31 14:30:00 INFO    Macroscope.Worker:189: Unable to find entity to update {"index":"demo","crawler":"xxxx-monocle-demo","stream":"TaskDatas"}
2024-01-31 14:30:00 INFO    Macroscope.Worker:183: Looking for oldest entity {"index":"demo","crawler":"xxxx-monocle-demo","stream":"Changes","offset":0}
2024-01-31 14:30:00 WARNING Macroscope.Worker:167: Stream produced a fatal error {"index":"demo","crawler":"xxxx-monocle-demo","stream":"Changes","err":["2024-01-31T14:30:00.901152636Z",{"contents":["Unknown GetProjectPullRequests response: GetProjectPullRequests {rateLimit = Just (GetProjectPullRequestsRateLimit {used = 183, remaining = 4817, resetAt = DateTime \"2024-01-31T14:53:42Z\"}), repository = Nothing}"],"tag":"DecodeError"}]}

Actually, the repository that cannot be found does not exist. I thought it could be cached, so I restarted the services, but it looks like I still have the problem. Any suggestions?
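For anyone reading the trace above: the GraphQL reply for a deleted repository comes back as `repository: null` plus a `NOT_FOUND` error, and the response decoder treats the missing field as a fatal `DecodeError` instead of a recoverable "repo is gone" condition. A minimal sketch of the distinction (illustrative Python, not Monocle's actual Haskell decoder):

```python
import json

def classify_response(raw: str) -> str:
    """Classify a GitHub GraphQL reply the way a tolerant decoder could.

    NOT_FOUND on the repository field is recoverable (the repo was deleted
    or renamed); any other null repository is a genuine decode error.
    """
    resp = json.loads(raw)
    errors = resp.get("errors", [])
    if any(e.get("type") == "NOT_FOUND" and e.get("path") == ["repository"]
           for e in errors):
        return "skip-repository"   # recoverable: drop this repo from the stream
    if resp.get("data", {}).get("repository") is None:
        return "decode-error"      # genuinely unexpected response shape
    return "ok"

# The reply shape from the log above: repository resolved to null + NOT_FOUND.
not_found = json.dumps({
    "data": {"repository": None},
    "errors": [{"type": "NOT_FOUND", "path": ["repository"],
                "message": "Could not resolve to a Repository "
                           "with the name 'xxxx/yyyy'."}],
})
print(classify_response(not_found))  # → skip-repository
```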

leonid-deriv commented 4 months ago

Maybe this is a coincidence, but the data stopped being imported on the date when I ran the group update which, as you know, failed.

leonid-deriv commented 4 months ago

One more comment about xxxx/yyyy: I am not sure why the crawler is trying to fetch this repo. I executed both REST and GraphQL requests, and this repo is not returned by GitHub. It is not shown in the Monocle web interface, and we have never had this repo.

leonid-deriv commented 4 months ago

Had to reindex :(

leonid-deriv commented 3 months ago

Looks like we have a similar case again. After some "event" it stopped importing data :(. The symptoms are similar to what I described before. From what I remember, the repository the crawler complains about did not exist; this time I also cannot find the repo. Last time the only solution was to completely rebuild the index, but I am afraid that is not a good option. Any idea how we can troubleshoot this? Here is another error message I see regularly in the log:

2024-03-07 19:16:09 WARNING Macroscope.Worker:167: Stream produced a fatal error {"index":"xxxx","crawler":"xxx-monocle-xxxx","stream":"Changes","err":["2024-03-07T19:16:09.507202765Z",{"contents":["Unknown GetProjectPullRequests response: GetProjectPullRequests {rateLimit = Just (GetProjectPullRequestsRateLimit {used = 130, remaining = 4870, resetAt = DateTime \"2024-03-07T19:32:37Z\"}), repository = Nothing}"],"tag":"DecodeError"}]}
leonid-deriv commented 3 months ago

To me, given that it refers to a non-existing repo, this points to some internal cache, maybe a corrupted one. So maybe it is possible to clean it, and then I can reset the date to re-scan the data? I really do not want to re-index again; plus, it has now happened for the second time, so it will probably happen again :(

leonid-deriv commented 3 months ago

Another question, about the last date: Monocle crawlers keep track of the last date (commit date) at which a successful document fetch happened. Where does the crawler store this data?

morucci commented 3 months ago

Yes, there is a cache. The CLI does not provide a way to clear such entries for no-longer-existing repositories. Perhaps you could try to remove the related state object in the Elasticsearch DB: https://github.com/change-metrics/monocle/blob/master/src/Monocle/Backend/Index.hs#L204
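For reference, a delete-by-query along these lines could target that state object. The field names and the index name below are assumptions (check the mapping in the linked Index.hs and your workspace's actual index name before running anything), so treat this as a sketch, not a verified recipe:

```python
import json

# Hypothetical field names -- verify them against the mapping in the
# linked Index.hs before running this against a real workspace index.
def stale_state_query(crawler: str, repo: str) -> dict:
    """Build an Elasticsearch delete-by-query body targeting the crawler
    state object of one no-longer-existing repository."""
    return {
        "query": {
            "bool": {
                "must": [
                    {"term": {"crawler_metadata.crawler_name": crawler}},
                    {"term": {"crawler_metadata.crawler_type_value": repo}},
                ]
            }
        }
    }

body = stale_state_query("xxxx-monocle-demo", "xxxx/yyyy")
# Then POST it against the workspace index (name is an assumption too), e.g.:
#   curl -XPOST 'localhost:9200/<workspace-index>/_delete_by_query' \
#        -H 'Content-Type: application/json' -d '<the JSON below>'
print(json.dumps(body))
```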

leonid-deriv commented 3 months ago

I am trying to find the index in Elasticsearch where you store this metadata, and I cannot find it. It is not visible in Kibana, or I am doing something wrong.

morucci commented 3 months ago

I think that is because it is an object without the usual date field, so you need to select the right option when creating the Kibana index pattern.

leonid-deriv commented 3 months ago

But what is the index name? Is it not the index where all the workspace data is stored?

morucci commented 3 months ago

same index

leonid-deriv commented 3 months ago

Fabien, sorry for the trouble, but I cannot find the required information. What is the document type used to cache the crawler information?

leonid-deriv commented 3 months ago

I was not very attentive when looking at the index. Removing the repo entry from Elasticsearch looks like it solved the problem. Should I file a bug for it?

Leonid

morucci commented 3 months ago

Hi, thank you for confirming this. Let's keep this issue open so we can investigate why the crawler stops when a no-longer-existing repository is still in the "cache". Such objects can stay in the cache, but they should not prevent the crawler from processing the rest of the repositories.
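The intended behavior described above (one stale entity must not kill the whole stream) can be sketched like this. Monocle's real worker is Haskell (Macroscope.Worker); this Python fragment only illustrates the "skip the bad entity, keep streaming" shape versus the current fatal abort:

```python
def process_stream(entities, fetch):
    """Process every entity; collect per-entity failures instead of
    letting one missing repository abort the whole stream."""
    done, skipped = [], []
    for entity in entities:
        try:
            done.append((entity, fetch(entity)))
        except LookupError as err:           # e.g. repository NOT_FOUND
            skipped.append((entity, str(err)))  # recoverable: move on
    return done, skipped

def fetch(entity):
    """Stand-in for the GraphQL fetch; 'xxxx/yyyy' plays the stale entry."""
    if entity == "xxxx/yyyy":
        raise LookupError("NOT_FOUND")
    return {"changes": 2}

done, skipped = process_stream(["org/real-repo", "xxxx/yyyy"], fetch)
print(len(done), len(skipped))  # → 1 1
```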

TristanCacqueray commented 3 months ago

Looking at it, it looks like:

I think we could:

Note that the comment above is not correct; it should say "This is likely an error we *can* recover from".

leonid-deriv commented 3 months ago

Thank you. The most important thing is to make this error "non-fatal" so the crawler keeps running. And all three of your points make sense.

leonid-deriv commented 2 months ago

Any chance of fixing this error? The problem is that we are dropping "old" repos, and I have to manually remove them from the cache every single time :(