change-metrics / monocle

Monocle helps teams and individuals better organize daily duties and detect anomalies in the way changes are produced and reviewed.
https://changemetrics.io
GNU Affero General Public License v3.0

Collect error in the index #1097

Closed by TristanCacqueray 8 months ago

TristanCacqueray commented 9 months ago

This PR enables collecting crawler errors so that the process does not stop when it fails to process an entity.
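To illustrate the idea of collecting errors rather than aborting, here is a minimal sketch. It is not Monocle's code (Monocle is not written in Python); `CrawlReport`, `crawl`, and `fake_process` are hypothetical names used only for this example.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterable


@dataclass
class CrawlReport:
    """Outcome of one crawl: indexed documents plus the errors met on the way."""
    indexed: list = field(default_factory=list)
    errors: list = field(default_factory=list)  # (entity, message) pairs


def crawl(entities: Iterable[str], process: Callable[[str], dict]) -> CrawlReport:
    report = CrawlReport()
    for entity in entities:
        try:
            report.indexed.append(process(entity))
        except Exception as exc:  # assumption: a real crawler would catch narrower errors
            # Record the failure and keep going instead of aborting the run.
            report.errors.append((entity, str(exc)))
    return report


def fake_process(entity: str) -> dict:
    # Hypothetical processor: fails on one entity to simulate a bad pull request.
    if entity == "pr-2":
        raise ValueError("additions count unavailable")
    return {"id": entity}
```

Calling `crawl(["pr-1", "pr-2", "pr-3"], fake_process)` indexes the first and third entities and records one error for the second, which could then be stored alongside the documents for later display.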

related: #1093

TristanCacqueray commented 9 months ago

Note that I haven't tested this change with real data.

bigmadkev commented 9 months ago

Just ran this pull request locally against the LLVM project I'm working with. Can confirm that it posted 464 entries when it got stuck again.

I wonder whether it should continue past the error, since it hasn't hit a "no next page" condition? Otherwise a clean install will get stuck on this one.

TristanCacqueray commented 9 months ago

@bigmadkev Thank you for testing this change. Could you please confirm that the crawler is actually stuck on the error? I would expect the "last update" timestamp to be set past the offending PR so that the crawler should be able to resume.

Though, as indicated in the comment above, this implementation currently skips all the events that happen after the error. I'll propose a change to decode the pageinfo from the error when it is available.
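A rough sketch of what "decode the pageinfo from the error" could look like in a pagination loop. This is an illustration under assumptions, not the PR's implementation: `PartialPageError`, `Page`, `PageInfo`, and `fetch_all` are hypothetical names, and the cursor fields only mirror the usual GraphQL `hasNextPage`/`endCursor` shape.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class PageInfo:
    has_next_page: bool
    end_cursor: Optional[str]


@dataclass
class Page:
    items: list
    page_info: PageInfo


class PartialPageError(Exception):
    """Hypothetical error that may still carry the page the server returned."""

    def __init__(self, message: str, page: Optional[Page] = None):
        super().__init__(message)
        self.page = page


def fetch_all(fetch_page: Callable[[Optional[str]], Page]):
    """Walk every page, recording errors and resuming when page info survives."""
    items, errors = [], []
    cursor = None
    while True:
        try:
            page = fetch_page(cursor)
        except PartialPageError as err:
            errors.append(str(err))
            if err.page is None:
                break  # no page info to resume from: stop here
            page = err.page  # keep the usable part and continue paginating
        items.extend(page.items)
        if not page.page_info.has_next_page:
            break
        cursor = page.page_info.end_cursor
    return items, errors
```

With this shape, an error on one page no longer hides the pages that follow it, as long as the error carries enough information to find the next cursor.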

bigmadkev commented 9 months ago

I can see that the next run picks up all the updated items and doesn't hit the errored one again.

But the issue is that anything updated before the erroring one isn't indexed at all, so rather than getting 9k it's only getting about 500, up to the erroring pull request.

TristanCacqueray commented 9 months ago

Thank you for confirming, so that's expected. From the crawler.log you shared, it looks like we are getting the FetchErrorProducedErrors error, which actually contains the desired response, and I have updated this PR to handle that case. Make sure to delete your index if you want to try again. If you rebuild the web interface, you should see a red bell on the top right with a new errors page that should display "The additions count for this commit is unavailable".

@morucci I am not sure this change is great, because other unexpected errors might cause a large amount of data to be skipped (the crawler now always sets the last updated timestamp). Perhaps we should be more conservative in Worker.processStream and only skip the PartialResult variant. In any case, I have refactored LentilleError and GraphQLError to help with further improvements.
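The more conservative option could be sketched as follows. This is only an illustration of the policy, not the actual Worker.processStream; `ErrorKind`, `CrawlError`, `Doc`, and `process_stream` are hypothetical names, and timestamps are plain integers for brevity.

```python
from dataclasses import dataclass
from enum import Enum, auto
from typing import Iterable, Union


class ErrorKind(Enum):
    PARTIAL_RESULT = auto()  # the response was usable despite the error
    OTHER = auto()           # anything unexpected


@dataclass
class CrawlError:
    kind: ErrorKind
    message: str


@dataclass
class Doc:
    updated_at: int  # assumption: timestamp as an integer


def process_stream(stream: Iterable[Union[Doc, CrawlError]], last_updated: int):
    """Return (new_last_updated, collected_errors, completed)."""
    errors = []
    newest = last_updated
    for item in stream:
        if isinstance(item, CrawlError):
            errors.append(item)
            if item.kind is ErrorKind.PARTIAL_RESULT:
                continue  # safe to skip: the rest of the stream is still valid
            # Unexpected error: stop and keep the old timestamp so that
            # the next run retries the same range instead of skipping it.
            return last_updated, errors, False
        newest = max(newest, item.updated_at)
    return newest, errors, True
```

The trade-off is that an unexpected error blocks progress until it is fixed, but no data is silently skipped.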

Anyway I'll get back to this after the holidays, have a good end of the year! Cheers :)

morucci commented 8 months ago

Well done! Thanks!

@bigmadkev I was able to try this PR and indexed llvm/llvm-project.