change-metrics / monocle

Monocle helps teams and individuals better organize daily duties and detect anomalies in the way changes are produced and reviewed.
https://changemetrics.io
GNU Affero General Public License v3.0

Github token expired situation #1060

Closed (epaillous closed this issue 1 year ago)

epaillous commented 1 year ago

Hello there! First of all, thank you for this project: I used it some months ago and was able to retrieve some data for my teams! However, I stopped using it for 2 months, so my GitHub token expired. I've regenerated a new one and tried to replace it in the .secrets file, but it seems that the crawler cannot fetch data anymore. Here is the error I get:

 WARNING Lentille.GitHub.RateLimit:62: Unexpected error {"index":"my_index","crawler":"github-my-crawler","stream":"Changes","err":"\"FetchErrorNoResult\""}

I've tried to wipe all data from the crawler (even if I do not really want to do that, because it took some time to retrieve it 😅), but the command failed:

docker-compose run --rm --no-deps api monocle janitor wipe-crawler-data --elastic elastic:9200 --config /etc/monocle/config.yaml --workspace my-workspace --crawler-name my-crawler

Invalid argument `wipe-crawler-data'

Usage: monocle janitor COMMAND
  Maintain the database

Any ideas? 😇

morucci commented 1 year ago

Hi Emilie,

The wipe-crawler-data command is part of the latest release, 1.9.0, so be sure to bump your deployment to that release. See the blog post for the procedure to follow: https://changemetrics.io/posts/2023-09-17-version-1.9.0-release.html

However, to replace an expired token, usually you just have to update it in the .secrets file and restart the crawler so it picks up the new value.
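
A minimal sketch of that workflow, assuming a standard docker-compose deployment with a service named crawler (the name used later in this thread) and the token stored in the .secrets file; adjust the names to your setup:

  # 1. Bump the deployment to the 1.9.0 images (see the release blog post for details)
  docker-compose pull
  docker-compose up -d

  # 2. Edit the expired token in .secrets, then recreate the crawler container so the
  #    new environment value is picked up (a plain restart keeps the old environment)
  docker-compose up -d --force-recreate crawler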

epaillous commented 1 year ago

Hello, thanks for your feedback! I just pulled the latest version and everything works as expected! Thank you! Has the default delay between crawler fetches changed? I just had some data fetched for the last week, and now it has stopped, like this:

2023-09-18 15:03:20 2023-09-18 13:03:19 INFO    Macroscope.Worker:175: Posting documents {"index":"my_index","crawler":"github-my-crawler","stream":"Changes","count":159}
2023-09-18 15:03:29 2023-09-18 13:03:29 INFO    Macroscope.Worker:188: Continuing on next entity {"index":"my_index","crawler":"github-my-crawler","stream":"Changes"}
2023-09-18 15:03:29 2023-09-18 13:03:29 INFO    Macroscope.Worker:157: Looking for oldest entity {"index":"my_index","crawler":"github-my-crawler","stream":"Changes","offset":0}
2023-09-18 15:03:30 2023-09-18 13:03:29 INFO    Macroscope.Worker:166: Crawling entities completed {"index":"my_index","crawler":"github-my-crawler","stream":"Changes","entity":{"contents":"myrepo","tag":"Project"},"age":"2023-09-18T12:59:10Z"}
2023-09-18 15:03:30 2023-09-18 13:03:29 INFO    Macroscope.Main:184: Group end {"group":"https://api.github.com/graphql--XXXXXX for myrepo"}

morucci commented 1 year ago

No, the delay has not changed. However, there is a new setting for configuring it if you want to change the default: https://github.com/change-metrics/monocle/pull/1033

epaillous commented 1 year ago

Hi @morucci, thanks for your quick answer! I'm not sure it works: I have added the loop_delay in my config.yaml like this:

workspaces:
  - name: my-workspace
    crawlers:
      - name: github-my-crawler
        provider:
          github_organization: my-org
          github_repositories:
            - my-repo
        update_since: '2023-04-01'
        loop_delay_sec: 120

and stopped and started the crawler with:

docker-compose stop crawler
docker-compose start crawler

but it seems that the crawler is still fetching only every 600 seconds 😢 Moreover, despite the update_since: '2023-04-01', it seems that since I regenerated my token it no longer fetches old data, see this:

[Screenshot taken 2023-09-19 at 11:39]

I should have data for July and August 😢 Do you have any clue about what's happening?

morucci commented 1 year ago

Hi,

Yes, there are multiple things going on here:

The monocle CLI does not provide a command to reset the crawling point to the one defined in the configuration. I'll see if I can provide such a command.

Until I have a fix, the only way is to wipe the data related to that crawler, using the procedure documented at https://github.com/change-metrics/monocle#wipe-crawler-data-from-the-database
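
For reference, once the deployment is on 1.9.0 or later, the wipe invocation from earlier in this thread should work; note that --crawler-name has to match the crawler name declared in config.yaml (github-my-crawler in the snippet above):

  docker-compose run --rm --no-deps api monocle janitor wipe-crawler-data \
    --elastic elastic:9200 \
    --config /etc/monocle/config.yaml \
    --workspace my-workspace \
    --crawler-name github-my-crawler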

morucci commented 1 year ago

Also, loop_delay_sec must be defined in a dedicated crawlers key at the root of the YAML (same level as workspaces): https://github.com/change-metrics/monocle#crawlers-1
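
A sketch of that layout, based on the description above (the exact schema is in the linked README section; 120 is just the value tried earlier in this thread):

  # Assumed layout: loop_delay_sec lives under a root-level crawlers key,
  # at the same level as workspaces
  crawlers:
    loop_delay_sec: 120
  workspaces:
    - name: my-workspace
      crawlers:
        - name: github-my-crawler
          provider:
            github_organization: my-org
            github_repositories:
              - my-repo
          update_since: '2023-04-01'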

morucci commented 1 year ago

Hi @epaillous,

By using the latest image you'll get access to a new command to reset the crawler commit date: https://github.com/change-metrics/monocle#reset-the-crawler-commit-date
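
A hypothetical invocation, modeled on the wipe-crawler-data call earlier in this thread; the subcommand name and flags here are assumptions, so take the authoritative syntax from the linked README section:

  # Hypothetical sketch: subcommand name and flags are assumptions derived from
  # the README anchor above, not a verified CLI reference
  docker-compose run --rm --no-deps api monocle janitor reset-crawler-commit-date \
    --elastic elastic:9200 \
    --config /etc/monocle/config.yaml \
    --workspace my-workspace \
    --crawler-name github-my-crawler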