Closed epaillous closed 1 year ago
Hi Emilie,
The wipe-crawler-data command is part of the latest release, 1.9.0, so be sure to bump your deployment to that release. See the blog post here for the procedure to follow: https://changemetrics.io/posts/2023-09-17-version-1.9.0-release.html
However, to replace an expired token, usually you just have to:
docker-compose stop crawler
update the token in the .secrets file
docker-compose start crawler
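The token-rotation steps above can be sketched as a small script. This is only a sketch under assumptions: the GITHUB_TOKEN variable name and the placeholder token values are not taken from this thread (check your own .secrets file for the actual variable name), and the docker-compose commands are left as comments because they require a running Monocle deployment.

```shell
#!/bin/sh
# Sketch of rotating an expired GitHub token for the Monocle crawler.
# Assumption: the token lives in .secrets as GITHUB_TOKEN=<value>.
set -e

# 1. Stop the crawler so it does not keep retrying with the expired token:
#      docker-compose stop crawler

# Demo: create a scratch .secrets holding an expired token.
printf 'GITHUB_TOKEN=ghp_expired_token\n' > .secrets

# 2. Replace the token line in place with the freshly generated one:
NEW_TOKEN='ghp_new_token'
sed -i "s/^GITHUB_TOKEN=.*/GITHUB_TOKEN=${NEW_TOKEN}/" .secrets

# 3. Start the crawler again so it reads the updated secret:
#      docker-compose start crawler

cat .secrets
```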
Hello, thanks for your feedback! I just pulled the latest version and everything works as expected, thank you! Has the default delay between crawler fetches changed? I had data fetched for the last week, and now it has stopped like this:
2023-09-18 15:03:20 2023-09-18 13:03:19 INFO Macroscope.Worker:175: Posting documents {"index":"my_index","crawler":"github-my-crawler","stream":"Changes","count":159}
2023-09-18 15:03:29 2023-09-18 13:03:29 INFO Macroscope.Worker:188: Continuing on next entity {"index":"my_index","crawler":"github-my-crawler","stream":"Changes"}
2023-09-18 15:03:29 2023-09-18 13:03:29 INFO Macroscope.Worker:157: Looking for oldest entity {"index":"my_index","crawler":"github-my-crawler","stream":"Changes","offset":0}
2023-09-18 15:03:30 2023-09-18 13:03:29 INFO Macroscope.Worker:166: Crawling entities completed {"index":"my_index","crawler":"github-my-crawler","stream":"Changes","entity":{"contents":"myrepo","tag":"Project"},"age":"2023-09-18T12:59:10Z"}
2023-09-18 15:03:30 2023-09-18 13:03:29 INFO Macroscope.Main:184: Group end {"group":"https://api.github.com/graphql--XXXXXX for myrepo"}
No, the delay has not changed. However, there is a new setting for configuring it if you want to change the default: https://github.com/change-metrics/monocle/pull/1033
Hi @morucci, thanks for your quick answer! I'm not sure it works; I have added loop_delay_sec in my config.yaml like this:
workspaces:
  - name: my-workspace
    crawlers:
      - name: github-my-crawler
        provider:
          github_organization: my-org
          github_repositories:
            - my-repo
        update_since: '2023-04-01'
        loop_delay_sec: 120
then stopped and started the crawler by doing:
docker-compose stop crawler
docker-compose start crawler
but it seems that the crawler is still fetching only every 600 seconds.
Moreover, despite the update_since: '2023-04-01'
setting, it seems that since I regenerated my token it does not fetch old data; see this:
I should have data for July and August. Do you have any clue about what's happening?
Hi,
Yes, there are multiple things here:
The Monocle CLI does not provide a command to reset the crawling point to the one defined in the configuration. I'll see if I can provide such a command.
Until I have a fix, the only way is to wipe the data related to that crawler using https://github.com/change-metrics/monocle#wipe-crawler-data-from-the-database
Also, loop_delay_sec must be defined in a specific crawlers key at the root of the YAML (at the same level as workspaces):
https://github.com/change-metrics/monocle#crawlers-1
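For illustration, here is a possible layout following the comment above. The exact schema is an assumption on my part, so verify the key names against the linked README section before using it:

```yaml
# Hypothetical sketch: top-level crawlers section, at the same level
# as workspaces. Verify key names against the linked README section.
crawlers:
  loop_delay_sec: 120  # poll interval in seconds

workspaces:
  - name: my-workspace
    crawlers:
      - name: github-my-crawler
        provider:
          github_organization: my-org
```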
Hi @epaillous,
By using the latest
image you'll get access to a new command to reset the crawling commit date.
https://github.com/change-metrics/monocle#reset-the-crawler-commit-date
Hello there! First of all, thank you for this project: I used it some months ago and was able to retrieve some data for my teams! However, I stopped using it for 2 months, so my GitHub token expired. I've regenerated a new one and tried to replace it in the .secrets file, but it seems that the crawler can no longer fetch data; here is the error I get:
I've tried to wipe all the data from the crawler (even though I don't really want to do that, because it took some time to retrieve), but the command failed:
Any ideas?