fedora-infra / anitya

A cross-distribution upstream release monitoring project
https://release-monitoring.org
GNU General Public License v2.0

Checking frequency? #919

Open wavexx opened 4 years ago

wavexx commented 4 years ago

I apologize if this is not the right place to ask questions about the bot at release-monitoring.org. How often are the URLs scanned for new releases? Are you aiming for an hourly frequency?

I was debugging another issue and discovered by chance that I get roughly 35 checks per day on a single project monitored by Anitya. Traffic is not really a problem, but this particular project has had a single release in the last 8 years ;). I expected a monitoring server to do some sort of automatic backoff.

Zlopez commented 4 years ago

The checks are currently done once an hour. There isn't any algorithm to check projects that rarely have releases less often, but it could be a nice addition.
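To illustrate the kind of automatic backoff asked about above, here is a minimal sketch. It is not part of Anitya; the function name `next_check_interval`, the 30-day doubling step, and the 7-day cap are all hypothetical choices made for this example.

```python
from datetime import datetime, timedelta

BASE_INTERVAL = timedelta(hours=1)  # the hourly check described in the thread
MAX_INTERVAL = timedelta(days=7)    # hypothetical upper bound for the backoff
IDLE_STEP = timedelta(days=30)      # hypothetical: double the interval per 30 idle days


def next_check_interval(last_release: datetime, now: datetime) -> timedelta:
    """Return how long to wait before the next check of a project.

    The interval doubles for every IDLE_STEP that passed without a release
    and is capped at MAX_INTERVAL.
    """
    idle = now - last_release
    doublings = max(0, idle // IDLE_STEP)  # timedelta // timedelta yields an int
    doublings = min(doublings, 10)         # keep the multiplier small
    return min(BASE_INTERVAL * (2 ** doublings), MAX_INTERVAL)
```

With these assumed constants, a project released last week would still be checked hourly, while one with no release in years would be checked once a week.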

der-eismann commented 1 year ago

Are the checks still being done hourly, @Zlopez? On release-monitoring.org I can see:

Last check ended at (UTC) 2023-01-05 04:17:33 Total (194674): OK (184004) Err (1458) Rate (9114)

But some projects are checked at a later date (2023-01-05 07:46:14 (UTC)) and some are not checked automatically at all. Are there any rules defining how often a project is being checked?

Zlopez commented 1 year ago

There are plenty of rules and the check is now done continuously: there are 5 minutes between checks and every project has its own one-hour timer. The queue for a check is built like this:
1) Add projects from rate-limited backends (those that weren't checked in previous runs) once the rate limit has reset (currently only GitHub projects)
2) Add projects that weren't checked in the last hour and are not archived

This is run in multiple threads to check as many projects as possible.

The time shown on the project can differ from the check time if somebody ran the check manually (new project, check done by an administrator).

Regarding the thanos project, the GitHub backend can take days to check, because plenty of projects use this backend and it is rate-limited. I recommend using any other backend if possible.
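The two queue-building rules above could be sketched roughly as follows. This is only an illustration of the description in this comment, not Anitya's actual code: the `Project` fields, `build_queue`, and `reset_backends` are hypothetical names, and skipping still rate-limited projects in step 2 is an assumption.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional, Set

CHECK_INTERVAL = timedelta(hours=1)  # per-project timer mentioned in the thread


@dataclass
class Project:
    name: str
    backend: str
    last_check: Optional[datetime]
    archived: bool = False
    ratelimited: bool = False  # deferred in an earlier run by a rate limit


def build_queue(projects: List[Project], now: datetime,
                reset_backends: Set[str]) -> List[Project]:
    """Build the check queue: deferred rate-limited projects first, then due ones."""
    queue: List[Project] = []
    # 1) Deferred projects whose backend rate limit has reset.
    for p in projects:
        if p.ratelimited and p.backend in reset_backends:
            queue.append(p)
    # 2) Non-archived projects not checked within the last hour.
    for p in projects:
        if p.ratelimited:
            continue  # assumption: already queued above or still blocked
        due = p.last_check is None or now - p.last_check >= CHECK_INTERVAL
        if due and not p.archived:
            queue.append(p)
    return queue
```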

der-eismann commented 1 year ago

Thanks for the answer! I understand that GitHub is rate-limited, but at 5k requests/hour (or 5k "points" for GraphQL) minus the 10% you allocate for users it should still be possible to update them in a reasonable time, shouldn't it? Is the GraphQL query maybe "too expensive"?

Zlopez commented 1 year ago

I try to keep it as minimal as possible, but we have around 260k projects. I'm not sure how many of them are GitHub projects, but let's say about 30% (probably more). Even if one request costs only a few points, you can't check more than 2000-3000 in one go and then need to wait one hour for the rate limit to reset. This means you can check around 72,000 projects per day (24*3000), so it takes more than a day to check all of them.
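The estimate above can be worked through in a few lines. The 30% share is the guess from this comment, not a measured value, and `days_for_full_pass` is a helper name made up for this sketch.

```python
TOTAL_PROJECTS = 260_000
GITHUB_PERCENT = 30  # rough guess from the comment above ("probably more")

github_projects = TOTAL_PROJECTS * GITHUB_PERCENT // 100


def days_for_full_pass(projects: int, checks_per_hour: int) -> float:
    """Days needed to check every project once at a fixed hourly budget."""
    checks_per_day = 24 * checks_per_hour
    return projects / checks_per_day


# Upper and lower end of the "2000-3000 in one go" range.
best_case = days_for_full_pass(github_projects, 3000)
worst_case = days_for_full_pass(github_projects, 2000)
print(f"{github_projects} GitHub projects: {best_case:.2f} to {worst_case:.2f} days")
```

Under these assumptions a full pass over the GitHub projects takes between roughly one and one and a half days, consistent with "more than a day".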

wavexx commented 1 year ago

FYI, rechecking the frequency now on my server, I do get about one per hour:

38.145.60.3 thregr.org:80 - [08/Jan/2023:00:38:16 +0100]
38.145.60.4 thregr.org:80 - [08/Jan/2023:00:59:32 +0100]
38.145.60.4 thregr.org:80 - [08/Jan/2023:03:11:15 +0100]
38.145.60.4 thregr.org:80 - [08/Jan/2023:05:17:43 +0100]
38.145.60.4 thregr.org:80 - [08/Jan/2023:07:15:10 +0100]
38.145.60.3 thregr.org:80 - [08/Jan/2023:07:42:45 +0100]
38.145.60.4 thregr.org:80 - [08/Jan/2023:09:14:19 +0100]
38.145.60.4 thregr.org:80 - [08/Jan/2023:11:12:12 +0100]
38.145.60.4 thregr.org:80 - [08/Jan/2023:13:10:35 +0100]
38.145.60.3 thregr.org:80 - [08/Jan/2023:14:56:10 +0100]
38.145.60.4 thregr.org:80 - [08/Jan/2023:15:08:58 +0100]
38.145.60.4 thregr.org:80 - [08/Jan/2023:17:06:40 +0100]
38.145.60.4 thregr.org:80 - [08/Jan/2023:19:03:48 +0100]
38.145.60.4 thregr.org:80 - [08/Jan/2023:21:01:03 +0100]
38.145.60.3 thregr.org:80 - [08/Jan/2023:22:11:43 +0100]
38.145.60.4 thregr.org:80 - [08/Jan/2023:22:58:58 +0100]
38.145.60.4 thregr.org:80 - [09/Jan/2023:00:55:52 +0100]
38.145.60.4 thregr.org:80 - [09/Jan/2023:02:52:51 +0100]
38.145.60.4 thregr.org:80 - [09/Jan/2023:04:50:58 +0100]
38.145.60.3 thregr.org:80 - [09/Jan/2023:05:35:14 +0100]
38.145.60.4 thregr.org:80 - [09/Jan/2023:06:48:33 +0100]
38.145.60.4 thregr.org:80 - [09/Jan/2023:08:47:01 +0100]
38.145.60.4 thregr.org:80 - [09/Jan/2023:10:44:20 +0100]
38.145.60.4 thregr.org:80 - [09/Jan/2023:12:42:23 +0100]
38.145.60.3 thregr.org:80 - [09/Jan/2023:13:06:20 +0100]
38.145.60.4 thregr.org:80 - [09/Jan/2023:14:40:34 +0100]

Note that, across workers, the gap between requests is sometimes less than 60 minutes.

Another interesting tidbit is that all the requests have been returning a 301 to the new location for more than 2 years, which the bot follows on each attempt. That's probably why I was counting more than 1 request per hour.

Maybe updating the address (if the 301 redirects to a valid location) would also be a good idea.
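For anyone who wants to measure the gaps per worker rather than eyeball them, a small sketch like the following would do. It uses only the first six lines of the log above as sample input, and the helper name `gaps_by_client` is made up for this example.

```python
import re
from collections import defaultdict
from datetime import datetime

# First six entries of the access log above, used as sample input.
LOG = """\
38.145.60.3 thregr.org:80 - [08/Jan/2023:00:38:16 +0100]
38.145.60.4 thregr.org:80 - [08/Jan/2023:00:59:32 +0100]
38.145.60.4 thregr.org:80 - [08/Jan/2023:03:11:15 +0100]
38.145.60.4 thregr.org:80 - [08/Jan/2023:05:17:43 +0100]
38.145.60.4 thregr.org:80 - [08/Jan/2023:07:15:10 +0100]
38.145.60.3 thregr.org:80 - [08/Jan/2023:07:42:45 +0100]
"""

LINE_RE = re.compile(r"^(\S+) \S+ - \[([^\]]+)\]$")


def gaps_by_client(log_text):
    """Map each client IP to the list of time gaps between its requests."""
    seen = defaultdict(list)
    for line in log_text.splitlines():
        match = LINE_RE.match(line.strip())
        if match:
            ip, stamp = match.groups()
            seen[ip].append(datetime.strptime(stamp, "%d/%b/%Y:%H:%M:%S %z"))
    gaps = {}
    for ip, times in seen.items():
        times.sort()
        gaps[ip] = [later - earlier for earlier, later in zip(times, times[1:])]
    return gaps


for ip, deltas in gaps_by_client(LOG).items():
    print(ip, [str(d) for d in deltas])
```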
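As a rough illustration of that idea, the sketch below picks the URL a checker could store after following redirects, accepting only permanent ones (301 and 308). It is not something Anitya is known to do; `permanent_target` and the `(status, location)` hop format are hypothetical, and the example.org URLs are placeholders.

```python
from typing import List, Optional, Tuple
from urllib.parse import urljoin

PERMANENT_REDIRECTS = {301, 308}  # status codes that signal a permanent move


def permanent_target(start_url: str,
                     hops: List[Tuple[int, Optional[str]]]) -> str:
    """Return the URL reached by following only leading permanent redirects.

    `hops` is the sequence of (status, Location header) pairs observed while
    fetching `start_url`. Following stops at the first response that is not a
    permanent redirect, so temporary redirects never change the stored URL.
    """
    url = start_url
    for status, location in hops:
        if status not in PERMANENT_REDIRECTS or not location:
            break
        url = urljoin(url, location)  # Location may be relative
    return url


# Hypothetical example: an http URL permanently redirected to https.
print(permanent_target("http://example.org/project/",
                       [(301, "https://example.org/project/"), (200, None)]))
```

A checker using this would still need to confirm that the final response is valid before replacing the stored address, as suggested above.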

Zlopez commented 1 year ago

@wavexx The project on release-monitoring.org can be updated by anyone. Feel free to change the URL.

Regarding the check frequency, it depends on the number of projects to check in one run.

wavexx commented 1 year ago

It's not so much about the traffic/hits. If I were a user, how would I notice that upstream changed the URL or that it was squatted by someone?

Zlopez commented 1 year ago

I don't think there is any way, unless you watch the project closely.

wavexx commented 1 year ago

Isn't a 301 intended for exactly this purpose? Right now the release check is working, but as soon as I remove the redirect the check will fail, and I assume release-monitoring will not show that the URL was actually updated 5+ years ago to a new location.

Also, I wasn't really able to log in. I don't have a Fedora account, and I'm not sure whether I can use GitHub or GitLab as an OpenID provider.

Zlopez commented 1 year ago

Maybe it would be worth opening a ticket for Anitya to handle 301 redirects. Could you point me to the project? I will change it.

wavexx commented 1 year ago

https://release-monitoring.org/project/9897/ can be changed to https, or to fetch the release info from https://gitlab.com/wavexx/wmnd/ via GitLab instead.

Zlopez commented 1 year ago

@wavexx Updated to https :-)