Open AlexPerkbox opened 1 week ago
Hey @AlexPerkbox! I'm here to assist you with any bugs, questions, or contributions. Let's work together to resolve your issue while we wait for a human maintainer to assist.
The issue you're experiencing with cronjob schedules not being respected in DevLake could be due to improper configuration of the synchronization policy. Specifically, the sync frequency or cron code might not be set correctly. Ensure that the sync frequency or cron code is correctly configured in the synchronization policy to enable recurring runs of the blueprints. Additionally, verify that the "Skip failed tasks" option is appropriately set to prevent interruptions in the data collection process [1][2][3][4].
There are also known performance issues with the devlake-1.0.0-beta11
version that could affect the execution of cronjobs in high-load scenarios. For instance, MySQL InnoDB might fail with the message "The total number of locks exceeds the lock table size" when dealing with large repositories or boards. To mitigate this, you can increase the innodb_buffer_pool_size
in your MySQL configuration [5][6][7].
Additionally, there have been similar issues reported and resolved in the past:
For further assistance, you can refer to the Apache DevLake Config UI tutorial, which provides detailed steps on setting up data connections, creating projects, and configuring synchronization policies. If issues persist, consider checking the troubleshooting section or creating a bug report on the GitHub repository with relevant details such as error messages and logs.
To continue talking to Dosu, mention @dosu.
@AlexPerkbox Thank you for your feedback.
@AlexPerkbox You can set environment PIPELINE_MAX_PARALLEL
a bigger number and see what will happen.
@d4x1 Thank you for your response.
As part of the initial setup, there were a few hundred blueprints created in the form of one blueprint = one gitlab project
. It was observed that a complete run to fetch all data from gitlab takes approximately 2 days so the best time to kick-off (schedule) all jobs is Friday afternoon. Bulk-setting cronConfig in the form of
curl --location --request PATCH "https://devlake.my.endpoint.com/api/blueprints/[1-500]" \
--header 'Content-Type: application/json' \
--data '{
"cronConfig": "10 15 * * 5"
}' | jq
updates all blueprints fine, but when the time comes there is nothing scheduled/created/pending in the pipelines tab.
P.s Increasing PIPELINE_MAX_PARALLEL
to 5 does not make a difference.
I suspect that the ECS is the cause of the problem. According to the document on https://aws.amazon.com/ecs/ , containers are fully managed by AWS, maybe it would put containers to sleep when it is not active(processing requests, or using any CPU time). I could not find any concrete proof at this point, but I might do it to save resources(money) if I were AWS.
I have been seeing this same kind of problem. I have quite a few pipelines, all configured to run daily at 01:00, I find only 5 of them seem to run. there are no obvious errors or reasons why the others do not execute. the 5 that do run don't seem to be consistent. I'm running inside kubernetes
We are also using k8s (EKS) as well, and all pipelines get executed as expected, we also have some k3s clusters for testing and other purposes, and they are all working fine 😂. The chance of this being fixed is slim unless someone who is having the problem could dig deep into the root.
@klesh if you could give me a couple of pointers at the code which is responsible for triggering these scheduled jobs etc. then I'd be happy to dig in, maybe add some daignostic logging to my local fork.
@wouldd Sure thing, we are using github.com/robfig/cron for scheduling jobs and here is the code calling the lib when creating/updating any blueprint, while the ReloadBlueprints function is called on the startup to load all blueprints from the db.
here is where the cronManager
gets initialized after migrations were done, nothing fancy.
Let me know if I missed anything.
ah, well now that I know what logs I should be looking for. what I see is this: time="2024-06-25 00:00:04" level=error msg=" [blueprint] created cron job failed on blueprint:[1][REDACTED-Blueprint]\n\tcaused by: lock tables timeout: [0xc002ec4fd8 0xc002ec4ff0 0xc002ec5008] (504)" time="2024-06-25 00:00:04" level=error msg=" [blueprint] run cron job failed on blueprint:[1][REDACTED-Blueprint]\n\tcaused by: lock tables timeout: [0xc002ec4fd8 0xc002ec4ff0 0xc002ec5008] (504)" time="2024-06-25 00:00:04" level=error msg=" [blueprint] created cron job failed on blueprint:[21][REDACTED-Blueprint]\n\tcaused by: lock tables timeout: [0xc002ec5068 0xc002ec5080 0xc002ec5098] (504)" time="2024-06-25 00:00:04" level=error msg=" [blueprint] run cron job failed on blueprint:[21][REDACTED-Blueprint]\n\tcaused by: lock tables timeout: [0xc002ec5068 0xc002ec5080 0xc002ec5098] (504)" time="2024-06-25 00:00:04" level=error msg=" [blueprint] created cron job failed on blueprint:[3][REDACTED-Blueprint]\n\tcaused by: lock tables timeout: [0xc002ef53f8 0xc002ef5410 0xc002ef5428] (504)" time="2024-06-25 00:00:04" level=error msg=" [blueprint] run cron job failed on blueprint:[3][REDACTED-Blueprint]\n\tcaused by: lock tables timeout: [0xc002ef53f8 0xc002ef5410 0xc002ef5428] (504)" time="2024-06-25 00:00:04" level=error msg=" [blueprint] created cron job failed on blueprint:[2][REDACTED-Blueprint]\n\tcaused by: lock tables timeout: [0xc002ef55f0 0xc002ef5608 0xc002ef5620] (504)" time="2024-06-25 00:00:04" level=error msg=" [blueprint] run cron job failed on blueprint:[2][REDACTED-Blueprint]\n\tcaused by: lock tables timeout: [0xc002ef55f0 0xc002ef5608 0xc002ef5620] (504)" time="2024-06-25 00:00:04" level=error msg=" [blueprint] created cron job failed on blueprint:[12][REDACTED-Blueprint]\n\tcaused by: lock tables timeout: [0xc002ec5728 0xc002ec5740 0xc002ec5758] (504)" time="2024-06-25 00:00:04" level=error msg=" [blueprint] run cron job failed on blueprint:[12][REDACTED-Blueprint]\n\tcaused by: lock tables timeout: [0xc002ec5728 0xc002ec5740 0xc002ec5758] (504)"
I have 25 blueprints total. the UI shows them all 'scheduled' to run at the next expected schedule, however clearly they mostly fail to actually create in the cron scheduler due to table locks.
I would confirm the same behaviour as @wouldd, here is some logs from AWS CloudWatch
2024-06-25T11:30:00.107+03:00
time="2024-06-25 08:30:00" level=error msg=" [blueprint] run cron job failed on blueprint:[495][<gitlab-project-slug>]
caused by: Error 1040: Too many connections (500)
Wraps: (2) Error 1040: Too many connections
Error types: (1) *hintdetail.withDetail (2) *mysql.MySQLError"
I would confirm the same behaviour as @wouldd, here is some logs from AWS CloudWatch
2024-06-25T11:30:00.107+03:00 time="2024-06-25 08:30:00" level=error msg=" [blueprint] run cron job failed on blueprint:[495][<gitlab-project-slug>] caused by: Error 1040: Too many connections (500) Wraps: (2) Error 1040: Too many connections Error types: (1) *hintdetail.withDetail (2) *mysql.MySQLError"
@AlexPerkbox You can check AWS's config, turn up the value of something like max_connection_count
.
@wouldd Aah, I see. Too many bps are trying to create pipelines simultaneously, causing some table lockers to timeout.
@wouldd Could you spread them to different times and see if the problem goes away?
@klesh I spred about half of them out last week and all those ran fine. I've now spread the remainder out. it's obviously a bit of a pain to have to do this manually. but certainly seems to work around the issue.
Search before asking
What happened
Cronjobs are no longer considered and blueprints do not run recurringly based on a schedule.
E.g. This blueprint is supposed to run daily at midnight (3:00AM my local time) but it did not on June 14, 17, and 19.![image](https://github.com/apache/incubator-devlake/assets/113916459/2201b8b4-c82d-4005-99db-bbb4f2e280e5)
{ "name": "DevOps-Demo-Blueprint", "projectName": "DevOps-Demo", "cronConfig": "0 0 * * *", "createdAt": "2024-06-12T09:50:01.797Z", "updatedAt": "2024-06-12T09:50:27.466Z" }
Other than that, there are some 400 odd blueprints and when changing cronConfig (using API) only a handful of them actually get triggered (although they return 200 OK).
Running
devlake-1.0.0-beta11
on AWS ECS/RDS and byog (bring your own grafana)What do you expect to happen
Whenever a cronjob is updated a pipeline should trigger at the specified date and time.
How to reproduce
Update cronConfig setting either via UI or API and monitor behaviour.
Anything else
No response
Version
devlake-1.0.0-beta11
Are you willing to submit PR?
Code of Conduct