apache / incubator-devlake

Apache DevLake is an open-source dev data platform to ingest, analyze, and visualize the fragmented data from DevOps tools, extracting insights for engineering excellence, developer experience, and community growth.
https://devlake.apache.org/
Apache License 2.0

[Bug][Sync Frequency] Skipping data collection when there are multiple projects with the same schedule #8175

Open roberioguedelho opened 1 week ago

roberioguedelho commented 1 week ago

Search before asking

What happened

Data collection does not run on schedule for some projects when multiple projects share the same schedule. (Screenshot 2024-10-31 at 09:38:28)

What do you expect to happen

Data collection should be triggered for every project that shares the same schedule.

How to reproduce

The problem occurs with 61 projects sharing the same schedule. I don't know the exact number of projects at which the bug starts to appear; with a small number of projects it doesn't occur.

Anything else

The bug occurs with every schedule.

Version

v1.0.0

Are you willing to submit PR?

Code of Conduct

dosubot[bot] commented 1 week ago

@roberioguedelho Hello! How can I help you today? 😊

The issue you're seeing, where data collection does not run on schedule when many projects share the same schedule, might be related to a known limitation in the system. The code uses an errgroup whose concurrency limit is set by g.SetLimit(maxConcurrency), with maxConcurrency defined as 10. This means at most 10 operations run concurrently, which could be affecting the scheduling of data collection for your 61 projects [1].
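To make the concurrency-limit idea concrete, here is a minimal stdlib-only sketch that approximates `errgroup.Group` with `SetLimit(10)` using a buffered channel as a semaphore. It is not DevLake's code: `runLimited` and `simulate` are hypothetical names, and the 5 ms sleep stands in for real collection work. As with errgroup's `Go`, which blocks until a slot is free once the limit is reached, surplus tasks here wait rather than being discarded.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
	"time"
)

// runLimited runs every task with at most `limit` of them active at once.
// Stdlib approximation of errgroup.Group + SetLimit(limit): acquiring a slot
// blocks, so surplus tasks wait for a free slot instead of being dropped.
func runLimited(tasks []func(), limit int) {
	sem := make(chan struct{}, limit) // buffered channel used as a counting semaphore
	var wg sync.WaitGroup
	for _, task := range tasks {
		sem <- struct{}{} // blocks while `limit` tasks are already running
		wg.Add(1)
		go func(t func()) {
			defer wg.Done()
			defer func() { <-sem }() // free the slot when the task returns
			t()
		}(task)
	}
	wg.Wait()
}

// simulate launches one fake "collection" per project (hypothetical helper) and
// reports how many finished and the most that were ever running at the same time.
func simulate(projects, limit int) (completed, peak int64) {
	var active int64
	tasks := make([]func(), projects)
	for i := range tasks {
		tasks[i] = func() {
			n := atomic.AddInt64(&active, 1)
			for { // raise peak to n if n is a new maximum
				p := atomic.LoadInt64(&peak)
				if n <= p || atomic.CompareAndSwapInt64(&peak, p, n) {
					break
				}
			}
			time.Sleep(5 * time.Millisecond) // stand-in for real collection work
			atomic.AddInt64(&active, -1)
			atomic.AddInt64(&completed, 1)
		}
	}
	runLimited(tasks, limit)
	return atomic.LoadInt64(&completed), atomic.LoadInt64(&peak)
}

func main() {
	completed, peak := simulate(61, 10) // 61 projects, limit of 10
	fmt.Printf("completed=%d peak=%d\n", completed, peak)
}
```

In this sketch `completed` is always 61 and `peak` never exceeds 10: a blocking limit delays work rather than skipping it. If runs are actually being skipped, something beyond the limit itself would likely be involved; this sketch does not model DevLake's scheduler.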

To address this, you might consider adjusting the concurrency limit if possible, or staggering the schedules to reduce the number of projects being processed simultaneously.
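If staggering is the route taken, the following sketch spreads projects across start times, assuming each project's Sync Frequency can be set to a standard five-field cron expression. `staggeredCron` is a hypothetical helper, not part of DevLake. The batch size of 10 simply mirrors the limit mentioned above, and the 15-minute step is arbitrary; whether a batch finishes before the next one starts depends on how much data each project collects.

```go
package main

import "fmt"

// staggeredCron returns one daily cron expression ("minute hour * * *") per
// project. Projects are grouped into batches of batchSize, and each batch
// starts stepMinutes after the previous one, beginning at startHour:00.
func staggeredCron(projects, batchSize, stepMinutes, startHour int) []string {
	if batchSize < 1 {
		panic("batchSize must be at least 1")
	}
	exprs := make([]string, projects)
	for i := 0; i < projects; i++ {
		batch := i / batchSize
		// minutes after midnight, wrapped so the result stays within one day
		total := (startHour*60 + batch*stepMinutes) % (24 * 60)
		exprs[i] = fmt.Sprintf("%d %d * * *", total%60, total/60)
	}
	return exprs
}

func main() {
	exprs := staggeredCron(61, 10, 15, 0)
	fmt.Println(exprs[0], "|", exprs[10], "|", exprs[60])
	// prints: 0 0 * * * | 15 0 * * * | 30 1 * * *
}
```

With 61 projects this yields seven batches starting at 00:00, 00:15, ..., 01:30, so no more than 10 projects are due at the same minute.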

To continue talking to Dosu, mention @dosu.

roberioguedelho commented 1 week ago

@dosu Is there any environment variable to change the value of maxConcurrency?

dosubot[bot] commented 1 week ago

Yes, you can change the value of maxConcurrency by setting the MAX_CONCURRENCY environment variable. This allows you to adjust the concurrency limit to better suit your needs. Make sure to restart the application after making this change for it to take effect.


klesh commented 1 week ago

Try the latest beta. It might be fixed, but I can't recall which version fixed it.