Closed: ItzMeJesse closed this issue 5 years ago.
Hi,
You need to process a complete queue in less than 60 minutes. All jobs queued for longer than this will be automatically removed/killed.
According to your average processing time (jobs per minute), I assume you are running on a server with a spinning hard drive. Due to the high I/O SeAT requires, you should consider switching to hardware with either SSD or NVMe drives (OVH / So you Start or DigitalOcean provide such servers, but they are not the only ones).
You can also consider increasing the number of workers, but keep in mind that your CPU must stay as unstressed as possible: the load average must stay below the number of cores your CPU has (see the quick check below).
Also, you can tweak the default job scheduling in the SeAT settings. By default, we queue jobs for corporations and characters every hour.
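As a quick sanity check for the load-average advice above, you can compare the load against your core count with standard Linux tools (nothing SeAT-specific is assumed here):
nproc    # prints the number of CPU cores
uptime   # the three load averages shown should stay below that number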
This is odd though, because this issue only started occurring after switching the domain name, while the hardware stayed the same and has been working for over a year now.
You can try to clear the queue and start fresh using the php artisan cache:clear command, as sketched below. But unless you have more things to share, I doubt the domain changes anything 😉
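A minimal sketch of that cache:clear step, assuming the default /var/www/seat install path mentioned later in this thread:
cd /var/www/seat
php artisan cache:clear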
Same issue after updating SeAT to the latest version today:
High number in the queue (this never reached more than a few hundred before and would clear after a few minutes).
I added additional workers in an attempt to get the queue down. Before the additional workers it was climbing to 14k+. Running on a VPS with the following specs: CPU: 2 vCores, RAM: 2 GB, SSD: 80 GB.
Below is a screenshot of what I get from: docker-compose logs --tail 10 -f
Not too savvy on Docker stuff; let me know what other information would be helpful.
Edit: in addition to updating to the latest version of SeAT, I also added fittings and calendar plugins.
Hi @Geabo, when did you last update? (apart from today)
When I first spun up the server back in February
Update: the additional workers are catching up, but with a higher number of failed jobs than I'm used to seeing (usually a few per hour).
Edit: Nvm
I updated SeAT at the same moment as the domain change. That was, I think, on the 26th of May.
I'll find out the specs of my hardware
Edit: Here it is ^
@ItzMeJesse with such specs, you can definitely bump your worker count to 12 or more and switch to the auto-balanced queue.
Are these settings set by the horizon.php file or are there commands for this?
@ItzMeJesse in /etc/supervisor/conf.d/
change the following:
process_name = %(program_name)s_%(process_num)02d ... numprocs=3
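For illustration only, the block being edited in a typical supervisor program file looks roughly like this; the file name, command path, and user below are assumptions for a default install, while process_name and numprocs are the values discussed above:
[program:seat]
command=/usr/bin/php /var/www/seat/artisan horizon
process_name=%(program_name)s_%(process_num)02d
numprocs=3
autostart=true
autorestart=true
user=www-data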
Then go to /var/www/seat/vendor/laravel/horizon/config/horizon.php
and change the following at the bottom under both "environments" and "local" (not sure if both is necessary, but it worked for me):
'balance' => 'auto',
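For reference, a stock Laravel Horizon config/horizon.php of that era structures this section roughly as follows (key names and default values are Horizon's, not SeAT-specific, so your copy may differ slightly):
'environments' => [
    'production' => [
        'supervisor-1' => [
            'connection' => 'redis',
            'queue'      => ['default'],
            'balance'    => 'auto',   // changed from the default 'simple'
            'processes'  => 10,
            'tries'      => 3,
        ],
    ],
    'local' => [
        'supervisor-1' => [
            'connection' => 'redis',
            'queue'      => ['default'],
            'balance'    => 'auto',
            'processes'  => 3,
            'tries'      => 3,
        ],
    ],
],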
@Geabo By default mine looks like this: Never mind, I am very stupid once again.
I'm not entirely sure about the auto balance. I just checked again and still have red "x"s
I think that killed my supervisor? I set numprocs to 12 and now my supervisor.sock has gone missing. It was supposed to be 3, aaaargh. @warlof @Geabo
It's inactive & says I have 0 processes
sudo supervisorctl update
sudo supervisorctl restart
Did you run the above commands after changing conf.d?
Negative, I ran supervisorctl restart and then restarted the VPS. Now my supervisor is dead X(
https://eveseat.github.io/docs/configuration/env_file_reference/ just the last 2
@warlof Those lines are not in my .env. Mine ends at:
SUPERVISOR_RPC_ADDRESS=
SUPERVISOR_RPC_USERNAME=
SUPERVISOR_RPC_PASSWORD=nottellingyou :)
SUPERVISOR_RPC_PORT=9001
SUPERVISOR_GROUP=seat
I guess I have an older version of the .env, maybe? I have pasted the lines into the .env.
12 processes now running! Now to see if they fix my issue
OK, the jobs are no longer paused (for now). I will monitor the queue closely, but they already report a runtime, so it's finishing them.
The issue was not enough workers to handle the load.
If it doesn't start being stupid within a couple hours I will close this topic. Thanks for the help!
Did you get auto balance to work? That's bugging me now. I added the line QUEUE_BALANCING_MODE=auto to the .env, but no effect.
Also confirming the additional workers did eventually catch up:
The balance still shows a red cross, so I have doubts there, but it is working.
EDIT: 00:03 The huge queues are back.
@ItzMeJesse in /etc/supervisor/conf.d/ change the following: process_name = %(program_name)s_%(process_num)02d ... numprocs=3
Modifying your supervisor configuration like this is not the correct way to scale workers and may have unintended side effects (mostly negative). SeAT makes use of Horizon, which manages the number of workers automatically and internally, and relies on a single supervisor job for that.
That being said, it seems we need to document this, but the correct way to "increase workers" is to set QUEUE_WORKERS in your .env file to a number. For example:
QUEUE_WORKERS=6
In code, the number of workers is read from here.
A reference for .env variables may be found here: https://eveseat.github.io/docs/configuration/env_file_reference/
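As a rough end-to-end sketch (the /var/www/seat path comes from earlier in this thread, and restarting supervisor is one way to make Horizon pick up the new value; adjust for your install):
cd /var/www/seat
# edit .env and set, for example:
#   QUEUE_WORKERS=6
sudo supervisorctl restart all   # restart the Horizon supervisor job so the new worker count takes effect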
Yeah, I bumped my workers to 16. I get waves of 15k jobs at times and the workers burn through them, but is this how SeAT is intended to work?
If that is the case, this topic can be closed. The problem was that my .env was outdated and didn't have all the lines from the reference.
I still don't feel like this is a "fix" - more of a band-aid. This issue did not appear until the latest revision, and increasing the workers does not address the root cause of the waves of jobs in the queue. Sometimes the workers can keep up, sometimes they can't. I usually see a wave (an increase to 20k+), and by the time it burns back down to 10k-11k, another wave hits and it starts over.
Just an update: there are plans to revert, over the coming weekend, some commits where funneling was introduced.
Ok, is there any info (logs, metrics, diagnostics, etc.) I can provide that would help?
I'll just add some confirmation that the issue has been persistent since one of the last SeAT core updates. Thank you @leonjza for the feedback that this is going to be handled.
eveapi 3.0.14 has landed, which should hopefully improve this situation. Update and let us know. Thanks @warlof for championing this.
Seems to work as it ran overnight without problems, which wasn't the case before.
Confirmed, after updating to eveapi 3.0.14 my queue of 11k jobs cleared in under 30 mins. No more "waves" of jobs are hitting the system either. Thanks for the quick fix and dedication gentlemen!
Edit: After monitoring the site for a few hours today, I'm still getting "waves" of jobs in the 10K range. Averaging about 20k jobs per hour.
Edit 2: It does eventually catch up with the 16 workers.
@Geabo what kind of jobs?
The funnel which was introduced in 3.0.12 has been removed, and it is the only change which has been applied at the jobs level.
I have not witnessed a wave of 10k jobs yet, but I'm sure they're still there: 15k+ jobs per hour. These are various jobs, and this only started occurring in late May, so there might be something else causing this other than the funneling.
How many registered users does your SeAT instance have? 15k+ jobs is a common number once you have a certain number of users. On my instance I run over 30k jobs every hour due to the number of users, and that's how it's supposed to be.
All changes regarding jobs are listed here: https://github.com/eveseat/eveapi/commits/master
The only major change which was introduced during May is the endpoint version bump for the character wallet journal.
I have 300 users registered, so that could be why so many more jobs are being run compared to last year. However, this wasn't a gradual buildup; it suddenly started running that many jobs in waves, all at once, out of nowhere.
It's probably working correctly now. ;-)
Nice update. Now my instance of SeAT finishes jobs in approximately 25 minutes instead of an hour. 1300 characters registered. Thanks to the developers.
Problem: What's wrong? I changed my domain name and ever since, my Horizon queue has not been finishing jobs.
Version info (PHP version, SeAT version, operating system, etc.): Ubuntu; the SeAT versions are included in the attachments.
I'm like a complete idiot at Linux and this is probably easily fixed, but I don't know how.
Kind regards, Jesse