Closed: ItzMeJesse closed this issue 5 years ago.
Hi,
You need to process a complete queue in less than 60 minutes. All jobs queued for longer than this will be automatically removed/killed.
According to your average processing time (jobs per minute), I assume you are running on a server with a spinning hard drive. Due to the high I/O SeAT requires, you should consider switching to hardware with either SSD or NVMe drives (OVH / So you Start or DigitalOcean provide such servers, but they are not the only ones).
You can also consider increasing the number of workers, but keep in mind that your CPU must stay as unstressed as possible: the load average must stay below the number of cores your CPU has (see the quick check below).
Also, you can tweak the default job scheduling in the SeAT settings. By default, we queue jobs for corporations and characters every hour.
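As a quick sanity check for the load-average advice above, you can compare the load against your core count with standard Linux tools (nothing SeAT-specific is assumed here):
nproc    # prints the number of CPU cores
uptime   # the three load averages shown should stay below that number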
This is odd though, because this issue only started occurring after switching the domain name, while the hardware stayed the same and has been working for over a year now.
You can try to clear the queue and start fresh using the php artisan cache:clear command, as sketched below. But unless you have more things to share, I doubt the domain changes anything 😉
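A minimal sketch of that cache:clear step, assuming the default /var/www/seat install path mentioned later in this thread:
cd /var/www/seat
php artisan cache:clear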
Same issue after updating SeAT to the latest version today:
High number in the queue (this never reached more than a few hundred before and would clear after a few minutes).
I added additional workers in an attempt to get the queue down. Before the additional workers it was climbing to 14k+. Running on a VPS with the following specs: CPU: 2 vCores, RAM: 2 GB, SSD: 80 GB.
Below is a screenshot of what I get from: docker-compose logs --tail 10 -f
Not too savvy on Docker stuff; let me know what other information would be helpful.
Edit: in addition to updating to the latest version of SeAT, I also added fittings and calendar plugins.
Hi @Geabo, when did you last update? (apart from today)
When I first spun up the server back in February
Update: the additional workers are catching up, but with a higher number of failed jobs than I'm used to seeing (usually a few per hour).
Edit: Nvm
I updated SeAT at the same moment as the domain change. That was, I think, on the 26th of May.
I'll find out the specs of my hardware
Edit: Here it is ^
@ItzMeJesse with such specs, you can definitely bump your worker count to 12 or more and switch to the auto-balanced queue.
Are these settings set by the horizon.php file or are there commands for this?
@ItzMeJesse in /etc/supervisor/conf.d/
change the following:
process_name = %(program_name)s_%(process_num)02d ... numprocs=3
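For illustration only, the block being edited in a typical supervisor program file looks roughly like this; the file name, command path, and user below are assumptions for a default install, while process_name and numprocs are the values discussed above:
[program:seat]
command=/usr/bin/php /var/www/seat/artisan horizon
process_name=%(program_name)s_%(process_num)02d
numprocs=3
autostart=true
autorestart=true
user=www-data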
Then go to /var/www/seat/vendor/laravel/horizon/config/horizon.php
and change the following at the bottom under both "environments" and "local" (not sure if both is necessary, but it worked for me):
'balance' => 'auto',
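For reference, a stock Laravel Horizon config/horizon.php of that era structures this section roughly as follows (key names and default values are Horizon's, not SeAT-specific, so your copy may differ slightly):
'environments' => [
    'production' => [
        'supervisor-1' => [
            'connection' => 'redis',
            'queue'      => ['default'],
            'balance'    => 'auto',   // changed from the default 'simple'
            'processes'  => 10,
            'tries'      => 3,
        ],
    ],
    'local' => [
        'supervisor-1' => [
            'connection' => 'redis',
            'queue'      => ['default'],
            'balance'    => 'auto',
            'processes'  => 3,
            'tries'      => 3,
        ],
    ],
],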
@Geabo By default mine looks like this: Never mind, I am very stupid once again.
I'm not entirely sure about the auto balance. I just checked again and still have red "x"s
I think that killed my supervisor? I set numprocs to 12 and now my supervisor.sock has gone missing. It was supposed to be 3, aaaargh. @warlof @Geabo
It's inactive & says I have 0 processes
sudo supervisorctl update
sudo supervisorctl restart
Did you run the above commands after changing conf.d?
Negative, I ran supervisorctl restart and then restarted the VPS. Now my supervisor is dead X(
https://eveseat.github.io/docs/configuration/env_file_reference/ just the last 2
@warlof Those lines are not in my .env. Mine ends at:
SUPERVISOR_RPC_ADDRESS=
SUPERVISOR_RPC_USERNAME=
SUPERVISOR_RPC_PASSWORD=nottellingyou :)
SUPERVISOR_RPC_PORT=9001
SUPERVISOR_GROUP=seat
I guess I have an older version of the .env, maybe? I have pasted the lines into the .env.
12 processes now running! Now to see if they fix my issue
OK, the jobs are no longer paused (for now). I will monitor the queue closely, but they already report a runtime, so it's finishing them.
The issue was not enough workers to handle the load.
If it doesn't start being stupid within a couple hours I will close this topic. Thanks for the help!
Did you get auto balance to work? That's bugging me now. I added the line QUEUE_BALANCING_MODE=auto to the .env, but no effect.
Also confirming the additional workers did eventually catch up:
The balance still shows a red cross, so I have doubts there, but it is working.
EDIT: 00:03 The huge queues are back.
@ItzMeJesse in /etc/supervisor/conf.d/ change the following: process_name = %(program_name)s_%(process_num)02d ... numprocs=3
Modifying your supervisor configuration like this is not the correct way to scale workers and may have unintended side effects (mostly negative). SeAT makes use of Horizon, which manages the number of workers automatically and internally, and relies on a single supervisor job for that.
That being said, it seems we need to document this, but the correct way to "increase workers" is to set QUEUE_WORKERS in your .env file to a number. For example:
QUEUE_WORKERS=6
In code, the number of workers is read from here.
A reference for .env variables may be found here: https://eveseat.github.io/docs/configuration/env_file_reference/
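As a rough end-to-end sketch (the /var/www/seat path comes from earlier in this thread, and restarting supervisor is one way to make Horizon pick up the new value; adjust for your install):
cd /var/www/seat
# edit .env and set, for example:
#   QUEUE_WORKERS=6
sudo supervisorctl restart all   # restart the Horizon supervisor job so the new worker count takes effect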
Yeah, I bumped my workers to 16. I get waves of 15k jobs at times and the workers burn through them, but is this how SeAT is intended to work?
If that is the case, this topic can be closed. The problem was that my .env was outdated and didn't have all the lines from the reference.
I still don't feel like this is a "fix" - more of a band-aid. This issue did not appear until the latest revision, and increasing the workers does not address the root cause of the waves of jobs in the queue. Sometimes the workers can keep up, sometimes they can't. I usually see a wave (an increase to 20k+), and by the time it burns back down to 10k-11k, another wave hits and it starts over.
Just an update: there are plans to revert, over the coming weekend, some commits where funneling was introduced.
Ok, is there any info (logs, metrics, diagnostics, etc.) I can provide that would help?
I'll just add some confirmation that the issue has been persistent since one of the last SeAT core updates. Thank you @leonjza for the feedback that this is going to be handled.
eveapi 3.0.14 has landed, which should hopefully improve this situation. Update and let us know. Thanks @warlof for championing this.
Seems to work as it ran overnight without problems, which wasn't the case before.
Confirmed, after updating to eveapi 3.0.14 my queue of 11k jobs cleared in under 30 mins. No more "waves" of jobs are hitting the system either. Thanks for the quick fix and dedication gentlemen!
Edit: After monitoring the site for a few hours today, I'm still getting "waves" of jobs in the 10K range. Averaging about 20k jobs per hour.
Edit 2: It does eventually catch up with the 16 workers.
@Geabo what kind of jobs?
The funnel which was introduced in 3.0.12 has been removed, and it is the only change which has been applied at the jobs level.
I have not witnessed a wave of 10k jobs yet, but I'm sure they're still there: 15k+ jobs per hour. These are various jobs, and this only started occurring in late May, so there might be something else causing this other than the funneling.
How many registered users does your SeAT instance have? 15k+ jobs is a common number once you have a certain number of users. On my instance I run over 30k jobs every hour due to the number of users, and that's how it's supposed to be.
All changes regarding jobs are listed here: https://github.com/eveseat/eveapi/commits/master
The only major change which was introduced during May is the endpoint version bump for the character wallet journal.
I have 300 users registered, so that could be why so many more jobs are being run compared to last year. However, this wasn't a gradual buildup; it suddenly started running that many jobs in waves, all at once, out of nowhere.
It's probably working correctly now. ;-)
Nice update. Now my instance of SeAT finishes jobs in approximately 25 minutes instead of an hour. 1300 characters registered. Thanks to the developers.
Problem: What's wrong? I changed my domain name and ever since, my Horizon queue has not been finishing jobs.
Version info (PHP version, SeAT version, operating system, etc.): Ubuntu; the SeAT versions are included in the attachments.
I'm like a complete idiot at Linux and this is probably easily fixed, but I don't know how.
Kind regards, Jesse