Or we can store jobs outside of hardcode in some place like SQL, and not have
these limits at all. Which seems a better solution for the work involved. :)
Original comment by widdis@gmail.com
on 1 Jul 2011 at 10:22
Actually, storing in SQL won't solve the limit on iterating through the jobs
list. Also, not everyone has access to SQL, so the idea behind SQL Jobs was to
be an ALTERNATIVE, not the main project line.
Meanwhile, I think Chime's solution, here, is cross-site compatible, and it's
already written for you, so you should be able to just make it a function and
replace instances of lcon() with it. :D
Original comment by kkragenb...@gmail.com
on 1 Jul 2011 at 10:34
Well, the hard part remains converting all of the existing areas that use DBREF
lists to do that. I think we were hoping for consensus so that could be done
in a way that will mesh with upstream goals. If not, there are other
alternatives.
The problem exists though, and as Kevin noted SQL is not a solution here.
In particular, as the SQL job count grows beyond what fits in an LBUF, you can
use LIMIT n,m in your query to page through the results, but that still
involves adding dbref-list paging support to the rest of the code. An extended
lcon using con()/next() (as above) would be useful with that same change of
infrastructure.
This looked like the less invasive change, perverse and horrifying though large
strings of base64 may appear.
Original comment by lasht...@gmail.com
on 2 Jul 2011 at 2:27
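The packing idea referenced above (fitting more dbrefs into a fixed-size LBUF by encoding each one as fixed-width base-64 digits) can be sketched in Python. This is a conceptual model, not the actual softcode patch; the digit alphabet and 3-digit width are assumptions for illustration.

```python
# Hypothetical sketch of the packing idea (not the actual softcode patch):
# encode each dbref as fixed-width base-64 digits so more fit in one LBUF.
DIGITS = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz+/"

def pack_dbref(n, width=3):
    """Three base-64 digits cover dbrefs up to 64**3 - 1 = 262143."""
    out = []
    for _ in range(width):
        out.append(DIGITS[n % 64])
        n //= 64
    return "".join(reversed(out))

def unpack_dbref(token):
    n = 0
    for ch in token:
        n = n * 64 + DIGITS.index(ch)
    return n

# 3 chars per job, instead of up to 7 for "#12345 ", roughly doubles capacity.
packed = "".join(pack_dbref(d) for d in (5, 12345, 29999))
```

The output is exactly the "large strings of base64" the comment calls perverse and horrifying: a run of unreadable fixed-width tokens with no separators.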
Cross-platform keeps being mentioned but I'll note some of the original
assumptions:
1. The 3999 buffer limit only applies to TinyMUSH and RhostMUSH. Penn and MUX
installs use an 8000 buffer, doubling the list size and making this discussion
largely moot for those codebases.
2. 5-digit dbrefs. This applies only to very large games. (I suppose if we're
worried about more than 500 jobs in the system we're already dealing with very
large games.)
I would be willing to wager that the number of games running Tiny or Rhost with
a grid size over 10K and expectation of over 500 jobs in the system is very
small. And if there is one, adding in this patch to allow 1000 jobs (or 2000
on Penn or TinyMUX) is still limiting, just limiting at a higher level. If 500
is truly a limit I'd rather pursue other ways of solving it to make it
unlimited, rather than just double the current limit.
Original comment by widdis@gmail.com
on 2 Jul 2011 at 2:46
Well, one of the main Rhost testbeds (HM) has a 30k object database, and is
constantly pushing up against that 500 limit. Upping it to a thousand would
actually be a massive help for them.
If you can find a way to make it unlimited, that'd be ideal, but I can't think
of any that don't involve VERY slow SEARCHes.
Original comment by kkragenb...@gmail.com
on 2 Jul 2011 at 3:21
Paged result lists or directly iterating with con/next would do it. That's a
large change of design though.
Search()es are a show-stopper for games of that size.
A better long term fix might be to move most of the logic into an external
mushbot with pluggable storage backends (sql, sqlite, text files on disk, etc),
and do all of the processing outside the mush. That's different enough that it
really should be a separate project though, and doesn't look like a short term
fix.
Original comment by lasht...@gmail.com
on 2 Jul 2011 at 3:56
Create an LCON1, LCON2, LCON3, etc. attribute on the job database object, such
that each attribute is under buffer size and spills over into the next as
needed. (Pretty easy to do with con() and next() like the proposed code.)
Update these attributes every time a job is created or destroyed.
Then instead of using lcon() just iterate through lattr(lcon*) to fetch the
attribute and process it like you would the lcon list; one chunk at a time.
Works for everything except sorting... for that, we could have sorted versions
of the attributes pre-processed (every time a sort condition changes) or
figure out a routine to sort on the fly. Still shouldn't be much more intensive
than packing/unpacking the dbref list.
Original comment by widdis@gmail.com
on 2 Jul 2011 at 4:06
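The spill-over scheme described above can be sketched as follows. This is a model of the idea, not jobs code: split one long dbref list into LCON1, LCON2, ... values, each kept under an assumed buffer limit.

```python
# Sketch of the spill-over scheme: split a long dbref list into LCON1,
# LCON2, ... attribute values, each kept under an assumed buffer limit.
def chunk_lcon(dbrefs, lbuf=3999):
    attrs, chunk, n = {}, [], 1
    for d in dbrefs:
        token = "#%d" % d
        # start a new attribute when adding this token would overflow
        if chunk and len(" ".join(chunk + [token])) > lbuf:
            attrs["LCON%d" % n] = " ".join(chunk)
            n, chunk = n + 1, []
        chunk.append(token)
    if chunk:
        attrs["LCON%d" % n] = " ".join(chunk)
    return attrs

# Consumers then loop over the attributes (lattr(lcon*)-style), processing
# one buffer-sized chunk at a time instead of one over-long lcon().
```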
I think I like the idea of paged result lists a lot better than attributes on
the job database. Attributes are kind of an ugly hack to accomplish it, at
best. The paged results also mean that for 99% of our users, there won't be
any visible change. It assumes 'page 1' most of the time. It's just the big
games that will start using the paged resultset.
I definitely don't like the mushbot to external storage; that goes against the
core model for Anomaly and most MUSHcode projects that don't start from a
mushbot standpoint.
Original comment by kkragenb...@gmail.com
on 2 Jul 2011 at 5:42
Sorting could be done with a merge sort, but yes that's a more complicated
operation. Doable, though.
Do we want to update on each change, or update all of the lcon attrs only when
we need to look? Either way these seem like they'd need semaphore protection
to be completely safe.
Original comment by lasht...@gmail.com
on 2 Jul 2011 at 5:42
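The merge sort mentioned here maps naturally onto the chunked-attribute layout: if each chunk is kept sorted by the sort key, a k-way merge reads them in global order without ever building one giant over-buffer list. A minimal Python sketch of the principle:

```python
import heapq

# k-way merge of individually sorted chunks, yielding a globally sorted
# stream without ever holding the whole list in one buffer.
chunks = [[1, 4, 9], [2, 3, 10], [5, 6, 7]]
merged = list(heapq.merge(*chunks))
# merged == [1, 2, 3, 4, 5, 6, 7, 9, 10]
```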
Or we can link the jobs to each other, like a linked list!
con() and next() already work for iterating lcon()-style.
All you'd need is a pointer to "first" and an attribute on each job for "next"
for the various sort options.
Original comment by widdis@gmail.com
on 2 Jul 2011 at 5:38
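The linked-list suggestion can be modeled like this (attribute names here are illustrative, not real jobs attributes): one "first" pointer on the jobs object, plus a per-job "next" pointer for each sort order.

```python
# Model of the linked-jobs suggestion (names are illustrative): a "first"
# pointer plus a per-job "next" attribute for each sort order.
first = 101
next_by_num = {101: 205, 205: 333, 333: None}

def walk(first, links):
    """Follow next-pointers from the head until the chain ends."""
    out, cur = [], first
    while cur is not None:
        out.append(cur)
        cur = links[cur]
    return out
```

A second sort order is just a second dictionary of next-pointers over the same jobs, traversed with the same walk.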
A MUSH/Rhost can always increase their jobs buffer size if they think they won't
get creeping data death.
Original comment by Fleety...@gmail.com
on 2 Jul 2011 at 10:47
RHost is stuck on 3999 char LBUFs while using GDBM. Other codebases don't have
compatible security models (and other stuff). It makes a thorny scaling
problem for large RHost games. It's under discussion and development, but more
flexibility here with con()/next() seems like the quicker path forward for
alleviating the current crisis.
There is no need to link jobs if we're using con()/next(); those do that for us
using the existing list structure in the game. con() is your "head" pointer
and next() is your "next" pointer. The problem there is that mushes have no
function-level arbitrary looping structure.
There are @trigger loops, there is iter() over fixed lists, and there is
recursion. @trigger loops involve too many queue cycles and dramatically
complicate safety as then semaphores would be critical. u() recursion is
unworkable because we'd need a stack thousands of levels deep or tail-recursion
support in the interpreter (hahaha). iter() over fixed lists (as above) will
have to do, and is indeed what I did above in the con()/next()-based
packedlcon() implementation. This means that if your content count goes over
your known max, you miss a few; in normal situations you execute too many
steps. As long as you can predict the maximum number of jobs, iter() works
okay. A hybrid iter()/u()-recursion system might also work.
Where this gets tricky is you can no longer hold the list of DBREFs in a value.
You can't lcon(), you can't use normal iter/filter/map/fold. Instead it's a
giant outer iter and lots of fiddling with setq. Ugly, but fast enough. All
job list handling code needs to adapt to that structure, though.
Original comment by lasht...@gmail.com
on 3 Jul 2011 at 3:38
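The iter()-over-a-fixed-list trick described above can be modeled in Python: with no unbounded loop available, you iterate a predicted maximum number of steps and follow the next-pointer held in a register, with the extra steps becoming no-ops once the chain runs out. A sketch (not the packedlcon() softcode itself):

```python
# Model of the fixed-length-iteration trick: walk a next-pointer chain for
# at most max_steps steps, since the loop length must be chosen up front.
def packed_walk(first, next_of, max_steps):
    out, cur = [], first
    for _ in range(max_steps):   # fixed-length loop standing in for iter()
        if cur is None:          # chain exhausted: remaining steps do nothing
            break
        out.append(cur)
        cur = next_of.get(cur)
    return out

# If the real count exceeds max_steps you miss the tail, exactly as noted;
# in normal situations you simply burn extra no-op steps.
```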
Yes, con() and next() do the link structure for replacing lcon() but they do
not deal with sorting a list.
If you're doing a filter-and-sort type of thing you can build a dbref list from
a filtered pass through with con()/next(), and then sort the resulting list (if
it is within buffer). But that does not permit a sorted output list of jobs
exceeding buffer size.... then again we're talking about spamming a user with
500+ lines of job output and failure at this point may be desired.
It really seems like all this discussion is to support one single game; perhaps
there are other types of solutions that may work instead... perhaps all jobs of
a particular bucket type could be stuck in a different container, parallel jobs
installs (+ajobs and +bjobs) if there's a natural partition of content, etc.
Original comment by widdis@gmail.com
on 3 Jul 2011 at 5:49
Well, you could just split jobs out of the central bucket and into their own
buckets for storage, and that'd solve it, too. Heh.
I still like the pagination answer. Also, I really doubt HM is the only game
to have ever had this problem. In fact I know it isn't, since we had this
problem on DLAoM, too. Apparently I just run big games. 8)
Original comment by kkragenb...@gmail.com
on 3 Jul 2011 at 5:29
Splitting into buckets doesn't solve the general sorting problem, though it
works well enough for most practical cases. No one intentionally tries to look
at all jobs; it only happens at login when their bucket list is unconfigured.
Even so, /select pulls the lcon into a register before it tries more complex
filtering. We'd need to do at least bucket-level filtering first.
Bucket-selection can be done with children() rather than lcon().
And then it'd probably be best to just fail if someone tries to select things
that are so numerous as to fill an lbuf.
The @dolist lcon() are easy enough to fix. The words(lcon()) are easy enough
to fix. The selection functions are the main complexity.
Original comment by lasht...@gmail.com
on 3 Jul 2011 at 8:19
I wouldn't say that nobody tries to list all jobs; I frequently used the
+jobs/all command back in the day when administering my jobs board.
There is an easy, alternative solution to this issue if nobody is trying to
read them intentionally on your game. You can either hide a lot of your buckets
by +bucket/set <bucket>/HIDDEN=yes, or you could make an ACONNECT check for
general +jobs access, and if the player can access the system and they have no
JOBSH attribute, it gives them a generic short-list JOBSH, showing just the REQ
bucket, perhaps. It's not a full solution, no, but it's the greasiest.
I have pagination code already worked up for a player list. The code is pretty
simple, but could easily be adapted to a +jobs purpose if we find we can't live
without it.
All games have this problem. It's one of the limitations of the form, and the
elegance in MU* coding, IMHO, comes from how we dance around those limitations
while still doing interesting and useful shiat. We can either accept those
limitations and sigh (this is, after all, not really a jobs problem per se so
much as a server problem), or we can try to slip around the ropes.
As for the initial Anomaly Jobs design philosophy, and why pagination was
never included in earlier builds: I never imagined any Game of Epic Size would
be interested in running the system, much less in epic fashion. Those huge
dinosaur games usually had other methods of dealing with things.
Anomaly (game, not Jobs) itself was small and sleek in its (albeit fascist)
database design, and never had to worry about these server ceilings. However,
as the Jobs system has itself aged into a dinosaur (I was commenting to widdis
the other day that it's almost on its 10th anniversary from initial public
release), I suppose I should expect more assault and battery on the system.
Can I ask what your Maximum Jobs configuration setting is?
A large max-jobs setting, while pounding on the ceiling of the server's
limitations, also doesn't quite make sense from a security standpoint.
I can write an easy command loop that creates 1000 spurious +requests if I were
trying to attack a game and petulantly annoy staff. With a lower value in the
maximum jobs attribute, there's less cleanup from attackers.
This is not meant to sound a-holish, but perhaps your management practices
could use some re-evaluation. I always merged +typo jobs for later mass fixes.
I tucked long-term jobs into hidden buckets. I kept on top of my staff to
complete jobs; especially when the lists were getting long. 200 max-jobs was
our limit. I ran a separate system (the PCs required an IC reports system for
Starfleet paperwork) which could float issues back and forth should they be
required which kept their jobs out of our jobs. Anything that was longer-term
than the next 2-3 months was put into a third archive.
So, as you can see, I'm kind of on the fence about this issue. On one hand,
it's feasible and elegant to dance around server limitations. On the other
hand, good management practices that keep job numbers low prevent hitting the
ceilings.
The system acknowledges that there are indeed limitations in place, but the
limitations aren't entirely unreasonable, and some of them, like Max-Jobs, have
purpose in mind.
Original comment by Fleety...@gmail.com
on 4 Jul 2011 at 6:15
OK, so I've been pondering this.
The lcon() replacement above is an elegant trick. lcon() is actually used
sparingly in the code, mostly for things like updating all jobs (e.g.,
+jobs/catchup) and the +myjobs list. All other job lists are processed
through the jobs/select code, meaning that the changes only need to be made in
one place. (Well, many places, as all the &SEL_* and &SORTBY_* attributes will
have to unpack their argument, but that's not bad.)
It's not quite as simple in jobs/select, though, because the logic uses a
stack: the %qS register holds a backtick-delimited list of space-delimited
lists of dbrefs. So doing +jobs/select foo and bar, where foo and bar both
have >=500 jobs in them, would still bust 999. More complex logic lowers the
effective limit even further. (And even without the fix above, it lowers it to
571/stack-levels.) So, I'll have to rewrite the stack to use multiple
registers.
But when all's said and done, there's a clear and easy path to 999 (guaranteed,
possibly more) jobs. The path beyond that is not going to happen with a
volunteer coder. So that's the direction we'll move.
Original comment by widdis@gmail.com
on 22 Nov 2011 at 7:57
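The arithmetic behind the 571-per-stack-level figure can be sketched as follows, assuming "#12345 " costs 7 characters per dbref and all stack levels share one backtick-joined register:

```python
# Why sharing one register across stack levels divides the job limit
# (assumes 7 chars per dbref in the unpacked "#12345 " format).
def effective_limit(levels, lbuf=3999, chars_per_dbref=7):
    return (lbuf // chars_per_dbref) // levels

one_level = effective_limit(1)   # 571, the figure quoted above
two_levels = effective_limit(2)  # 285: each nesting level halves it again
```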
It appears TinyMUSH doesn't have pack() or equivalent. So I can:
1. Use the old jobs 5 FN_PACK and code an equivalent unpack for TinyMUSH.
2. Ignore it and just don't compress for TinyMUSH (preferred: it's still
compatible, just with a lower job limit... I can still strip the #).
I like 2.
Original comment by widdis@gmail.com
on 22 Nov 2011 at 6:35
Well, the good news is the changes worked.
The bad news is, the increase in function invocations is huge.
Rather than hardcoding the 999, I used BUFFER/3 to maximize it. On MUX, that's
8000/3 = 2666 jobs.
Just looking at the lcon() replacement itself, that's 2666 repetitions of 8
function calls, or almost 22,000.
There's more overhead with checking access to jobs, filtering for the display,
and all the other places where a dbref now requires unpacking. On the dev site
on MUX with only 4 jobs, I'm busting a 25000 function invocation limit (FIL).
If I added more jobs to simulate a bigger game (which this is really for),
it'd be even worse.
Bottom line:
Penn and MUX buffers of 8K already provide 1000+ jobs. Putting in this code
would actually limit them beyond their existing capacity.
TinyMUSH doesn't have the pack/unpack functions that permit doing this in any
reasonable manner.
This leaves Rhost as the only game with limitations. This might be an
appropriate patch that a large Rhost game might want to consider. Or they
might consider parallel job installations (+ajobs, +bjobs, etc.) or other ways
to work around this limit.
As it is, I'm reverting my changes and don't think this is going to go into the
main jobs code.
Original comment by widdis@gmail.com
on 22 Nov 2011 at 10:13
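The invocation-count estimate in this comment, written out as arithmetic (3 packed characters per dbref and 8 function calls per iteration, as stated above):

```python
# BUFFER/3 job slots (3 packed chars per dbref) times 8 calls per iteration
# leaves almost no headroom under a 25000 function invocation limit.
mux_buffer = 8000
job_slots = mux_buffer // 3   # 2666 jobs
calls = job_slots * 8         # 21328, "almost 22000" invocations
fil = 25000                   # the function invocation limit being hit
headroom = fil - calls        # 3672 left for access checks, filtering, etc.
```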
r410 implements an improved stack to avoid exceeding buffer during list
manipulation.
Keeping this job open to document the limitations and potential workarounds in
the README.RHOST.
Original comment by widdis@gmail.com
on 23 Nov 2011 at 12:38
Documented increased jobs vs. buffer strategies in r416.
Original comment by widdis@gmail.com
on 24 Nov 2011 at 10:55
Original issue reported on code.google.com by
lasht...@gmail.com
on 1 Jul 2011 at 11:41