Asynchronous scripting in Sphere

lintax commented 9 years ago

We want to support advanced scripting, especially multi-threaded.

PROBLEM:

The current script engine is not bad, but it is event-driven and synchronous: while a script event is being processed, the server waits for the result, which can cause lag on heavy operations in common places (like HitTry, etc).
Moreover, complex AI (especially AI for monster groups) requires complicated scripting that is heavy to execute. The same goes for the 'clever' spells an admin might like to implement.
The scripting language is not bad, but it is hand-crafted, so it usually takes quite a lot of time to get things right with it.

SOLUTION:

Add a script processor that is in no way connected to the current script engine, with a configurable number of threads and execution queues. Expose Sphere objects to the scripts using usertypes.
Use a game-industry scripting standard: Lua, which is a very fast, widely used and easy-to-learn language. This still leaves several questions. For example, if an async script decides to execute "damage(victim, 10)", it should probably pass through the normal operational cycle, including triggering the SCP Damage event.

denizsokmen commented 9 years ago

I was actually thinking about integrating Lua, but it seems to be a really huge amount of work considering the old habits in the source.

coruja747 commented 9 years ago

Honestly, I think UO clients are not prepared for multi-threading: they send/receive packets using a single linear thread. Also, multi-threaded scripting on the server side will probably create more headaches instead of making code execution faster.

Just imagine a function like this:

[FUNCTION lol]
TAG.A=1
DIALOG d_dialog_that_uses_this_tag

If for some reason the dialog gets rendered before TAG.A is set on the char, it probably won't work correctly, because it needs TAG.A but the tag is not set yet.

But maybe we can make some server behaviors multi-threaded by splitting some heavy functions into new threads. For example, a new thread only to handle CItem::OnTick(), CChar::OnTick(), etc (these functions are heavy and they run every 0.1s, forever), while the other thread handles the network, main functions, etc. It's also a good idea (or even a priority?) to make the worldsave multi-threaded, using 2 CPU cores to save chars + 2 cores to save world items, etc.

But these changes are not simple. Honestly, I have no idea how to do this; it's just an idea if someone wants to go forward with it. It will be a pain to do and can also create many problems: many years ago Sphere already had a failed attempt at moving some functions to multi-threading, which made the server unstable and crash a lot. Also, we currently have an optional multi-threaded network (the NetworkThreads feature in sphere.ini); with it enabled I get a ping of ~250ms, and with it disabled I get ~32ms. But if someone can really make a working engine, feel free to go forward :P

lintax commented 9 years ago

@coruja747, have you read the whole suggestion or only the first part? :) Multithreaded ticks were partially done in .57. Yes, that has problems I do not wish to add to Sphere. Btw, ON=Tick cannot be multithreaded, because lots of scripts synchronize several timers and tags/more1/etc with some data at the current time:

if tag.a < 5 then
  if tag.a > 5 then
    SERV.log "we got multithreaded collision"
  endif
endif

This is a sample of what can happen. Multithreaded scripts have to be written with this in mind from the beginning. I doubt we will ever want to add a transactional model that hides data changes made by other threads, since it is heavy and out of our scope.
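The collision in the sample above can be reproduced deterministically without real threads. The following Python sketch is only an illustration: the `script` generator, the `writer` function and the `tags` dictionary are hypothetical stand-ins, and the `yield` marks the point where another thread could be scheduled between the two checks.

```python
def script(tags, log):
    """Mirrors the nested checks from the sample above."""
    if tags["a"] < 5:
        yield  # another thread may run here, between the two checks
        if tags["a"] > 5:
            log.append("we got multithreaded collision")


def writer(tags):
    """Stand-in for a second thread that changes the same tag."""
    tags["a"] = 9


def run_interleaved():
    tags = {"a": 1}
    log = []
    s = script(tags, log)
    next(s)        # script passes the first check, then pauses
    writer(tags)   # the "other thread" runs in the gap
    next(s, None)  # script resumes and runs the second check
    return log


print(run_interleaved())
```

With a single thread the inner branch can never be reached; with an interleaved writer it can, which is exactly why existing ON=Tick scripts cannot simply be run in parallel.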

But again, my proposal is not that breaking:

We do not touch the current synchronous script engine/mechanics. Current scripts are kept single-threaded and people can continue using spherescript the way they have for ages.
We add a new scripting language (Lua) with separate files and multithreaded processing that always works asynchronously.

Also, while discussing, a new limitation came up:

Async operations from the new script engine should NOT trigger event processing in scp files, since this would lead to multithreaded scp execution, which would cause unpredictable bugs.
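One possible shape for such a separate async processor is sketched below in Python rather than Lua or C++, purely for illustration. It assumes (this is not stated in the proposal) that workers only ever see a copied snapshot of game state and hand their decisions back through a queue, which the main thread drains once per tick. `World`, `pick_weakest` and `main_tick` are hypothetical names, not Sphere APIs.

```python
import queue
from concurrent.futures import ThreadPoolExecutor


class World:
    """Stand-in for game state that only the main thread may modify."""

    def __init__(self):
        self.hp = {"orc": 50, "troll": 80}


def pick_weakest(snapshot, commands):
    """Async 'AI' job: reads a private snapshot, queues a command."""
    target = min(snapshot, key=snapshot.get)
    commands.put(("damage", target, 10))


def main_tick(world, commands):
    """Main thread: apply every command queued so far, return the count."""
    applied = 0
    while True:
        try:
            op, target, amount = commands.get_nowait()
        except queue.Empty:
            return applied
        if op == "damage":
            world.hp[target] -= amount
        applied += 1


def run_demo():
    world = World()
    commands = queue.Queue()
    # Configurable number of worker threads, as in the proposal.
    with ThreadPoolExecutor(max_workers=2) as pool:
        pool.submit(pick_weakest, dict(world.hp), commands)
    # Leaving the "with" block waits for the workers to finish.
    main_tick(world, commands)
    return world.hp


print(run_demo())
```

In this sketch the workers never touch `World` directly, so the synchronous engine stays single-threaded; whether a queued command should then fire scp triggers is exactly the open question in this thread.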

fjgo86 commented 9 years ago

Well, having this working for saves, for example, would after some work significantly reduce the time spent on crowded servers. I've heard of servers needing almost 1 minute to complete the worldsave.

This could be done because during a save every script is stopped until the save finishes, so there won't be any problem with multithreading.

And if AI can be done this way, as you said it can, it will really improve the server's performance. There are a few triggers related to AI that stand in the way of a 100% Lua-based AI. I know it's a little drastic to remove them, but I'm sure someone can think of a nice way to bypass this problem, like moving these triggers to fire before or after the Lua scripts. Anyway, AI is almost the most important issue to take care of, because a few monsters active in the same sector will really eat the CPU.

lintax commented 9 years ago

Regarding the saves, aren't background saves already built in? I remember the old times of .56/.57: my server was running with background saves that usually took about 20 minutes, but there were almost no pauses during them (actually, there was a small slowdown at the end of the save, when every char was rechecked to see whether it had been saved or not). I think background saves should be the only option left for Sphere and should work by default.

Regarding AI: anything that does not require synchronization could be made async. For example, if you want on=gethit to run some logic instantly, you have no option: use scp. But most scripts do not actually require instant actions (unless you are implementing a custom fight engine, etc that modifies or cancels incoming damage). You can instead write async scripts that will trigger soon, and the delay will most probably not be noticeable to a player.

jfmu commented 9 years ago

The thing is, background saves have nothing to do with multithreading as far as I know (unless some of you guys have changed that since I last touched the code, around August). The principle is just to spread the computing/disk-writing overhead over 20-40 minutes (depending on sector size and stuff) so that you don't have these 1-minute complete stops. And background saves are not consistent in terms of time, but in terms of players and items. This is achieved by some nasty checks which ensure that items don't get duplicated by a player who was saved in timestep #1 and in the meantime gated to a dungeon at the end of the map which would be saved in timestep #N... Don't panic, they're safe, at least in my experience.

The problem with multithreading and/or asynchronous scripts, or even starting a small asynchronous kernel, is... There is no problem, as long as you don't touch stuff that gets touched by the Sphere main engine thread in the meantime. And it's VERY hard to ensure that. You'll need something like locking or... transactional memory, and even with that you would need to implement rollback code. I have thought about this for some time now, but I don't see any easy solution to it.

Scripting in Sphere is, like you already said, event-driven. I thought about maybe locking on a per-sector basis, and then ticking sectors in parallel (Yes!!). But it's difficult.

Imagine thread #1 ticks sector 1, which has a script that accesses an item in sector 2. At the same time, sector 2 is already locked by thread #2, which is ticking it. There, a script is triggered by some event and tries to access an item inside sector 1 (which is locked by thread #1). Tadaa. Sparkles. Deadlock.

This is the main problem with multithreading in Sphere, I think. If you could resolve that particular situation, we should absolutely try parallel sector ticking. I'm talking about Server::OnTick: do that stuff in parallel.

One solution I thought about MIGHT be: if thread #1 notices it's about to access stuff inside sector 2, which is locked by another thread, then thread #1 should go to sleep and release EVERY lock it holds for every sector. This prevents the deadlock. The problem in this situation: data MUST be consistent. I mean the following: thread #1 was (just for example) calculating some extremely fancy and important number which is, for some arbitrary reason, just the sum of morex of two items that happen to reside in two different sectors, which get processed by threads in parallel as described above; just imagine that. So thread #1 takes morex of item 1 and puts it into the sum. Then it wants to access the second item and sees that it's in a sector which is already being processed by another thread. So it goes to sleep, as mentioned. When it wakes up, will the sum (wherever it was storing the sum) still be the same as when it went to sleep?

Example:

Item1 morex: 20
Item2 morex: 22

Thread1: Sum = 0; Sum += 20; Access morex of item2; Oh fuck locked.. i have to sleep.
Thread2: does some stuff.. Ends
Thread3: Does some other stuff, If it's a wednesday and i'm wearing my blue shorts, add 1 to sum, End.
Thread1: Wakes up. Takes morex of item2; sum += 22;

Is sum 42? It may not be.

Because that's the purpose of locking. You want your data to be consistent and your calculations to be deterministic. If we had a way to ensure that calculations or modifications to items by threads which have to pause in the middle of ticking a sector don't get tampered with by other threads, it would be ok.

Or we could just say we don't care, or implement some concept which would enable users to ENSURE that their in-script calculations don't get tampered with while their thread may be sleeping.

It's difficult. No question :) But this is the only way I see.
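The back-off idea described above (release every sector lock and sleep when a needed sector is busy) can be sketched with non-blocking lock attempts. This Python sketch is an assumption-laden illustration, not Sphere code: `lock_sectors` and `tick` are hypothetical helpers, and the sectors are plain `threading.Lock` objects. Two threads take the same two sectors in opposite order, the classic deadlock setup, and both still finish.

```python
import random
import threading
import time


def lock_sectors(locks, max_sleep=0.005):
    """Acquire all locks or none; on failure release everything and retry."""
    while True:
        held = []
        for lock in locks:
            if lock.acquire(blocking=False):
                held.append(lock)
            else:
                break
        if len(held) == len(locks):
            return
        for lock in reversed(held):
            lock.release()  # give up EVERY lock, as proposed above
        time.sleep(random.uniform(0, max_sleep))  # jitter against livelock


def tick(first, second, done, name):
    """Pretend to tick a sector whose scripts touch a second sector."""
    for _ in range(50):
        lock_sectors([first, second])
        try:
            pass  # script work touching both sectors would go here
        finally:
            second.release()
            first.release()
    done.append(name)


def run_demo():
    s1, s2 = threading.Lock(), threading.Lock()
    done = []
    threads = [
        threading.Thread(target=tick, args=(s1, s2, done, "thread1")),
        threading.Thread(target=tick, args=(s2, s1, done, "thread2")),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join(timeout=10)
    return sorted(done)


print(run_demo())
```

Note that this only removes the deadlock. It does nothing about the consistency problem from the morex example: any value read before backing off may be stale after the retry, so the script work has to restart from scratch once all locks are held.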

coruja747 commented 9 years ago

Yep, that's the problem: it's too difficult and requires a lot of huge changes. I'm not saying that we can't or won't do it, but in the past Sphere already tried many features like this. Most of them stayed broken or incomplete for many years, and after 5/10/15 years the code finally got removed, because even after all that time it was still not working (and obsolete).

That's why I said it's better to focus on small things instead of trying something bigger that will probably bring us more headaches than improvements. Instead of a huge change to the entire code, we can just make worldsaves multi-threaded, or improve the current multi-threaded network, which doesn't work correctly, or even optimize some heavy functions like OnTick, etc (lightweight code = more performance).

jfmu commented 9 years ago

I agree. I think lintax's initial comment also covers things like a database query, or other stuff that is completely independent of what Sphere is doing. That would be less of a problem. But if you wanted to do something like this:

> This still leaves several questions, for example if async script will decide to execute "damage(victim, 10)" it should probably pass through the normal operational cycle including SCP event Damage to be triggered.
>
> lintax

As lintax already says. But it's not optional. You NEED to ensure that access to Sphere internal things is EXCLUSIVE for threads. Otherwise the emulator's behaviour will become unpredictable.

If I remember correctly from my experience with RunUO, that server is also not multi-threaded. Yes, I know, we have network threads. But all the work is still done by one main thread, which runs the game logic & scripts. It's the same in RunUO. RunUO's Timer class is also not asynchronous, but similar to what we do in Sphere (register timer events in a table and process them in the main loop).

When it comes to scripting, I mainly had two tricks to get some 'performance'. Well, it's only a dirty trick, but:

[itemdef i_usename_delayed]
id=i_memory
type=t_eq_script

on=@equip
tag.counter=1
trigger @timer

on=@timer
for x 1 100
// If we processed all items, stop
    if (<tag0.counter> > <NUMITEMS>)
        remove
        return 1
    endif
// code
    tag.counter=<eval <tag0.counter>+1>
endfor
timerd=0
return 1

I'm just splitting the processing of the elements across multiple server ticks. With timerd=0 the stuff will be executed again in the next tick. Cooperative scheduling, if you will, but it greatly improves the gameplay experience for everyone on the server if you code exhausting stuff this way.

Maybe we should put some more effort into an interface to files / SQL / Bash. We did a lot of work for DB processing etc with SYSCMD. Bash scripts also execute on another core ;)
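The same batching idea, stripped of SphereScript specifics, looks like this in Python. It is only a sketch: `process_in_batches` is a hypothetical helper, each `yield` plays the role of "timerd=0, return 1" (hand control back until the next tick), and the loop in `run_demo` stands in for the server's main loop.

```python
def process_in_batches(items, handle, batch_size=100):
    """Handle up to batch_size items, then pause until the next tick."""
    for start in range(0, len(items), batch_size):
        for item in items[start:start + batch_size]:
            handle(item)
        yield  # end of this tick's budget, like timerd=0 / return 1


def run_demo():
    seen = []
    job = process_in_batches(list(range(250)), seen.append, batch_size=100)
    ticks = 0
    for _ in job:  # each iteration is one server tick
        ticks += 1
    return ticks, len(seen)


print(run_demo())  # (3, 250)
```

Here 250 items are spread over three ticks of at most 100 items each, so no single tick blocks the main loop for the whole job.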

lintax commented 9 years ago

@jfmu, your solution for delayed calculation via SCP is a good one, but still not enough. For example, I want an AI to scan for all characters nearby, sort them by their equipment, stats, abilities, etc, and then ask all friends whose current enemies have a priority at least two times lower than the selected target to attack him... Moreover, if the total damage is too high, only part of the group should be sent to attack this victim, and the rest should select another target. Plus, unit C can cast a stone wall behind him so that he cannot escape, and a stone wall in front of him will decay in 3 seconds, so it is safe to run forward instead of using the built-in Sphere pathfinding :) This is simple AI behavior, and it is possible to implement it via tags with step-by-step processing, but the complexity of the whole scripted system rises exponentially with every further decision to make (and so does its proneness to bugs).

> But it's not optional. You NEED to ensure that access to sphere internal things is EXCLUSIVE for threads. Otherwise the emulators behaviour will become unpredictable.

Actually, it's not that bad. Most internal things are atomic (in the sense that they update an object reference or a single object), and while playing with multithreading in .57 I had few problems with system stability. Most problems happened in client packet preparation and sending (this part was not thread-safe enough) and in the scripts themselves (due to the logic becoming unpredictable).

@coruja747, why do we need a multithreaded save at all? Background saves work perfectly... Using a separate thread is not safe by itself, since we save property-by-property with no lock, so any transaction held by scripts or C++ code could be partially saved and partially not (because part of it was not available at the time; for example, two char tags connected by logic). Moreover, we save to the same text file, so we are limited by the time it actually takes to write to disk anyway.

Regarding optimization with OnTick transactions: this is also not possible without hurting current scripts or, as jfmu suggested, implementing a transactional memory with data locks. The first will ruin lots of custom scripts; the second just makes no sense. We are not implementing an ACID object database by ourselves, right? Refactoring the whole object storage to use some NoSQL database is also not an option. The 3rd option, an extra script engine with some limitations on execution, injected specially for such async actions, might be a solution. Surely, problems come when it meets the current script engine, so it should probably NOT trigger it in any way.

Sphereserver / Source

Asynchronous scripting in Sphere #11