Dadido3 / D3bot

A very primitive AI for GMod bots designed to work with Jetboom's Zombie Survival gamemode.
38 stars 28 forks source link

Coroutine uploading #88

Open Bagellll opened 1 year ago

Bagellll commented 1 year ago

Today I learned 2 things.

  1. Do not convert a navmesh of gm_bigcity.
  2. When there are 10k+ nodes in a mesh, the net channel just dies trying to send it.

I was actually surprised that this works but it does actually prevent the (what I would consider edge case) bug where your mesh is too big, even for the last fix I did. Thanks again in advance.

Dadido3 commented 1 year ago

While it is a nice thing to have a coroutine to send the map data asynchronously, there is one flaw in the implementation:

UploadMapNavMesh will be called on every update of the navmesh, so this means that when there are two successive calls to UploadMapNavMesh it will cause a race condition:

  1. UploadMapNavMesh is called, the coroutine#1 and the think-hook#1 is created.
  2. Think-hook#1 is called, coroutine#1 is resumed. A block of navmesh data is sent as expected.
  3. Step 2 may repeat several times.
  4. UploadMapNavMesh is called again, a new coroutine#2 is created and the think-hook#1 is overwritten with think-hook#2 (a function that resumes coroutine#2).
  5. Think-hook#2 is called, coroutine#2 will be resumed. Now coroutine#2 will start to send a completely different navmesh.

This will definitely cause problems, as the receiving side doesn't know how to handle this scrambled navmesh data.

A simple solution for this race condition is to have a (single element) queue that UploadMapNavMesh writes into, and some think-hook that will start or restart a coroutine depending on whether there is some navmesh data queued or not.

Additional to this, errors are not handled correctly. So if the code in resume throws an error for whatever reason, the think hook will be called indefinitely.

I just pushed a commit that adds my async helper function. With this it should be relatively easy and clean to implement/fix the things written above:

local uploadMapNavPlayerQueue = {} -- Set of all players that need to have their client side navmesh updated.
function lib.UploadMapNavMesh(plOrPls)
  -- Add either a single player or a list of players to the queue.
  if type(plOrPls) == "table" then
    for _, pl in ipairs(plOrPls) do
      uploadMapNavPlayerQueue[pl] = true
    end
  else
    uploadMapNavPlayerQueue[plOrPls] = true
  end
end
local uploadMapNavPlayerWorker = {} -- Stores the state of the navmesh upload worker.
hook.Add("Think", "d3bot.MapNavMeshUploadWorker", function()
  -- Abort if there are no entries in the queue.
  if next(uploadMapNavPlayerQueue) == nil then return end

  -- Fetch players from the queue/set.
  local players = {}
  for pl, _ in pairs(uploadMapNavPlayerQueue) do
    table.append(players, pl)
  end
  uploadMapNavPlayerQueue = {} -- Clear queue.

  local running, err = D3bot.Async.Run(uploadMapNavPlayerWorker, function()
    local rawData = util.Compress(lib.MapNavMesh:Serialize()) or ""
    local dataLen = rawData:len()
    local maxChunkSize = 2^16 - 10 -- Leave 10 bytes for other stuff than the data.

    for i = 1, dataLen, maxChunkSize do
      local dataLeft = dataLen + 1 - i
      local chunkSize = math.min(maxChunkSize, dataLeft)
      local subDataComp = string.sub(rawData, i, i + chunkSize - 1)

      net.Start(lib.MapNavMeshNetworkStr, false)
      net.WriteBool(false)
      net.WriteUInt(chunkSize, 16)
      net.WriteData(subDataComp, chunkSize)
      net.Send(plOrPls)

      --print(chunkSize, dataLeft)

      coroutine.yield()
    end

    -- Finish the transfer.
    net.Start(lib.MapNavMeshNetworkStr, false)
    net.WriteBool(true)
    net.WriteUInt(0, 16)
    net.Send(players)
  end)

  if err then
    -- Worker failed, output error message.
    print(string.format("D3bot: Navmesh upload worker failed: %s", err))
  end

  if not running then
    -- Worker ended, restart it.
    uploadMapNavPlayerWorker = {}
  end
end

I haven't tested this code in any way, so it may contain syntax errors or other mistakes. But it should show how it can work in principle. It's basically just a worker that waits for a list of players that need their navmeshes updated. Additionally it will batch multiple update requests and their players while the map upload is in progress.

Bagellll commented 1 year ago

With my last commit I added a few improvements to the basic async lib, as well as fixed the race condition mentioned.

Workers are now an object for better managing state. I figured that something like this would be better for ambiguity throughout.

In addition to workers I also added a D3bot.Async.Defer() function for creating the coroutine inside a worker without resuming/starting it right away. Not using it here but it may come in handy.

While writing this reply I realized my pcall wrap was unnecessary and will remove it. fixed

Once again realizing while writing I forgot to make a proper queue for plOrPls. fixed


Thoughts on things thus far?

Dadido3 commented 1 year ago

This looks better, but there are still a few problems:

You are using uploadWorker:SetData(plOrPls), which overwrites the the plOrPls list of the worker. Which means that when a player subscribes to the navmesh updates, all other players will be removed from the navmesh update queue (at least for the next update). While this alone is only a minor issue, and may cause some players to not get any navmesh update, there is also the possibility that the plOrPls can be modified while the navmesh is being sent. There is now another race condition.

Also, as i planned to use ASYNC.Run in other parts of the bot, it's probably not a good idea to change the way it works. I already rely on the fact that it doesn't restart automatically, and that it doesn't need a specific "worker" type with getter and setter methods. Not all things i use ASYNC.Run for fit into that worker pattern, so it's better to leave it as it was.

Additionally i don't see any sense in starting and stopping the think hook. It is only run 60 times per second and hardly uses any resources. Sure, in my example it would create a new coroutine every time when there is no work, but that could easily be fixed by moving some code/checks outside of the async part of the worker. Right now the extra logic just makes the code harder to read (uses the global var uploading).

I don't want to be the guy that nitpicks on every small thing, and i appreciate the work you put in. But when core functionality is rewritten or fixed up it shouldn't make the code harder to maintain (complex state machines) and should be as flawless as possible.

Dadido3 commented 1 year ago

I just saw your fix for the upload queue. While it probably fixes the race condition and makes it a working queue, it does add more complexity. Also, it doesn't merge multiple players and instead causes the navmesh to be serialized multiple times.

It's probably the easiest to get my example code working. With the small improvement to move the -- Fetch players from the queue/set part above D3bot.Async.Run(uploadMapNavPlayerWorker, function() so it doesn't start a new coroutine when there is no need to.

Bagellll commented 1 year ago

Honestly you're right, editing async was probably a bit too outside this scope so I'll try to work around those constraints. My reason for insisting on the hook removal was because I'm very used to squeezing performance and I don't want to add any overhead, though reasonably small.

From my understanding, at this point.

Is this accurate? Just trying to knock as many out as a time.

Dadido3 commented 1 year ago

Cut worker model

Just revert ASYNC.Run back to its original state. It's already low overhead and is written in a way that allows LuaJIT to optimize it pretty well.

Possibly cut think removal (clarify)

You can let the think hook run forever, as it runs only 60 times per second and hardly does any work when there is nothing to do. I edited my example code to make it much more efficient when it idles.

You probably can't even easily measure the overhead that this hook produces.

Make the queue more efficient

My example uses a set of players as a "queue". UploadMapNavMesh will only add players to this set, and not remove any players. This means that when the navmesh worker starts it will send the newest navmesh data to all players that are in the set. It probably doesn't make much difference in speed, but it's also not more complex than the alternatives. So i don't see any reason not to do it this way.