Lux-AI-Challenge / Lux-Design-S1

Home to the design and engine of the @Lux-AI-Challenge Season 1, hosted on @kaggle
https://lux-ai.org/
Apache License 2.0

Can't directly run cpp files in lux-ai-2021 (ETXTBSY error) #71

Open Desperationis opened 3 years ago

Desperationis commented 3 years ago

I've heard that you can pass cpp files directly to lux-ai-2021. However, when I try this with my own bot, I get the error below. The error in its entirety is at https://pastebin.com/CQDev0Kw.

diego@adhoc:~/Desktop/LuxBot$ sudo lux-ai-2021 src/main.cpp src/main.cpp 

-=-=-=-=-=-=-=-=-=-=-=-| [INFO] match_QiUDBKVEPU7A |-=-=-=-=-=-=-=-=-=-=-=-
[INFO] (match_QiUDBKVEPU7A) - Design: lux_ai_2021 | Initializing match - ID: QiUDBKVEPU7A, Name: match_QiUDBKVEPU7A
Error: spawn ETXTBSY
    at ChildProcess.spawn (internal/child_process.js:403:11)
    at spawn (child_process.js:580:9)
    at Object.spawnWithSignal [as spawn] (child_process.js:717:17)
    at Object.spawn [as default] (/usr/local/lib/node_modules/@lux-ai/2021-challenge/node_modules/cross-spawn/index.js:12:24)
    at /usr/local/lib/node_modules/@lux-ai/2021-challenge/node_modules/dimensions-ai/lib/main/Agent/index.js:653:46
    at new Promise (<anonymous>)
    at Agent._spawnProcess (/usr/local/lib/node_modules/@lux-ai/2021-challenge/node_modules/dimensions-ai/lib/main/Agent/index.js:646:16)
    at Agent.<anonymous> (/usr/local/lib/node_modules/@lux-ai/2021-challenge/node_modules/dimensions-ai/lib/main/Agent/index.js:628:56)
    at step (/usr/local/lib/node_modules/@lux-ai/2021-challenge/node_modules/dimensions-ai/lib/main/Agent/index.js:46:23)
    at Object.next (/usr/local/lib/node_modules/@lux-ai/2021-challenge/node_modules/dimensions-ai/lib/main/Agent/index.js:27:53) {
  errno: -26,
  code: 'ETXTBSY',
  syscall: 'spawn'
}

I get this error when trying to run the v1.1.x branch of the cpp kit as well; I'm not sure what is causing it. It works fine when transpiled into JS. I've tried running the lux-ai-2021 command both with and without sudo, but it doesn't make a difference. Here's the exact main.cpp code I tried to run in simple/:

#include "lux/kit.hpp"
#include "lux/define.cpp"
#include <string.h>
#include <vector>
#include <set>
#include <stdio.h>

using namespace std;
using namespace lux;
int main()
{
  kit::Agent gameState = kit::Agent();
  gameState.initialize();

  while (true)
  {
    gameState.update();

    vector<string> actions = vector<string>();
    Player player = gameState.players[gameState.id];
    Player opponent = gameState.players[(gameState.id + 1) % 2];

    GameMap gameMap = gameState.map;

    vector<Cell *> resourceTiles = vector<Cell *>();
    for (int y = 0; y < gameMap.height; y++)
    {
      for (int x = 0; x < gameMap.width; x++)
      {
        Cell *cell = gameMap.getCell(x, y);
        if (cell->hasResource())
        {
          resourceTiles.push_back(cell);
        }
      }
    }

    int citiesToBuild = 0;
    for (auto it : player.cities)
    {
      City *city = it.second;
      if (city->fuel > city->getLightUpkeep() * (int) GAME_CONSTANTS["PARAMETERS"]["NIGHT_LENGTH"] + 1000)
      {
        citiesToBuild += 1;
      }
      for (auto citytile : city->citytiles)
      {
        if (citytile->canAct()) {
          // you can use the following to get the citytile to research or build a worker
          // actions.push_back(citytile.research());
          // actions.push_back(citytile.buildWorker());
        }
      }
    }

    for (int i = 0; i < player.units.size(); i++)
    {
      Unit unit = player.units[i];
      if (unit.isWorker() && unit.canAct())
      {
        if (unit.getCargoSpaceLeft() > 0)
        {
          // if the unit is a worker and we have space in cargo, lets find the nearest resource tile and try to mine it
          Cell *closestResourceTile = nullptr; // initialized so the nullptr check below is meaningful
          float closestDist = 9999999;
          for (auto it = resourceTiles.begin(); it != resourceTiles.end(); it++)
          {
            auto cell = *it;
            if (cell->resource.type == ResourceType::coal && !player.researchedCoal()) continue;
            if (cell->resource.type == ResourceType::uranium && !player.researchedUranium()) continue;
            float dist = cell->pos.distanceTo(unit.pos);
            if (dist < closestDist)
            {
              closestDist = dist;
              closestResourceTile = cell;
            }
          }
          if (closestResourceTile != nullptr)
          {
            auto dir = unit.pos.directionTo(closestResourceTile->pos);
            actions.push_back(unit.move(dir));
          }
        }
        else
        {
          if (player.cities.size() > 0)
          {
            auto city_iter = player.cities.begin();
            auto city = city_iter->second;

            float closestDist = 999999;
            CityTile *closestCityTile = nullptr; // initialized so the nullptr check below is meaningful
            for (auto citytile : city->citytiles)
            {
              float dist = citytile->pos.distanceTo(unit.pos);
              if (dist < closestDist)
              {
                closestCityTile = citytile;
                closestDist = dist;
              }
            }
            if (closestCityTile != nullptr)
            {
              auto dir = unit.pos.directionTo(closestCityTile->pos);

              if (citiesToBuild > 0 && unit.pos.isAdjacent(closestCityTile->pos) && unit.canBuild(gameMap))
              {
                actions.push_back(unit.buildCity());
              }
              else
              {
                actions.push_back(unit.move(dir));
              }
            }
          }
        }
      }
    }

    for (int i = 0; i < actions.size(); i++)
    {
      if (i != 0)
        cout << ",";
      cout << actions[i];
    }
    cout << endl;
    gameState.end_turn();
  }
  return 0;
}

Desperationis commented 3 years ago

Replays show nothing special. When run on the same seed as their JS counterparts, the bots behave exactly the same.

Desperationis commented 3 years ago

I've attached the simple bot I've been trying to compile below. The exact command I used was sudo lux-ai-2021 main.cpp main.cpp

simple.zip

I should mention that I'm on Linux Mint btw, so this error might not appear on Windows

Desperationis commented 3 years ago

After rigorous testing, I am more confused than I was before. I spammed that command on a specific seed for a single bot and it still failed every so often, meaning the chance of this issue occurring is completely random and not dependent on seed or the bot run. Wth

StoneT2000 commented 3 years ago

The ETXTBSY error is unrelated to the game engine and is caused by something else. It works fine on my machine. Please stay tuned to this thread; I may ask you to test some things.

StoneT2000 commented 3 years ago

Just making sure this isn't stale, is this still an issue @Desperationis ?

Desperationis commented 3 years ago

It's still an issue, not sure what's causing it. What I know for certain though is that running the lux-ai command multiple times on the same seed will eventually run the match.

StoneT2000 commented 3 years ago

Oh OK, so this error means an executable is being modified while it is running (or is being run while something still has it open for writing), which is a weird error.
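For anyone who wants to see this error outside the engine, here is a minimal Node sketch of one way it can arise, assuming a typical Linux system: the script keeps a write handle open on an executable file and tries to spawn it. The file name agent.sh and the helper spawnWhileOpenForWriting are hypothetical stand-ins, not part of lux-ai-2021 or dimensions-ai, and the result may differ on other operating systems or kernels.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');
const { spawnSync } = require('child_process');

// Hypothetical helper: tries to execute a file that this process still has
// open for writing. Returns the spawn error code, or null if the spawn worked.
function spawnWhileOpenForWriting() {
  const dir = fs.mkdtempSync(path.join(os.tmpdir(), 'etxtbsy-'));
  const file = path.join(dir, 'agent.sh'); // stand-in for main.out
  const fd = fs.openSync(file, 'w', 0o755);
  try {
    fs.writeSync(fd, '#!/bin/sh\nexit 0\n');
    // The write descriptor is deliberately NOT closed before spawning.
    const result = spawnSync(file);
    return result.error ? result.error.code : null;
  } finally {
    fs.closeSync(fd);
    fs.rmSync(dir, { recursive: true, force: true });
  }
}

console.log('spawn error code:', spawnWhileOpenForWriting());
```

On Linux this is expected to report ETXTBSY; closing the descriptor before calling spawnSync should make the spawn succeed.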

StoneT2000 commented 3 years ago

Do you by any chance have the executable file open for some reason?

Additionally, can you compile your C++ as you normally would and then execute it from the command line yourself rather than via lux-ai (it should just hang)?

Desperationis commented 3 years ago

No, I don't. In fact, even if I pass lux-ai the executable compiled with g++, it has no effect on the behaviour.

Screenshot from 2021-07-16 09-13-57

StoneT2000 commented 3 years ago

@Desperationis OK, so this issue is not even happening consistently and is hard to reproduce, which leads me to think there's something up with your setup.

Can you tell me which node and npm versions you are on, and which Linux Mint version?

I think worst case, use docker to run matches (we can provide a simple script for this) but hopefully that won't be necessary.

Desperationis commented 3 years ago

I'm running node v14.17.2 and npm v7.19.1 on Linux Mint Cinnamon Uma, though this happened on Ulyssa as well.

StoneT2000 commented 3 years ago

Still not sure, sorry.

I found this thread where someone else was using a nodejs application for something else and got the same error: https://github.com/alixaxel/chrome-aws-lambda/issues/69

Still not exactly sure what's going on. Maybe permissions? In which case can you do ls -la in the folder with the main.cpp file?

Desperationis commented 3 years ago

This is before running lux with main.cpp:

total 44
drwx------  4 diego diego 4096 Jul 23 20:30 .
drwxr-xr-x 16 diego diego 4096 Jul 24 12:46 ..
-rwxr-xr-x  1 diego diego  105 Jul 23 20:30 compile.bat
-rwxr-xr-x  1 diego diego  118 Jul 23 20:30 compile.sh
drwx------  3 diego diego 4096 Jul 23 20:30 internals
drwx------  3 diego diego 4096 Jul 23 20:30 lux
-rw-rw-r--  1 diego diego 4441 Jul 23 20:30 main.cpp
-rw-rw-r--  1 diego diego 1928 Jul 23 20:30 main.py
-rw-rw-r--  1 diego diego  265 Jul 23 20:30 package.json
-rw-rw-r--  1 diego diego  383 Jul 23 20:30 package-lock.json

After a successful run right after:

total 212
drwx------  6 diego diego   4096 Jul 24 12:48 .
drwxr-xr-x 16 diego diego   4096 Jul 24 12:46 ..
-rwxr-xr-x  1 diego diego    105 Jul 23 20:30 compile.bat
-rwxr-xr-x  1 diego diego    118 Jul 23 20:30 compile.sh
drwxrwxr-x  2 diego diego   4096 Jul 24 12:48 errorlogs
drwx------  3 diego diego   4096 Jul 23 20:30 internals
drwx------  3 diego diego   4096 Jul 23 20:30 lux
-rw-rw-r--  1 diego diego   4441 Jul 23 20:30 main.cpp
-rwxrwxr-x  1 diego diego 162032 Jul 24 12:48 main.out
-rw-rw-r--  1 diego diego   1928 Jul 23 20:30 main.py
-rw-rw-r--  1 diego diego    265 Jul 23 20:30 package.json
-rw-rw-r--  1 diego diego    383 Jul 23 20:30 package-lock.json
drwxrwxr-x  2 diego diego   4096 Jul 24 12:48 replays

After a bad run with the ETXTBSY error:

total 212
drwx------  6 diego diego   4096 Jul 24 12:49 .
drwxr-xr-x 16 diego diego   4096 Jul 24 12:46 ..
-rwxr-xr-x  1 diego diego    105 Jul 23 20:30 compile.bat
-rwxr-xr-x  1 diego diego    118 Jul 23 20:30 compile.sh
drwxrwxr-x  2 diego diego   4096 Jul 24 12:49 errorlogs
drwx------  3 diego diego   4096 Jul 23 20:30 internals
drwx------  3 diego diego   4096 Jul 23 20:30 lux
-rw-rw-r--  1 diego diego   4441 Jul 23 20:30 main.cpp
-rwxrwxr-x  1 diego diego 162032 Jul 24 12:49 main.out
-rw-rw-r--  1 diego diego   1928 Jul 23 20:30 main.py
-rw-rw-r--  1 diego diego    265 Jul 23 20:30 package.json
-rw-rw-r--  1 diego diego    383 Jul 23 20:30 package-lock.json
drwxrwxr-x  2 diego diego   4096 Jul 24 12:48 replays

As a test, I ran sudo chmod +x on main.py, main.cpp, and main.out and still got the ETXTBSY error. I did the same thing to only main.py and main.cpp on a fresh run of simple and got the same result.

StoneT2000 commented 3 years ago

Hi @Desperationis can you try this script as a test. Put this script into the same directory you call lux-ai-2021 from. Replace line 2 with the correct CWD. (so remove 'path/to/dir/of/main.out').

const { spawn } = require('child_process');
const p = spawn('./main.out', {cwd: 'path/to/dir/of/main.out');

p.stdout.on('data', (data) => {
  console.log(`stdout: ${data}`);
});

p.stderr.on('data', (data) => {
  console.error(`stderr: ${data}`);
});

p.on('close', (code) => {
  console.log(`child process exited with code ${code}`);
});

Let me know if the ETXTBSY error pops up.

Desperationis commented 3 years ago

I tried the JS script with node and the program just hung. It didn't produce any output on stdout or stderr; it behaved as if I had just run main.out raw.

Btw you missed a } on line 2 of the code.

Desperationis commented 3 years ago

But yeah, no ETXTBSY error. Idk if this piece of information is useful or anything, but running main.out raw from the command line never produces the error, though it doesn't produce any output either.

StoneT2000 commented 3 years ago

OK, this is helpful. It's supposed to just hang, as the agent is waiting for input (match information and state), and it didn't quit with an ETXTBSY error. There's a small chance the bug is caused by the cross-spawn package. Normally this is how we open your bot, as shown in the stack trace:

Error: spawn ETXTBSY
    at ChildProcess.spawn (internal/child_process.js:403:11)
    at spawn (child_process.js:580:9)

which just calls the same spawn command as the test script.

StoneT2000 commented 3 years ago

A safe solution is to just use the kaggle-environments match running tool. It's less recommended, but it's still usable and probably won't break. Information on using that will be released later.

Desperationis commented 3 years ago

Alright, got it

StoneT2000 commented 3 years ago

@Desperationis Can you test our latest dockerized version of the lux-ai-2021 tool? There are instructions here: https://github.com/Lux-AI-Challenge/Lux-Design-2021#cli-docker

It matches the lux-ai-2021 tool 1-1 (also please use the compile.sh tool in the new C++ starter kit and pass in main.out as the bot files)

So copy over the cli.sh file and simply run

sh cli.sh src/main.out src/main.out and you should be good to go.

Desperationis commented 3 years ago

@StoneT2000 I tried your docker image and I wasn't able to get the container to read either the executable or the source file. Keep in mind I have very little docker experience. The script worked fine, though it only worked with bash cli.sh and not sh cli.sh on my system; not sure why. You might also want to add the sudo prefix to the command tbh. Lmk what I should do about this. The error is here:

diego@adhoc:~/Desktop/Lux-Design-2021-master/kits/cpp/simple$ sudo bash cli.sh src/main.out src/main.out 

-=-=-=-=-=-=-=-=-=-=-=-| [INFO] match_VKyPzeUzVHhO |-=-=-=-=-=-=-=-=-=-=-=-

[INFO] (match_VKyPzeUzVHhO) - Design: lux_ai_2021 | Initializing match - ID: VKyPzeUzVHhO, Name: match_VKyPzeUzVHhO
AgentFileError: src/main.out does not exist, check if file path provided is correct
    at AgentFileError.AgentError [as constructor] (/usr/local/nvm/versions/node/v14.16.0/lib/node_modules/@lux-ai/2021-challenge/node_modules/dimensions-ai/lib/main/DimensionError/AgentError/index.js:31:28)
    at new AgentFileError (/usr/local/nvm/versions/node/v14.16.0/lib/node_modules/@lux-ai/2021-challenge/node_modules/dimensions-ai/lib/main/DimensionError/AgentError/index.js:124:28)
    at new Agent (/usr/local/nvm/versions/node/v14.16.0/lib/node_modules/@lux-ai/2021-challenge/node_modules/dimensions-ai/lib/main/Agent/index.js:176:23)
    at /usr/local/nvm/versions/node/v14.16.0/lib/node_modules/@lux-ai/2021-challenge/node_modules/dimensions-ai/lib/main/Agent/index.js:1000:29
    at Array.forEach (<anonymous>)
    at Function.Agent.generateAgents (/usr/local/nvm/versions/node/v14.16.0/lib/node_modules/@lux-ai/2021-challenge/node_modules/dimensions-ai/lib/main/Agent/index.js:995:19)
    at Match.<anonymous> (/usr/local/nvm/versions/node/v14.16.0/lib/node_modules/@lux-ai/2021-challenge/node_modules/dimensions-ai/lib/main/Match/index.js:221:53)
    at step (/usr/local/nvm/versions/node/v14.16.0/lib/node_modules/@lux-ai/2021-challenge/node_modules/dimensions-ai/lib/main/Match/index.js:33:23)
    at Object.next (/usr/local/nvm/versions/node/v14.16.0/lib/node_modules/@lux-ai/2021-challenge/node_modules/dimensions-ai/lib/main/Match/index.js:14:53)
    at fulfilled (/usr/local/nvm/versions/node/v14.16.0/lib/node_modules/@lux-ai/2021-challenge/node_modules/dimensions-ai/lib/main/Match/index.js:5:58) {
  agentID: 0
}

StoneT2000 commented 3 years ago

Yes you need bash, I'll update the readme later.

So you are saying you still can't run a game when using bash cli.sh?

Desperationis commented 3 years ago

Yeah, but hey, at least it's not the same error. All I know is that the docker pseudo-vm cannot find the executable or cpp file on my local machine. compile.sh works completely fine with no errors, so I think it's the cli.sh parameters that are causing the issue. No matter what I throw at it, I get the same "no such file or directory" error.

Screenshot from 2021-08-09 21-28-46

Probably due to the bind mount not working correctly, or just me not using the script correctly. Here, main.out is a legitimate file created by compile.sh.

djkeyes commented 3 years ago

Chiming in here that I am able to reproduce this issue. Here are some results that I found:

bash cli.sh ./kits/cpp/simple/main.cpp ./kits/cpp/simple/main.cpp This command reproduces the issue about 50% of the time. pastebin of the output. The other 50% of the time, it runs successfully.

bash cli.sh ./kits/cpp/simple/main.cpp ./kits/python/simple/main.py This command always works for me (cpp starter + python starter).

bash cli.sh ./kits/cpp/simple/main.out ./kits/cpp/simple/main.out If main.out exists, this command always works for me (precompiled cpp starter + precompiled cpp starter).

This is a stab in the dark, but could there be an issue with two compilation processes trying to write to the same file? Maybe agent 0 is trying to execute main.out, but agent 1 is trying to write to main.out. I'm not exactly sure how the game engine runs under the hood.

Desperationis commented 3 years ago

@djkeyes That is very, very peculiar. Personally, I never encountered the ETXTBSY error again once I used the docker container, only the ENOENT error mentioned above but me and @StoneT2000 were able to resolve that. Maybe check sudo docker ps to see if you're running two instances of the docker image? Keep in mind I only run with two executable files, not cpp files.

I think you're right about the write conflict, though. The command sees the two arguments as individual files and tries to compile a main.out for each, then load it into memory. Because of this, if you pass two .cpp files, the program tries to make two main.out's at the same time (possibly through multithreading). One might finish before the other even begins, and have time to load into memory before the other starts writing a new main.out file. What also might be happening is that the file library used to read each file doesn't close the file in time, which would also lead to different writing times. This explains the 50/50 random odds of you being able to compile.

This also explains the py-cpp and out-out combinations. py-cpp only needs to compile a single cpp file, while the out-out combination can run the executables directly.

If this is indeed what is happening, a possible solution is to simply name each compiled executable either 1 or 2 depending on where it is in the parameters. This way, no conflict occurs.
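The naming idea could look roughly like this in Node. This is only a sketch under assumptions: outputPathFor and the name_index.out pattern are hypothetical and are not how the engine currently names its output.

```javascript
const path = require('path');

// Hypothetical naming scheme: derive a distinct executable name per agent so
// two agents built from the same main.cpp never write to the same file.
function outputPathFor(sourcePath, agentIndex) {
  const { dir, name } = path.parse(sourcePath);
  return path.join(dir, `${name}_${agentIndex}.out`);
}

// Both agents use the same source, as in the failing command.
const sources = ['./kits/cpp/simple/main.cpp', './kits/cpp/simple/main.cpp'];
console.log(sources.map((src, i) => outputPathFor(src, i)));
```

With distinct output paths, agent 0 could start executing its binary while agent 1 is still being compiled without the two touching the same file.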

StoneT2000 commented 3 years ago

OH you two may be right! I never knew this could be an issue.

StoneT2000 commented 3 years ago

I believe you are exactly right! I think we will change the documentation to instead tell users to pass in main.out as the file instead of main.cpp.

Our CLI tool's backing engine has the problem here: https://github.com/StoneT2000/Dimensions/blob/master/src/MatchEngine/index.ts#L107

For each agent we initialize, if the file given is .cpp, we run a default C++ compilation command (in hindsight, not very smart; we should let the user decide how to compile it, maybe by adding an option to submit a compile.sh file, barring security problems with that). However, in the line above, each agent is compiled asynchronously lmao, so two processes are probably trying to write main.out at the same time, or one is writing it while the other tries to run it.
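One possible way to avoid the double compile, sketched here as an assumption rather than what Dimensions does: cache one compile promise per source path, so two agents that share a main.cpp wait on the same compilation. compileOnce, inFlight, and fakeCompile are hypothetical names; a real version would run the actual compiler inside the callback.

```javascript
// Hypothetical cache: source path -> promise of the compiled output path.
const inFlight = new Map();

// Starts a compile for sourcePath only if one has not been started already;
// later callers with the same path get the same promise back.
function compileOnce(sourcePath, compile) {
  if (!inFlight.has(sourcePath)) {
    inFlight.set(sourcePath, Promise.resolve(compile(sourcePath)));
  }
  return inFlight.get(sourcePath);
}

// Stand-in for the real compiler call; it only maps main.cpp to main.out.
const fakeCompile = async (src) => src.replace(/\.cpp$/, '.out');

Promise.all([
  compileOnce('main.cpp', fakeCompile),
  compileOnce('main.cpp', fakeCompile),
]).then((outputs) => console.log(outputs));
```

Both agents would then spawn the executable only after the single shared compile has resolved, so nothing is still writing to it.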

StoneT2000 commented 3 years ago

I'll leave this issue open in case others encounter this, and I've added (ETXTBSY error) to the title so it's more discoverable. Thanks!