carlofazioli / cardiathena

A project to study strategies in the game of hearts, using distributed computing, AI, and data analytics.
GNU General Public License v3.0

Too Many Jobs on Argo #95

Closed irobert4 closed 4 years ago

irobert4 commented 4 years ago

Description

David recently got a cease and desist email from Argo support staff about too many jobs being run on the cluster. We need to create a more Argo-friendly version of play_hearts.

Tasks

My suggestion is that we:

davidjha commented 4 years ago

I created another branch: multiprocess-hearts-#95. I created a multi-process version of play_hearts.py called play_hearts_multi.py. This version runs each game in its own process, and if my understanding of how Python's multiprocessing.Process() works is correct, each game should get its own CPU, or at least run in parallel in the background. However, each game runs so quickly (the games finish almost in sequential order) that a simple for loop may be more efficient, given the overhead of creating new processes.
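For reference, a minimal sketch of the one-process-per-game idea, assuming a hypothetical play_game() entry point (the actual function in play_hearts_multi.py may differ):

```python
# Minimal sketch: run each game in its own process via multiprocessing.Process().
# play_game() is a placeholder, not this repo's API.
from multiprocessing import Process

def play_game(game_id):
    # Placeholder: play one full game of Hearts and record the result.
    print(f"finished game {game_id}")

if __name__ == "__main__":
    procs = [Process(target=play_game, args=(i,)) for i in range(8)]
    for p in procs:
        p.start()   # each game starts in its own OS process
    for p in procs:
        p.join()    # wait for all games to finish
```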

Also adjusted start_game.sh to include a for loop.

carlofazioli commented 4 years ago

It could be worthwhile asking the Argo staff how long they think a "long running" job is. University clusters routinely run simulations or data analytics for faculty members that can take 1, 10, or maybe 100 hours.

My intuition guides me to a solution where you have a single-threaded, single-process job that runs a loop. This loop plays the game over and over, maybe just inserting into SQL after every game, or alternatively keeping a local buffer of python game results and then inserting them periodically as a chunk. DBs usually have efficient bulk-write operations. The loop upper bound would be like N = 100,000 or even N = 1,000,000.
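As a rough sketch of that loop (play_game() and insert_many() below are stand-in stubs, not existing project code):

```python
# Rough sketch of the single-process loop: play N games, buffer results
# locally, and flush them to the DB in chunks with a bulk write.
def play_game(i):
    return (i, "winner", 0)        # stub: pretend result row for game i

def insert_many(rows):
    pass                           # stub: would issue one bulk INSERT/executemany

N = 100_000
CHUNK = 1_000
buffer = []

for i in range(N):
    buffer.append(play_game(i))    # one game per loop iteration
    if len(buffer) >= CHUNK:
        insert_many(buffer)        # one bulk write instead of CHUNK single inserts
        buffer.clear()

if buffer:
    insert_many(buffer)            # flush any remainder
```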

Multi-threading and multi-processing are powerful techniques, but also not as off-the-shelf as one might want.

Some general guidelines are that

davidjha commented 4 years ago

Long-running jobs aren't the issue we are facing. I believe Carlo is correct that the typical usage on university clusters can span many hours.

I chose not to have the data inserted into the DB after every game because the server could not handle the throughput that approach caused. I elected to go the bulk-write route, as it is more efficient and can be done in batches. However, instead of keeping the results in an in-memory cache, I simply write them to files and then load those files into the DB. MySQL can load data from CSV files faster than it can process individual INSERT statements. Having these files around also provides some redundancy in case the DB decides to go belly up.
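A minimal sketch of that batch-to-CSV flow (play_game() and the table/column names here are placeholders, not this repo's schema):

```python
# Sketch: write one CSV of game results, then bulk-load it into MySQL.
# play_game() and the table/column names are placeholders, not this repo's schema.
import csv

def play_game(game_id):
    return (game_id, "agent_random", 42)   # stub result row for one game

BATCH_SIZE = 10_000

with open("hearts_results.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["game_id", "agent", "score"])   # header row
    for game_id in range(BATCH_SIZE):
        writer.writerow(play_game(game_id))

# The file can then be bulk-loaded, which is typically much faster than
# issuing one INSERT per game, e.g.:
#   LOAD DATA LOCAL INFILE 'hearts_results.csv'
#   INTO TABLE hearts_results
#   FIELDS TERMINATED BY ',' IGNORE 1 LINES;
```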

Threads are part of one process (a running program) and share its data (global variables, for instance) and address space (the block of memory the process lives in). Creating and switching between threads is less expensive because less data has to be fetched and copied from memory.

Processes don't share data or address space with each other; they are separate running programs. Creating and switching between processes (which requires the OS to step in and perform a context switch) is more taxing, as more data needs to be fetched and copied.
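A small illustration of that sharing difference (not project code): a thread sees the parent's globals, while a separate process works on its own copy.

```python
# Illustration only: threads share the parent's memory; processes do not.
import threading
import multiprocessing

counter = 0

def bump():
    global counter
    counter += 1

if __name__ == "__main__":
    t = threading.Thread(target=bump)
    t.start(); t.join()
    print("after thread:", counter)    # 1 -- the thread updated the shared global

    p = multiprocessing.Process(target=bump)
    p.start(); p.join()
    print("after process:", counter)   # still 1 -- the child only changed its own copy
```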

Our intent is to save time with multi-processing. I think threading would complicate things more than necessary for Hearts, whereas creating multiple instances of Hearts to run in parallel is much simpler, and since Hearts is lightweight enough, it shouldn't be taxing. I'm trying a few things to see how much work can get done in less time.
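If the goal is just to run many independent games across the available CPUs, a multiprocessing.Pool is one simple option (sketch only; play_game() is again a placeholder):

```python
# Sketch: run many independent Hearts games across CPUs with a process pool.
# play_game() is a placeholder, not this repo's API.
from multiprocessing import Pool, cpu_count

def play_game(game_id):
    return (game_id, "done")   # placeholder for playing one full game

if __name__ == "__main__":
    with Pool(processes=cpu_count()) as pool:
        # One worker per CPU; the pool reuses workers, so process-creation
        # overhead is paid once rather than once per game.
        results = pool.map(play_game, range(1_000))
    print(len(results), "games played")
```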

I haven't seen Black Mirror, but I have heard good things!

davidjha commented 4 years ago

Multi-processing isn't working correctly (incorrect implementation). We're going to fall back to a loop, and also try submitting multiple play_hearts jobs and have Slurm run each of them on its own CPU.