Closed by danlivingston 6 months ago
Doesn't the hop at n+1 depend on the hop at n? Multiprocessing may be helpful to check k runs at a time at each hop instead of 1 by 1.
I've looked through the code and had this idea, which should theoretically allow multiprocessing of trade runs without rewriting the code too much. I'm not a Python guy, but I can try implementing this. @eyeonus, may I run the idea by you first, just in case it has some glaring holes I didn't notice?
Idea: Split the processing on the second hop (maybe the third etc. in the future, but baby steps). Among all possible second-hop destinations, each worker only processes a predetermined subset (simply skipping the processing for any systems not allocated to it).
Conditions:

- The database is only read during a run, never written to, so the workers don't need to coordinate any writes between them.

Implementation outline:

- Add `this-worker-number` and `worker-count` parameters to the `run` subcommand, for use by the next item on this list.
- When generating the second-hop destinations (`tradedb.TradeDB.getDestinations()` seems to be the place to do this), only select those assigned to the current worker. The `getDestinations()` result seems to be consistent given the same parameters, so each worker can calculate it independently, and for N workers we just split the list into N chunks, with the C'th worker getting the C'th chunk (the last chunk will be a bit bigger than the others, but no big deal; a minimal sketch of the chunking follows below). Unfortunately, unless there's some cool Python feature I don't know about, this means `getDestinations()` would have to pre-calculate the whole list, at least for the second hop.

Final notes:

- This could probably be done more elegantly than by slicing inside `getDestinations()`, but I wanted it to work while passing the least amount of data between the orchestrator and the workers and requiring the fewest modifications to the existing algorithms. The other thing is that the extra work only happens on the second hop; all the further hops work off the reduced set of second hops and as such shouldn't be duplicating anything.
- One caveat is `--start-jumps`, since each worker will also have to independently calculate the first hop (which is trivial when you have only a few origins), but this can probably be adapted to make the "splitter hop" selectable.

What do you people think?
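A minimal sketch of the chunking idea, assuming hypothetical `worker_number`/`worker_count` values and a plain list standing in for a pre-calculated `getDestinations()` result (none of this is existing TD code):

```python
# Hypothetical sketch of splitting second-hop destinations between workers.
# `destinations` stands in for the full result of tradedb.TradeDB.getDestinations(),
# which would need to be pre-calculated before slicing, as noted above.

def worker_slice(destinations, worker_number, worker_count):
    """Return the chunk of destinations assigned to one worker.

    Workers 0 .. worker_count-1 each get a contiguous chunk; the last
    worker absorbs the remainder, so its chunk is slightly bigger.
    """
    chunk = len(destinations) // worker_count
    start = worker_number * chunk
    end = len(destinations) if worker_number == worker_count - 1 else start + chunk
    return destinations[start:end]

# Example: 3 workers splitting 10 destinations -> chunk sizes 3, 3, 4.
dests = list(range(10))
print([len(worker_slice(dests, w, 3)) for w in range(3)])  # [3, 3, 4]
```

With N workers launched with the same parameters, each computes the same full list and keeps only its own slice, so no destination data has to be passed between the orchestrator and the workers.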
The code that does the trade runs is a recursive algorithm, so splitting at the top level isn't theoretically hard. The problem you may run into is that it almost always splits into more than two at the top level. Keep in mind that a program can only run as many processes in parallel as the computer has processor cores, so there is a hardware limit to the number of processes a multi-process algorithm can usefully utilize.
The algorithm splits for every system within range of the origin that passes all the requirements for consideration, which can range from zero out past the bubble to very high numbers in densely populated areas.
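For illustration, a hedged sketch of what fanning the top-level split out over a process pool might look like; `evaluate_branch` and the branch list are placeholders, not TD's actual recursion, and the pool size is capped at the core count mentioned above:

```python
# Hypothetical sketch: fan the top-level split out over a pool of workers.
# evaluate_branch() stands in for "recurse into one in-range system"; extra
# branches simply queue up behind the processes, which are capped at the
# number of CPU cores.
from multiprocessing import Pool, cpu_count

def evaluate_branch(system_id):
    # Placeholder for the recursive route search rooted at one system.
    return system_id, system_id * 2  # pretend "profit"

if __name__ == "__main__":
    branches = list(range(40))  # e.g. 40 systems in range of the origin
    with Pool(processes=cpu_count()) as pool:
        results = pool.map(evaluate_branch, branches)
    print(max(results, key=lambda r: r[1]))  # best branch by pretend profit
```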
The database is only ever written to when using the import or update sub commands. Calculating a run only needs to read the database, so that condition is kind of a given.
Feel free to work on this if you want. It's open source for a reason, so go ahead and make yourself a fork, and once you have a working build, send me a PR.
So, I just added #79 which could help with that. Docker is great for splitting things off into workers if that's needed.
Turning TD into a self-contained VM would not help at all with this.
I don't code in Python, but I know of two or three multi-threaded Monte Carlo simulation implementations for Python out there. Plugging them in should not be a hard task, albeit not a trivial one either, for someone who does code in Python.
General Monte Carlo simulations can be done in parallel since the current evaluations do not depend on prior values. TD, however, is creating a path, and the “value” (profit) at the end node depends on all of the previously visited stations, so it's a serial calculation instead of a parallel one. One may be able to calculate all child nodes from the current node in parallel, perhaps, but not all paths from the origin.
Avi
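A rough illustration of that distinction, with placeholder `score_child()` and neighbour data rather than anything from TD: the path itself is extended one hop at a time (serial), while the candidates for the next hop are scored in parallel.

```python
# Hypothetical sketch: the path is built serially, but the candidate children
# of the current node are scored in parallel. score_child() and NEIGHBOURS are
# placeholders, not TD code.
from concurrent.futures import ProcessPoolExecutor

NEIGHBOURS = {0: [1, 2, 3], 1: [4, 5], 2: [5, 6], 3: [6], 4: [], 5: [], 6: []}

def score_child(args):
    path, child = args
    # Placeholder profit: it depends on the whole path so far, which is why
    # the overall route has to be extended serially.
    return child, sum(path) + child

def best_path(origin, hops):
    path = [origin]
    with ProcessPoolExecutor() as pool:
        for _ in range(hops):
            children = NEIGHBOURS.get(path[-1], [])
            if not children:
                break
            scored = pool.map(score_child, [(path, c) for c in children])
            best, _ = max(scored, key=lambda s: s[1])
            path.append(best)
    return path

if __name__ == "__main__":
    print(best_path(0, hops=2))  # -> [0, 3, 6] with the placeholder data
```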
Even if it can be difficult, mathematically speaking every recursive function has an iterative version. In most common cases, it has to deal with a queue of "things" (FIFO or LIFO, depending on the recursive function) that gets filled along the way.
Still speaking in terms of computing theory, this queue is to be consumed by N processes (the number of processors, twice the number of processors, etc.), which in turn put their results in the queue. If I understand correctly, the multiple paths are going to be constructed in waves, one hop for each path at a time.
The mechanics are very common; this is "just" traversing a graph, level after level.
Another approach: given that you already have the algorithm to construct one or more paths (limited by jumps/hops; not yet computing traded goods at this point, only valid paths), we could also feed these multiple paths into the queue, each one being given to a consumer process that evaluates "the trading benefits" from start to end, associates the benefit with the path, discards the path if it doesn't meet a threshold, and stores the path+benefit somewhere (an array? whatever). Multiple paths are thus evaluated in parallel. When the queue is empty, the master process sorts the array by benefit and displays the first N to the user. (Just a theoretical example.)
The "one or more feeders / queue / one or more consumers" setup is a very robust pattern in multiprocessing programming, and in Python all the semantics and tools have been in the standard library for longer than I can remember (I've been coding Python since 2006).
I once coded a web crawler/analyzer with multiple consumers on multiple computers and configurable patterns in the consumers. I will take a look at TD as soon as I find the time (I'm not in the computer business any more, but growing an educational farm…). Not this week, because there are a lot of kids until the end of the holidays.
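A minimal sketch of the feeder/queue/consumers pattern described above, using only the standard `multiprocessing` module; `evaluate()` and the candidate paths are placeholders for a real profit calculation:

```python
# Hypothetical sketch of the feeder -> queue -> consumers pattern described
# above. evaluate() and the candidate paths are placeholders; a real version
# would price each path's trades from start to end.
from multiprocessing import Process, Queue, cpu_count

SENTINEL = None

def evaluate(path):
    return sum(path)  # placeholder "benefit" of a path

def consumer(tasks, results, threshold):
    while True:
        path = tasks.get()
        if path is SENTINEL:
            results.put(SENTINEL)      # tell the master this worker is done
            break
        benefit = evaluate(path)
        if benefit >= threshold:       # discard paths below the threshold
            results.put((benefit, path))

if __name__ == "__main__":
    tasks, results = Queue(), Queue()
    n_workers = cpu_count()
    workers = [Process(target=consumer, args=(tasks, results, 5))
               for _ in range(n_workers)]
    for w in workers:
        w.start()

    # Feeder: pre-built candidate paths (placeholders for real hop sequences).
    for path in ([1, 2], [3, 4], [0, 1], [5, 5]):
        tasks.put(path)
    for _ in workers:                  # one sentinel per consumer
        tasks.put(SENTINEL)

    # Master: drain results until every worker has reported done, then sort.
    ranked, done = [], 0
    while done < n_workers:
        item = results.get()
        if item is SENTINEL:
            done += 1
        else:
            ranked.append(item)
    for w in workers:
        w.join()
    print(sorted(ranked, reverse=True))  # best benefit first
```

The feeder, the consumers, and the final sort mirror the three stages described in the comment above; only the evaluation function would need to be swapped for a real trading-benefit calculation.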
I have been privately working on multithreading (not clean enough for a PR yet) and have made significant progress (I believe the multithreading is about 90% complete), but I have hit one major hurdle: sqlite does not support the multithreaded/multiprocess access that is required to complete the innermost getBestHop loop.
I'm a bit stuck on this, so any advice would be greatly appreciated. Where I'm trying to head right now is proxying the requests through a custom multiprocessing Manager object or queue so that the main process can respond.
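One possible shape for that proxying idea, sketched here with plain queues rather than a Manager (the worker/proxy names are hypothetical, and an in-memory database stands in for the real file): workers never open sqlite themselves, they send SQL to the main process and read the rows back from a per-worker response queue.

```python
# Hypothetical sketch of proxying database access back to the main process.
# Workers never open sqlite themselves: they send SQL through a shared request
# queue and read the rows back from their own response queue.
import sqlite3
from multiprocessing import Process, Queue

def worker(worker_id, requests, responses):
    requests.put((worker_id, "SELECT 2 + 2"))  # ask the main process to run a query
    print("worker", worker_id, "got", responses.get())
    requests.put((worker_id, None))            # signal that this worker is finished

def db_proxy(requests, response_queues, n_workers):
    conn = sqlite3.connect(":memory:")         # stand-in for the real database file
    finished = 0
    while finished < n_workers:
        worker_id, sql = requests.get()
        if sql is None:
            finished += 1
            continue
        response_queues[worker_id].put(conn.execute(sql).fetchall())
    conn.close()

if __name__ == "__main__":
    requests = Queue()
    response_queues = [Queue() for _ in range(2)]
    procs = [Process(target=worker, args=(i, requests, response_queues[i]))
             for i in range(2)]
    for p in procs:
        p.start()
    db_proxy(requests, response_queues, n_workers=2)  # main process serves the queries
    for p in procs:
        p.join()
```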
@bahaa32 Two general approaches:

1. As you said, run every database query through a single object that passes the queries on to the database one at a time, creating a choke point and probably killing a lot of the benefit of the entire idea.
2. The memory-expensive approach: throw sqlite out the window for the actual route search. Before you split into threads, just `SELECT *` all the relevant tables into collections in memory and search those instead. Memory expensive, but gaming PCs tend to have some to spare.
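A hedged sketch of approach 2: load the needed tables into plain Python structures once, before any workers exist, so the search itself never touches sqlite from a child process. The table names, column layout, and database path here are placeholders, not TD's actual schema.

```python
# Hypothetical sketch of approach 2: pull the relevant tables into memory once,
# then hand them to the workers so the route search never queries sqlite from a
# child process. "Station", "StationItem", and the .db path are placeholders.
import sqlite3
from multiprocessing import Pool

def load_tables(db_path):
    conn = sqlite3.connect(db_path)
    conn.row_factory = sqlite3.Row
    stations = [dict(r) for r in conn.execute("SELECT * FROM Station")]
    prices = [dict(r) for r in conn.execute("SELECT * FROM StationItem")]
    conn.close()
    return stations, prices

TABLES = None  # each worker gets its own copy via the pool initializer

def init_worker(tables):
    global TABLES
    TABLES = tables

def search_from(origin_id):
    stations, prices = TABLES
    # ...route search using only the in-memory collections...
    return origin_id, len(prices)  # placeholder result

if __name__ == "__main__":
    tables = load_tables("TradeDangerous.db")  # path is a placeholder
    with Pool(initializer=init_worker, initargs=(tables,)) as pool:
        results = pool.map(search_from, range(8))  # 8 placeholder origin ids
    print(results)
```

Under spawn-based multiprocessing each worker receives its own pickled copy of the tables, which is exactly the memory cost being traded for parallelism.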
Going nowhere and slowly at that. Making much more progress on optimizations of what we have.
Enable multiprocessing so calculations run faster on multi-core CPUs. I thought of this when searching for trade routes with a 500 LY jump range to see if FC trading is profitable.
Implementing this in Python is an absolute PITA, so it's understandable if this is marked `wontfix` or delayed indefinitely.