jteresco commented 8 years ago

Since at the moment, the database does need any of the information computed during the graph generation phase of the site update process, the database file generation and the graph generation can be done concurrently. And once the DB file is created, it can be transferred and loaded even before graphs are completed. This can be part of an overall effort to improve the concurrency of the site update process. There are many opportunities where multiple threads could take advantage of multiple cores to speed the whole process.

jteresco commented 7 years ago

NMP file writes could be spawned to a thread.

User list file reading might be able to use multiple threads.

jteresco commented 7 years ago

Possibility: datacheck detection and logging spawned to a thread.

jteresco commented 7 years ago

I knew but had forgotten that Python implementations like the one we use (CPython) can't take advantage of multiple cores, so my first attempt (trying to thread the reading of wpt files) is likely to have much less benefit than hoped.

yakra commented 7 years ago

Bummer. I'm Python illiterate -- do other Python implementations do multi-core?

jteresco commented 7 years ago

Something to experiment with: os.fork()

This will create a duplicate process, just like C's fork() function, so all of the in-memory data structures will be duplicated, but... at least the graph generation should then be able to proceed in parallel with the rest of the usual site update process after the wpt/csv data is all read in.

jteresco commented 7 years ago

Something to investigate: https://pythonhosted.org/mpi4py/usrman/index.html

yakra commented 6 years ago

https://docs.python.org/3/library/multiprocessing.html#module-multiprocessing

yakra commented 6 years ago

Is the TravelMapping:threading branch still needed? Its last commit is included in the master branch.

jteresco commented 6 years ago

Branch not needed, and now removed, thanks.

yakra commented 6 years ago

I played around a little bit...

20c20
< import threading
---
> from multiprocessing import Lock, Process, Queue
35c35
<         self.lock = threading.Lock()
---
>         self.lock = Lock()
2077c2077
< all_waypoints_lock = threading.Lock()
---
> all_waypoints_lock = Lock()

(line numbers may no longer be accurate due to @jteresco's more recent revisions to siteupdate.py)

And on a larger scale:

# set up for threaded processing of highway systems
class ReadWptThread(threading.Thread):

    def __init__(self, id, hs_list, lock):
        threading.Thread.__init__(self)
        self.id = id
        self.hs_list = hs_list
        self.lock = lock

    def run(self):
        #print("Starting ReadWptThread " + str(self.id) + " lock is " + str(self.lock))
        while True:
            self.lock.acquire(True)
            #print("Thread " + str(self.id) + " with len(self.hs_list)=" + str(len(self.hs_list)))
            if len(self.hs_list) == 0:
                self.lock.release()
                break
            h = self.hs_list.pop()
            self.lock.release()
            #print("Thread " + str(self.id) + " assigned " + str(h))
            read_wpts_for_highway_system(h)

        #print("Exiting ReadWptThread " + str(self.id))

hs_lock = threading.Lock()
#print("Created lock: " + str(hs_lock))
hs = highway_systems[:]
hs.reverse()
thread_list = []
# create threads
for i in range(num_threads):
    thread_list.append(ReadWptThread(i, hs, hs_lock))

# start threads
for t in thread_list:
    t.start()

# wait for threads
for t in thread_list:
    t.join()

->

# set up for threaded processing of highway systems
def ReadWptThread(id, q):
    print("Starting ReadWptThread " + str(id))
    while not q.empty():
        print("Thread " + str(id) + " with q.qsize=" + str(q.qsize()))
        h = q.get()
        print("Thread " + str(id) + " assigned " + str(h))
        read_wpts_for_highway_system(h)
    #print("Exiting ReadWptThread " + str(id))

q = Queue()
for hs in highway_systems:
    q.put(hs)
thread_list = []

if __name__ == '__main__':
    # create threads
    for i in range(num_threads):
        thread_list.append(Process(target=ReadWptThread, args=(i, q,)))
    # start threads
    for t in thread_list:
        t.start()
    # wait for threads
    for t in thread_list:
        t.join()

The good news:

My system monitor showed all 4 CPUs chugging away at ~60%, compared to the ~25% of the standard siteupdate.py. There was a bit of a speed improvement, with Finding unprocessed wpt files beginning at [259.8] rather than [330.9].

The bad news:

It worked exactly as expected; when all was said and done, there were no waypoints in memory. Full graph has 0 vertices, 0 edges. Expected, because "in-memory data structures will be duplicated"; state is not shared between sub-processes and/or the parent process. Sharing state and shifting data between processes is discouraged per the Programming guidelines, in favor of using queues or pipes for communication between processes. Neither of which I can wrap my head around how to do. I can't figure out how to get pipes to all point at and write to the same object (E.G. error list or all_waypoints), and actual shared state or pools or any of that stuff is beyond the limits of the "fake it till you make it" approach I've taken with Python over the last week or so. I'll have to tear myself away from this and return my focus to where my time & effort produce more concrete results. 8-)

jteresco commented 6 years ago

Of course the best thing to happen would be for a Python version to appear with genuine threading support. There's a ton of information out there about why it hasn't happened (the interested can search on "python gil"). A multiprocessing solution is the way to go given what Python can currently so, as you started to play with. The potential good news for a future implementation on this is that graph generation can be done independently and possibly run truly concurrently with list file processing and DB file generation. But once I got the list file processing times way down the whole process is no longer taking so much time that it really bothers me for now...

wjorda commented 6 years ago

It's worth noting that the GIL does not apply when waiting for file I/O, so using multithreading will actually improve performance in the largely IO-bounded task given to it.

Of course, the better better way to get around this would be to begin moving away from Python , which is fairly poorly suited for the current task for a number of reasons (The GIL, slow by even interpreted language standards, poor memory performance, error-prone type system). I experiemented with rewriting siteupdate.py in Java and was able to get through the waypoint datacheck stage in under 10s, however, work got in the way so I never finished it.

Sticking with Python, however, there are still ways to considerably improve performance without rewriting the entire codebase. The easiest would be to use dicts or sets instead of lists when membership testing is required. Another would be to use a third-party dataframe library (e.g. Pandas/NumPy), which implement parsing using native code and are considerably faster than our pure Python code. A third option would be to use Python's extension functionality and gradually move more intensive functions over to C or C++.

However, these are all separate issues, and I'm afraid I don't have much time to dedicate to tackling any of them. Just thought I'd offer my thoughts.

Thanks, Wes

On Mon, May 21, 2018, 10:15 PM Jim Teresco notifications@github.com wrote:

Of course the best thing to happen would be for a Python version to appear with genuine threading support. There's a ton of information out there about why it hasn't happened (the interested can search on "python gil"). A multiprocessing solution is the way to go given what Python can currently so, as you started to play with. The potential good news for a future implementation on this is that graph generation can be done independently and possibly run truly concurrently with list file processing and DB file generation. But once I got the list file processing times way down the whole process is no longer taking so much time that it really bothers me for now...

— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/TravelMapping/DataProcessing/issues/26#issuecomment-390841046, or mute the thread https://github.com/notifications/unsubscribe-auth/AIkwwYIhj4C1sQ3oUegwe5dJ9zjsQescks5t03SvgaJpZM4JHry1 .

yakra commented 6 years ago

List file processing goes by in a relative blink of an eye. Once that task finishes, we've still got to chug away at graph generation. The time savings would be pretty small, arguably not worth the hassle of coding. What I would love to do is multithread the graph generation (something else I experimented with earlier), similar to what's done with the processing of highway systems. But then, we wind up with either, on the one hand, 4 CPUs working at ~25% instead of 1 CPU working at ~100%, or OTOH, the need to share graph_list between multiple processes. Ugh.

I'll have to Google "python gil" in a bit and see what I find.

--

Multithreading in C++, OTOH, was straightforward enough that even *I* was able to implement it in a couple programs. Admittedly though, it was relatively simple stuff that never wrote to the same objects in memory from multiple processes.

yakra commented 6 years ago

Re @Thing342's comment: Python's interpreted language, and type system, struck me as things that would come with a lot of overhead. I had considered the idea of rewriting siteupdate in C++. Not seriously, mind you, just considered it. :) But 3500+ lines of code in a language I don't understand would be a lot of time and labor, not to mention the learning curve involved in getting the sql file going at the end of it all. And I'm supposedly working on other things... 8-)

jteresco commented 6 years ago

It's strange how it worked out. As still evidenced by the directory name "python-teresco" I intended the Python implementation to be a quick attempt to get something up and running, fully expecting to write it in something more efficient later. But having all that Python in place that works, it's been hard to convince myself to put in time to translate/rewrite.

yakra commented 6 years ago

Re multithreading graph generation, and

the need to share graph_list between multiple processes

I've thought of a (klugey?) workaround:

Replace the graph_list.append(GraphListEntry(foo, bar, sv, se, baz, qux)) lines with something that will write this info to a file on disk, controlled by a more traditional Lock() mechanism.
After the child processes finish, read the attributes of each graph from the file and add them to graph_list from there.

Thoughts?

yakra commented 5 years ago

@Thing342 in https://github.com/TravelMapping/DataProcessing/issues/26#issuecomment-390845737:

I experiemented with rewriting siteupdate.py in Java and was able to get through the waypoint datacheck stage in under 10s

At the time, datacheck was right after NMP processing, before creating the route hash table for list processing. Using C++, I get to this stage in 23.1s. 10s? Wow! :D Aside from processing smaller HighwayData & UserData datasets, this is a testament to your superior abilities as a programmer, better ability of Java to do things quickly, running on a significantly faster machine, or some combination of the three.

yakra commented 5 years ago

C++

(See also https://github.com/yakra/DataProcessing/issues/28) My approach has been to multithread individual tasks that are suited to it.

Reading waypoints for all routes
Writing near-miss point merged wpt files
Processing traveler list files
Augmenting travelers for detected concurrent segments
Computing stats
Creating per-traveler stats logs and augmenting data structure
Creating subgraphs

What I haven't thought about much until relatively recently has been taking a task only suited for single-threaded operation, and spawning it to its own thread, letting it run concurrent with the rest of siteupdate until its info is needed.

NMP [merged] file writes could be spawned to a thread. User list file reading might be able to use multiple threads.

Both are multithreaded.

Possibility: datacheck detection and logging spawned to a thread.

Detection is done during multithreaded wpt file reading. That's an I/O-bound task (in C++ at least) so in a sense datacheck detection takes 0 time.
unmatchedfps.log & datacheck.log both take 0s, even in Python.
As far as matching FPs goes, my first though was to not, because removing matched FPs from the list can break list::iterator validity. But there are ways around this, that don't necessarily have to incur too much performance penalty for single-threaded operation.

One sigfinificant chunk of single-threaded time is when we write the sql file at the very end. This is a candidate for spawning to its own thread while the rest of siteupdate proceeds.

Since at the moment, the database does need any of the information computed during the graph generation phase of the site update process, the database file generation and the graph generation can be done concurrently.

Since the OP was written, the DB now contains graph info (conveniently, at the very end of the file). Easy enough to handle.

Start one thread writing everything except for the graph info.
Generate graphs.
join() the file-writing thread.
Append graph data to the file.

Python

Something to experiment with: os.fork()

...and per my experiments upthread, it may in theory be possible to multi-thread subgraph generation.

so all of the in-memory data structures will be duplicated

Although this could be a showstopper. Just the amount of RAM involved...

state is not shared between sub-processes and/or the parent process

Here's where things get complicated. The parent process needs the number of vertices and edges of each subgraph in order to populate graph_list. Finding this out entails finding all the points & vertices that belong in each subgraph's set. Moving this out of write_subgraphs_tmg and into the parent process leaves us with very little left to multi-thread. An alternative is to set up graph_list without the vertex/edge counts, then after the files are written, iterate through each GraphListEntry and pull the counts from the corresponding file's header.

Just the mammoth amount of RAM likely involved in this is a turn-off, though. I won't consider this a priority when there are better gains to be gained more easily with C++.

jteresco commented 1 year ago

C++ is taking over and the Python site update is inching closer to a retirement. Since this issue focuses mainly on the Python version and @yakra has invested so much time into C++ multithreading, I am declaring this issue safe to close.

TravelMapping / DataProcessing

Opportunities for multithreading in the site update process #26

->

The good news:

The bad news:

C++

Append graph data to the file.

Python