classner / pymp

Easy, OpenMP style multiprocessing for Python on Unix.
MIT License

Parallel processing freezes after completion #7

Closed · zoj613 closed 5 years ago

zoj613 commented 5 years ago

Hi, this is great software that I am trying to incorporate into my own project to run multiple MCMC chains in parallel. However, I am facing a problem with the program freezing after processing, especially when I want to store the MCMC chains afterwards. This is the class method I implemented to do the work:

```python
def run(self, iters=1000, burnin=None, new_init=None, progressbar=True):
    results = pymp.shared.list()
    with pymp.Parallel(*pymp.config.num_threads) as p:
        for chain in p.range(self.n_chains):
            init = self.inits[chain]
            self.model.run_sampler(iters, burnin, init, progressbar=progressbar)
            results.append((p.thread_num, self.model._traces))
    for i in range(self.n_chains):
        setattr(Sampler, 'chain{}'.format(i), results[i])
        self.fullchain = np.concatenate((self.fullchain, results[i]))
```

The method runs and completes according to the output shown by the progress bar, but as soon as it reaches 100% it freezes and hangs forever until I press Ctrl+C to cancel. Do you have any idea what might be causing this, or any solution or workaround? I am using Python 3.6.6 and my OS is Linux (Manjaro).

classner commented 5 years ago

Hi!

I'd be happy to help if I can reproduce the problem. Can you provide a minimal breaking example?

The last part of the method should not be related to the problem, right? (At least it doesn't concern pymp at all.) Why are you using the asterisk operator when providing the number of threads? I think that's just wrong. Does the problem also occur if you provide the parameter `if_=False` to the `Parallel` region constructor?
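For reference, this is roughly what I mean (an untested sketch; the loop body is just a stand-in for your sampler call):

```python
import pymp

results = pymp.shared.list()

# Pass a plain integer thread count instead of unpacking
# pymp.config.num_threads with the asterisk operator.
with pymp.Parallel(4) as p:
    for chain in p.range(4):
        results.append((p.thread_num, chain))  # stand-in for real work

print(list(results))

# To rule out the parallel region itself, run the same block serially:
# with pymp.Parallel(4, if_=False) as p:
```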

classner commented 5 years ago

Hi @zoj613,

I would love to clean up the issues for this project. Is there any news on this? If I can reproduce the problem, I'd like to help. If there's no response, I'm going to close this issue in the next few days.

Thanks!

pipe980819 commented 5 years ago

Hi @classner,

I am facing the same issue. I realized that the program is not actually freezing; it just takes a long time to execute. This only happens when `pymp.shared.list()` is added.

This is my code:

```python
columns = ['id', 'title', 'content']
allFiles = glob.glob("*.csv")
frame = pd.DataFrame()
list_ = pymp.shared.list()
dictionary = {}
with pymp.Parallel(4) as p:
    for file in p.iterate(allFiles):
        df = pd.read_csv(file, index_col=None, header=0, usecols=columns)
        df['content'] = df['content'].str.replace(',', '')
        df['content'] = df['content'].str.replace('.', '')
        df['content'] = df['content'].str.replace(';', '')
        df['content'] = df['content'].str.replace(':', '')
        df['content'] = df['content'].str.replace('?', '')
        df['content'] = df['content'].str.replace('!', '')
        df['content'] = df['content'].str.replace('"', '')
        list_.append(df)
frame = pd.concat(list_)
```

If I do not add that line, I get an error when my code finishes because `list_` is empty.

classner commented 5 years ago

Hi @pipe980819 ,

thanks for reporting! I can well imagine that this is actually a legitimate example of a case where parallelization will not help much (IO is the bottleneck, and making it non-sequential can slow things down). Can you provide a minimal breaking example (dummy data is fine) so that I can verify this?
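Something along these lines would already work as a starting point (a rough, untested sketch; the file count and sizes are made up):

```python
import glob
import time

import pandas as pd
import pymp

# Generate a handful of dummy CSVs so the example is self-contained.
for i in range(8):
    pd.DataFrame({
        'id': range(10000),
        'title': ['title'] * 10000,
        'content': ['some, text. here!'] * 10000,
    }).to_csv('dummy_{}.csv'.format(i), index=False)

all_files = glob.glob('dummy_*.csv')

# Parallel read into a shared list.
shared_frames = pymp.shared.list()
start = time.time()
with pymp.Parallel(4) as p:
    for path in p.iterate(all_files):
        shared_frames.append(pd.read_csv(path))
print('parallel:', time.time() - start)

# Sequential baseline for comparison.
start = time.time()
frames = [pd.read_csv(path) for path in all_files]
print('sequential:', time.time() - start)
```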

kalyanjk commented 3 years ago

I think it's the lists that take a lot of time. What is the correct way to do this?

```python
import time

import numpy as np
import pymp

size = 100000

# Shared array.
ex_array = pymp.shared.array((size,))
starttime = time.time()
with pymp.Parallel(8) as p:
    for index in p.range(0, size):
        ex_array[index] = index * index
        ex_array[index] = ex_array[index] * index
        ex_array[index] = ex_array[index] * index
# The parallel print function takes care of asynchronous output.
print("Thread time array", time.time() - starttime)

# Shared list.
ex_array = pymp.shared.list([None] * size)
starttime = time.time()
with pymp.Parallel(8) as p:
    for index in p.range(0, size):
        ex_array[index] = index * index
        ex_array[index] = ex_array[index] * index
        ex_array[index] = ex_array[index] * index
# The parallel print function takes care of asynchronous output.
print("Thread time list", time.time() - starttime)

# Plain numpy array, no parallelism.
ex_array1 = np.empty(size)
starttime = time.time()
for index in range(0, size):
    ex_array1[index] = index * index
    ex_array1[index] = ex_array1[index] * index
    ex_array1[index] = ex_array1[index] * index
print("Non thread time array", time.time() - starttime)

# Plain list, no parallelism.
ex_array1 = list([None] * size)
starttime = time.time()
for index in range(0, size):
    ex_array1[index] = index * index
    ex_array1[index] = ex_array1[index] * index
    ex_array1[index] = ex_array1[index] * index
print("Non thread time list", time.time() - starttime)
```

output:

```
Thread time array 0.08570504188537598
Thread time list 25.828571319580078
Non thread time array 0.10504555702209473
Non thread time list 0.03801274299621582
```

classner commented 3 years ago

Hi @kalyanjk!

Definitely the array. The list has additional overhead because (I think) every list access has to go through a mutex, so there's an enormous number of write conflicts. This is also the worst possible example for using PyMP, because the task inside the loop is so lightweight. If you want to properly reap the benefits, put a computationally heavy task into the loop. You can simulate that easily by just putting a `sleep` call in there; that should get you the full parallelized speedup. The other takeaway here really is to prefer arrays over lists in the shared setting where possible.
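To illustrate (a rough sketch; the `sleep` stands in for real per-item work, and the timings are only approximate):

```python
import time

import pymp

size = 16

# With heavy per-item work, 8 workers give a near-linear speedup.
ex_array = pymp.shared.array((size,))
start = time.time()
with pymp.Parallel(8) as p:
    for index in p.range(0, size):
        time.sleep(0.5)                 # stand-in for real computation
        ex_array[index] = index * index
print("parallel:", time.time() - start)    # roughly size / 8 * 0.5 s

# Sequential baseline.
start = time.time()
for index in range(0, size):
    time.sleep(0.5)
print("sequential:", time.time() - start)  # roughly size * 0.5 s
```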