Xtra-Computing / thundergbm

ThunderGBM: Fast GBDTs and Random Forests on GPUs
Apache License 2.0

Not thread safe? #33

Open · Tripton opened this issue 4 years ago

Tripton commented 4 years ago

Hello,

In my current use case it would be great if I could use multithreading/multiprocessing, because I have a lot of calculations and my GPU could handle them.

However, I get strange results when using threading. Here is a small script to reproduce the problem:

from thundergbm import TGBMRegressor
from sklearn.datasets import load_boston
from sklearn.metrics import mean_squared_error
from multiprocessing.pool import ThreadPool as Pool
import functools
import numpy as np

def main():
    print("Without parallel threads: " + str(calc(2, 1)))
    print("With parallel threads: " + str(calc(2, 2)))

def calc(num_repeats, pool_size):
    x, y = load_boston(return_X_y=True)
    x = np.repeat(x, 1000, axis=0)
    y = np.repeat(y, 1000)

    x = np.asarray([x]*num_repeats)
    y = np.asarray([y]*num_repeats)

    pool = Pool(pool_size)
    func = functools.partial(fit_gbdt, x=x, y=y)
    results = pool.map(func, range(num_repeats))
    return results

def fit_gbdt(idx, x, y):
    clf = TGBMRegressor(verbose=0)
    clf.fit(x[idx], y[idx])
    y_pred = clf.predict(x[idx])
    rmse = (mean_squared_error(y[idx], y_pred)**(1/2))
    return rmse

if __name__ == '__main__':
    main()

Sometimes I get the error:

2019-10-14 10:48:22,422 FATAL [default] Check failed: [error == cudaSuccess]  an illegal memory access was encountered
2019-10-14 10:48:22,426 WARNING [default] Aborting application. Reason: Fatal log at [/thundergbm/include\thundergbm/util/device_lambda.cuh:49]
2019-10-14 10:48:22,434 FATAL [default] Check failed: [error == cudaSuccess]  an illegal memory access was encountered

and sometimes bad results:

Without parallel threads: [0.011103539879039557, 0.011174528160149052]
With parallel threads: [0.04638805412265755, 4.690559078455652]

Multiprocessing does work, but only if I don't return a TGBM instance. Returning the instance would be the best solution, but it doesn't work at all because TGBM is not picklable.

I'm using Windows 10 with CUDA 10.

In my experience it is sometimes hard to do multithreading with CUDA (e.g., TensorFlow), but multiprocessing should be fine as long as the object is picklable. Maybe it is possible to make TGBM picklable, or to find the bug that causes multithreading to crash.
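
For illustration, the process-based workaround I mean would look roughly like this sketch of the script above: each worker process fits its own model and returns only the RMSE (a plain, picklable float), never the model itself.

from thundergbm import TGBMRegressor
from sklearn.datasets import load_boston
from sklearn.metrics import mean_squared_error
from multiprocessing import Pool  # process pool instead of ThreadPool
import functools
import numpy as np

def fit_gbdt_mp(idx, x, y):
    # Each worker process fits its own model and returns only a plain
    # float, because TGBMRegressor instances cannot be pickled.
    clf = TGBMRegressor(verbose=0)
    clf.fit(x[idx], y[idx])
    y_pred = clf.predict(x[idx])
    return mean_squared_error(y[idx], y_pred) ** 0.5

def calc_mp(num_repeats, pool_size):
    # Build num_repeats copies of an enlarged Boston housing dataset,
    # then fit one model per copy in a pool of worker processes.
    x, y = load_boston(return_X_y=True)
    x = np.asarray([np.repeat(x, 1000, axis=0)] * num_repeats)
    y = np.asarray([np.repeat(y, 1000)] * num_repeats)
    with Pool(pool_size) as pool:
        func = functools.partial(fit_gbdt_mp, x=x, y=y)
        return pool.map(func, range(num_repeats))

if __name__ == '__main__':
    # The __main__ guard is required for multiprocessing on Windows.
    print(calc_mp(2, 2))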

Many thanks!

zeyiwen commented 4 years ago

Thanks. We will look into this problem and get back to you when we have an update.

zeyiwen commented 4 years ago

@Tripton Just a quick update: as a workaround, you may use the exact method for the tree_method option. We are working hard to locate the bug, which appears to be quite challenging due to the massive parallelism inside ThunderGBM combined with multithreading outside of it.
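
A minimal sketch of that workaround, assuming the scikit-style wrapper accepts tree_method as a constructor argument as the comment above suggests:

from thundergbm import TGBMRegressor

# Suggested workaround: switch split finding to the exact method
# instead of the default histogram-based one.
clf = TGBMRegressor(tree_method='exact', verbose=0)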

Kurt-Liuhf commented 4 years ago

Hi @Tripton, thanks for your report. ThunderGBM is thread-safe now. I have run your code a dozen times on our server, and no error appeared. Please reinstall the library and give it a try. Thanks.

zeyiwen commented 4 years ago

This issue should be solved now, so we would like to close it.

civilinformer commented 4 years ago

I would like to reopen this issue. I made some small modifications to the above code:

from thundergbm import TGBMRegressor   
from sklearn.datasets import load_boston
from sklearn.metrics import mean_squared_error
from multiprocessing.pool import ThreadPool as Pool
import functools
import numpy as np

def main():
    print("Without parallel threads and 1 gpu: " + str(calc(2, 1, 1)))
    print("With parallel threads and 1 gpu: " + str(calc(2, 2, 1)))
    print("Without parallel threads and 2 gpu: " + str(calc(2, 1, 2)))
    print("With parallel threads and 2 gpu: " + str(calc(2, 2, 2)))

def calc(num_repeats, pool_size, n_gpus):
    x, y = load_boston(return_X_y=True)
    x = np.repeat(x, 1000, axis=0)
    y = np.repeat(y, 1000)

    x = np.asarray([x]*num_repeats)
    y = np.asarray([y]*num_repeats)

    pool = Pool(pool_size)
    func = functools.partial(fit_gbdt, x=x, y=y, n_gpus=n_gpus)
    results = pool.map(func, range(num_repeats))
    return results

def fit_gbdt(idx, x, y, n_gpus):
    clf = TGBMRegressor(verbose=0, n_gpus=n_gpus)
    clf.fit(x[idx], y[idx])
    y_pred = clf.predict(x[idx])
    rmse = (mean_squared_error(y[idx], y_pred)**(1/2))
    return rmse

if __name__ == '__main__':
    main()

Now sometimes I see output like this:

In [2]: %run test_tgbm.py                                                                                                                                                           
Without parallel threads and 1 gpu: [0.011102704273042477, 0.01117481052674395]
With parallel threads and 1 gpu: [0.01117491826081946, 0.011103542490388574]
Without parallel threads and 2 gpu: [0.01239784807141135, 0.012399129722859907]
Segmentation fault (core dumped)

Kurt-Liuhf commented 4 years ago

Hi @civilinformer, thank you for your feedback. Your test results show that there might still be some thread-safety bugs in ThunderGBM. We will conduct further tests and fix the thread-safety issue in a more robust way, and we will get back to you when there is an update. Thank you.