Tripton opened this issue 5 years ago
Thanks. We will look into this problem and get back to you if we have any update.
@Tripton Just a quick update: as a workaround, you may use the `exact` method for the `tree_method` option. We are working hard to locate the bug, which appears to be quite challenging due to the massive parallelism inside ThunderGBM combined with the multithreading outside of it.
Hi @Tripton, thanks for your report. ThunderGBM should be thread-safe now. I have run your code a dozen times on our server and no error appeared. Please reinstall the library and give it a try. Thanks.
This issue should be solved now, so we would like to close it.
I would like to reopen this issue. I made some small modifications to the code above:
```python
from thundergbm import TGBMRegressor
from sklearn.datasets import load_boston
from sklearn.metrics import mean_squared_error
from multiprocessing.pool import ThreadPool as Pool
import functools
import numpy as np


def main():
    print("Without parallel threads and 1 gpu: " + str(calc(2, 1, 1)))
    print("With parallel threads and 1 gpu: " + str(calc(2, 2, 1)))
    print("Without parallel threads and 2 gpu: " + str(calc(2, 1, 2)))
    print("With parallel threads and 2 gpu: " + str(calc(2, 2, 2)))


def calc(num_repeats, pool_size, n_gpus):
    x, y = load_boston(return_X_y=True)
    x = np.repeat(x, 1000, axis=0)
    y = np.repeat(y, 1000)
    x = np.asarray([x] * num_repeats)
    y = np.asarray([y] * num_repeats)
    pool = Pool(pool_size)
    func = functools.partial(fit_gbdt, x=x, y=y, n_gpus=n_gpus)
    results = pool.map(func, range(num_repeats))
    return results


def fit_gbdt(idx, x, y, n_gpus):
    clf = TGBMRegressor(verbose=0, n_gpus=n_gpus)
    clf.fit(x[idx], y[idx])
    y_pred = clf.predict(x[idx])
    rmse = mean_squared_error(y[idx], y_pred) ** 0.5
    return rmse


if __name__ == '__main__':
    main()
```
Now sometimes I see output like this:
```
In [2]: %run test_tgbm.py
Without parallel threads and 1 gpu: [0.011102704273042477, 0.01117481052674395]
With parallel threads and 1 gpu: [0.01117491826081946, 0.011103542490388574]
Without parallel threads and 2 gpu: [0.01239784807141135, 0.012399129722859907]
Segmentation fault (core dumped)
```
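Until the race is fixed, one possible workaround is to serialize every call into the non-thread-safe library behind a single `threading.Lock`, which gives up concurrent fitting but keeps the rest of the threaded pipeline intact. This is only a sketch: it uses sklearn's `GradientBoostingRegressor` and synthetic data as stand-ins for `TGBMRegressor` and the real workload, and the names `GBM_LOCK`, `fit_gbdt_locked`, and `calc_locked` are illustrative, not part of ThunderGBM:

```python
import functools
import threading
from multiprocessing.pool import ThreadPool as Pool

import numpy as np
from sklearn.ensemble import GradientBoostingRegressor  # stand-in for TGBMRegressor
from sklearn.metrics import mean_squared_error

# One global lock guarding every fit/predict call into the
# (assumed) non-thread-safe library.
GBM_LOCK = threading.Lock()


def fit_gbdt_locked(idx, x, y):
    with GBM_LOCK:  # only one thread runs model code at a time
        clf = GradientBoostingRegressor(n_estimators=10)
        clf.fit(x[idx], y[idx])
        y_pred = clf.predict(x[idx])
    return mean_squared_error(y[idx], y_pred) ** 0.5


def calc_locked(num_repeats, pool_size):
    # Synthetic regression data with one dataset per repeat.
    rng = np.random.default_rng(0)
    x = rng.normal(size=(num_repeats, 200, 5))
    y = x.sum(axis=2) + rng.normal(scale=0.1, size=(num_repeats, 200))
    pool = Pool(pool_size)
    func = functools.partial(fit_gbdt_locked, x=x, y=y)
    return pool.map(func, range(num_repeats))
```

Note that the lock makes the fits run one after another even though the pool submits them concurrently, so this only helps when data preparation around the fit is the part you want to parallelize.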
Hi @civilinformer, thank you for your feedback. Your test results show that there may still be thread-safety bugs in ThunderGBM. We will run further tests and fix the thread-safety issue more thoroughly, and we will get back to you if there is any update. Thank you.
Hello,
in my current use case it would be great if I could use multithreading or multiprocessing, because I have a lot of calculations and my GPU could handle the load.
However, I get strange results when using threading. Here is a small script to reproduce the problem.
Sometimes I get the error:
and sometimes bad results:
Multiprocessing does work, but only if I'm not returning a TGBM instance. Returning the instance would be the best solution, but it doesn't work at all because TGBM is not picklable.
I'm using Windows 10 with CUDA 10.
In my experience it's sometimes hard to do multithreading with CUDA (e.g. TensorFlow), but multiprocessing should be fine as long as the object is picklable. Maybe it is possible to make TGBM picklable, or to find the bug that causes multithreading to crash.
Many thanks!
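The multiprocessing workaround described above (returning only picklable results from each worker instead of the model object) could be sketched as follows. This is an illustration, not ThunderGBM's API: sklearn's `GradientBoostingRegressor` and synthetic data stand in for `TGBMRegressor` and the real workload, and `fit_and_score` / `calc_mp` are hypothetical names:

```python
import functools
from multiprocessing import Pool

import numpy as np
from sklearn.ensemble import GradientBoostingRegressor  # stand-in for TGBMRegressor
from sklearn.metrics import mean_squared_error


def fit_and_score(idx, x, y):
    # Fit inside the worker process and return only picklable results
    # (here the RMSE), never the model instance itself.
    clf = GradientBoostingRegressor(n_estimators=10)
    clf.fit(x[idx], y[idx])
    y_pred = clf.predict(x[idx])
    return mean_squared_error(y[idx], y_pred) ** 0.5


def calc_mp(num_repeats, pool_size):
    # Synthetic regression data with one dataset per repeat.
    rng = np.random.default_rng(0)
    x = rng.normal(size=(num_repeats, 200, 5))
    y = x.sum(axis=2) + rng.normal(scale=0.1, size=(num_repeats, 200))
    with Pool(pool_size) as pool:
        func = functools.partial(fit_and_score, x=x, y=y)
        return pool.map(func, range(num_repeats))


if __name__ == "__main__":
    # On Windows, multiprocessing uses the "spawn" start method, so pool
    # creation must stay behind this __main__ guard.
    print(calc_mp(2, 2))
```

Each worker process gets its own model, which sidesteps both the thread-safety bug and the pickling limitation, at the cost of not being able to reuse the fitted model in the parent process.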