Xtra-Computing / thundergbm

ThunderGBM: Fast GBDTs and Random Forests on GPUs
Apache License 2.0

Problems with random forest classifier when using more and deeper learners #80

Open OswaldHongyu opened 1 year ago

OswaldHongyu commented 1 year ago

Hi, I'm new to ThunderGBM.

I just ran the random forest example:

```python
from thundergbm import TGBMClassifier
from sklearn.datasets import load_digits
from sklearn.metrics import accuracy_score

# Train a bagged random forest: one boosting round, 100 parallel trees.
x, y = load_digits(return_X_y=True)
clf = TGBMClassifier(bagging=1, depth=12, n_trees=1, n_parallel_trees=100)
clf.fit(x, y)
y_pred = clf.predict(x)
accuracy = accuracy_score(y, y_pred)
print(accuracy)
```

Several problems have arisen:

First, I watched the verbose output and found that when I set `n_trees=1`, the classifier only uses one learner, no matter what value I give `n_parallel_trees`, contrary to the claim in issue #42.
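To make this easy to check without reading the verbose log, here is a minimal sketch of my own (not from the maintainers) that compares two runs differing only in `n_parallel_trees`; if the parameter took effect, the bagged 100-tree forest should behave differently from a single tree:

```python
from thundergbm import TGBMClassifier
from sklearn.datasets import load_digits
from sklearn.metrics import accuracy_score

x, y = load_digits(return_X_y=True)

# Two runs that differ only in n_parallel_trees. Identical training
# accuracy on both runs is consistent with the parameter being ignored.
for n_parallel in (1, 100):
    clf = TGBMClassifier(bagging=1, depth=12, n_trees=1,
                         n_parallel_trees=n_parallel)
    clf.fit(x, y)
    acc = accuracy_score(y, clf.predict(x))
    print(f"n_parallel_trees={n_parallel}: training accuracy = {acc:.4f}")
```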

Furthermore, I try more and deeper learners, when "depth" is more than 20, or "n_trees" more than 70, the program may well crash. When I use python file, it turns out to be a Segmentation fault (core dumped), when I use jupyter notebook, the kernel died. When I try a large dataset with millions of samples, it crashed even when converting csr to csc. Cause I'm using a workstation with a CPU of 32 cores, 128 GB memory, and a RTX 3090 GPU, I don't believe this is a hardware issue. Is thunderGBM only capable to train really small forests on small datasets ? That's unacceptable. I'm confused and hope to see the power of thunderGBM.
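Because a segmentation fault takes down the whole interpreter (or notebook kernel), here is the sketch I use to map out which parameter combinations crash: each configuration runs in a child process, so the parent survives and can record the exit code (on Linux a negative return code is the killing signal, e.g. -11 for SIGSEGV). The parameter grid below is just an illustration:

```python
import subprocess
import sys

# Template for a single training run; executed in a child process so a
# crash in the native library cannot kill the parent interpreter.
SCRIPT = """
from thundergbm import TGBMClassifier
from sklearn.datasets import load_digits
x, y = load_digits(return_X_y=True)
clf = TGBMClassifier(bagging=1, depth={depth}, n_trees=1,
                     n_parallel_trees={n})
clf.fit(x, y)
print("ok")
"""

for depth in (12, 16, 20, 24):
    for n in (50, 70, 100):
        proc = subprocess.run(
            [sys.executable, "-c", SCRIPT.format(depth=depth, n=n)],
            capture_output=True, text=True)
        status = "ok" if proc.returncode == 0 else \
            f"crashed (exit code {proc.returncode})"
        print(f"depth={depth}, n_parallel_trees={n}: {status}")
```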