chrisorm / Cascade-Forests

Implementation of https://arxiv.org/pdf/1702.08835.pdf in scikit-learn

Hard to reproduce the results reported in the paper. #1

Open bis-carbon opened 7 years ago

bis-carbon commented 7 years ago

Hello there, thank you for implementing gcForest. Your code is very neat and easy to understand. I ran your code on the Digit and Letter data sets, and the results I get are not close to what the paper reports: I get roughly 95% accuracy on Digits and 93% on Letters on average, whereas the paper reports 98.96% and 97.25% respectively. What do you think causes the difference, besides the limited size of the data sets used for training and testing?

Thank you again.

chrisorm commented 7 years ago

I am not sure; I have run into the same issue myself. The paper is relatively sparse on technical details, so it could be anything from a misinterpretation of the method on my part to a difference in the implementation of random forests or extremely randomized trees. Differences between, say, R and scikit-learn can be non-trivial.
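To illustrate where such differences can creep in, here is a minimal sketch of a single cascade level in scikit-learn, using out-of-fold class-probability vectors via `cross_val_predict`. The tree counts, fold count, and the use of `ExtraTreesClassifier` with `max_features=1` as a stand-in for the paper's completely-random tree forests are my assumptions, not the paper's exact settings.

```python
# A single cascade level: out-of-fold class-probability vectors from a
# random forest and an extremely-randomized-trees forest are concatenated
# with the original features and passed to the next level.
import numpy as np
from sklearn.datasets import load_digits
from sklearn.ensemble import RandomForestClassifier, ExtraTreesClassifier
from sklearn.model_selection import cross_val_predict

X, y = load_digits(return_X_y=True)

forests = [
    RandomForestClassifier(n_estimators=500, n_jobs=-1, random_state=0),
    # max_features=1 is used here only as a rough stand-in for the
    # paper's completely-random tree forests.
    ExtraTreesClassifier(n_estimators=500, max_features=1,
                         n_jobs=-1, random_state=0),
]

# Out-of-fold predictions keep the augmented features for the next level
# from leaking the training labels.
class_vectors = [
    cross_val_predict(f, X, y, cv=3, method="predict_proba") for f in forests
]
X_next = np.hstack([X] + class_vectors)
print(X_next.shape)  # (1797, 64 + 2 * 10)
```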

I contacted the authors for clarification but they suggested we wait a few months for the official code to be released.

aaCherish commented 7 years ago

Hi, chris

I have read your code for gcForest. You have implemented the Cascade Forest structure well, but not the Multi-Grained Scanning described in the paper. I think it is also a very important part of the accuracy that gcForest achieves, so I just wonder whether you have implemented the Multi-Grained Scanning part.
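For reference, here is a rough sketch of what Multi-Grained Scanning could look like on the Digits data in scikit-learn. The window size, stride, forest settings, and the plain fit/predict (rather than out-of-fold predictions) are simplifying assumptions on my part, not the paper's configuration.

```python
# Multi-Grained Scanning sketch: slide a window over each image, label
# every patch with its image's label, train a forest on all patches, and
# concatenate the per-window class probabilities into a scanned feature
# vector for each image.
import numpy as np
from sklearn.datasets import load_digits
from sklearn.ensemble import RandomForestClassifier

digits = load_digits()
images, y = digits.images, digits.target  # images: (n, 8, 8)
win, stride = 4, 2                        # assumed window size and stride

def extract_windows(imgs, win, stride):
    """Slice each image into flattened win x win patches, one array per window position."""
    n, h, w = imgs.shape
    patches = []
    for i in range(0, h - win + 1, stride):
        for j in range(0, w - win + 1, stride):
            patches.append(imgs[:, i:i + win, j:j + win].reshape(n, -1))
    return patches  # list of (n, win * win) arrays

patches = extract_windows(images, win, stride)
all_patches = np.vstack(patches)          # (n * n_windows, win * win)
all_labels = np.tile(y, len(patches))     # each patch keeps its image's label

rf = RandomForestClassifier(n_estimators=100, n_jobs=-1, random_state=0)
# Out-of-fold probabilities would be preferable to avoid leakage; a plain
# fit/predict is used here purely to keep the sketch short.
rf.fit(all_patches, all_labels)

probs = [rf.predict_proba(P) for P in patches]  # one (n, n_classes) block per window
X_mgs = np.hstack(probs)                        # scanned representation per image
print(X_mgs.shape)                              # (n, n_windows * n_classes)
```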

Thank you