jzhang-gp closed this 4 years ago.
The latest change includes a TruncatedSVD step in the text transformation pipeline. The tricky part is deciding how many components to use. For now, the value is set to:
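For readers unfamiliar with where such a step sits, here is a minimal sketch of a TF-IDF + TruncatedSVD text pipeline. It assumes scikit-learn's `TfidfVectorizer`, `TruncatedSVD` and `Pipeline`, which is not necessarily how Foreshadow wires it internally. The `choose_n_components` helper and its `cap=100` default are hypothetical placeholders, not the value chosen in this change. They only illustrate one constraint any heuristic has to respect: `n_components` must stay below the number of features, and staying below the number of documents as well is a conservative choice for small corpora.

```python
from sklearn.decomposition import TruncatedSVD
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import Pipeline


def choose_n_components(n_samples: int, n_features: int, cap: int = 100) -> int:
    """Hypothetical heuristic, NOT the value used in this change.

    Keeps n_components strictly below both the vocabulary size and the
    number of documents, and never above `cap`.
    """
    return max(1, min(cap, min(n_samples, n_features) - 1))


def fit_text_pipeline(docs, cap: int = 100, random_state: int = 0) -> Pipeline:
    # First pass: learn the vocabulary only to measure the feature count.
    n_features = len(TfidfVectorizer().fit(docs).vocabulary_)
    n_components = choose_n_components(len(docs), n_features, cap)

    # Second pass: fit the actual pipeline (TF-IDF -> dense low-rank projection).
    pipe = Pipeline([
        ("tfidf", TfidfVectorizer()),
        ("svd", TruncatedSVD(n_components=n_components, random_state=random_state)),
    ])
    pipe.fit(docs)
    return pipe


if __name__ == "__main__":
    docs = [
        "the cat sat on the mat",
        "dogs and cats are pets",
        "stock markets fell sharply today",
        "investors worry about the markets",
    ]
    pipe = fit_text_pipeline(docs, cap=2)
    # Each document becomes a dense 2-dimensional vector.
    print(pipe.transform(docs).shape)
```

Fitting the vectorizer twice is wasteful but keeps the component count decided before the pipeline is built; a real implementation would more likely compute it once inside a custom transformer.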
As for performance, I tested on the 20news_group (20 Newsgroups) data described here: https://scikit-learn.org/stable/tutorial/text_analytics/working_with_text_data.html With Foreshadow running for only 1 minute, it achieves higher classification accuracy than the Naive Bayes (NB) baseline from that tutorial, but lower than its SVM baseline. If there is interest in more testing, I can let TPOT run longer. Let me know your thoughts.