georgian-io-archive / foreshadow

An automatic machine learning system
https://foreshadow.readthedocs.io
Apache License 2.0

Text transformation (Please review this one LAST) #210

Closed jzhang-gp closed 4 years ago

jzhang-gp commented 4 years ago


The latest change adds a TruncatedSVD step to the text transformation pipeline. The tricky part is deciding how many components to use. For now, the value is set to:

As for performance, I tested on the 20 newsgroups dataset described here: https://scikit-learn.org/stable/tutorial/text_analytics/working_with_text_data.html With Foreshadow running for only 1 minute, it reaches a higher classification accuracy than the naive Bayes (NB) model but a lower one than the SVM. If there is interest in more testing, I can let TPOT run longer. Let me know your thoughts.
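For anyone who wants to experiment with the component count outside Foreshadow, here is a minimal sketch of a TF-IDF plus TruncatedSVD text pipeline in plain scikit-learn. It is not the code from this PR: `pick_n_components` and its `cap` argument are hypothetical names, the heuristic is only an assumption for illustration and not the value used here, the corpus is a toy stand-in for 20 newsgroups, and `LogisticRegression` is an arbitrary choice of classifier.

```python
from sklearn.decomposition import TruncatedSVD
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression


def pick_n_components(n_features, cap=100):
    """Hypothetical heuristic (an assumption, not this PR's setting).

    Caps the SVD size and keeps it strictly below the number of
    features, which is valid for both TruncatedSVD solvers.
    """
    return max(1, min(cap, n_features - 1))


# Toy corpus standing in for a real dataset such as 20 newsgroups.
docs = [
    "the hockey team won the game last night",
    "the goalie saved every shot in the hockey game",
    "our team lost the final match of the season",
    "the new graphics card renders images faster",
    "install the driver before using the graphics card",
    "the computer monitor shows blurry images",
]
labels = [0, 0, 0, 1, 1, 1]  # 0 = sports, 1 = hardware

# Step 1: turn raw text into a sparse TF-IDF matrix.
tfidf = TfidfVectorizer()
X_tfidf = tfidf.fit_transform(docs)

# Step 2: reduce the sparse matrix to a small dense one.
# cap=2 only because the toy corpus is tiny.
n_components = pick_n_components(X_tfidf.shape[1], cap=2)
svd = TruncatedSVD(n_components=n_components, random_state=0)
X_reduced = svd.fit_transform(X_tfidf)

# Step 3: fit any downstream classifier on the reduced features.
clf = LogisticRegression()
clf.fit(X_reduced, labels)
predictions = clf.predict(X_reduced)

print(X_tfidf.shape, "->", X_reduced.shape)
```

On a real corpus the vocabulary is far larger than the cap, so the cap decides the output size. Comparing a few caps against the classifier's held-out accuracy is one way to judge whether a given setting is reasonable.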