amplab / keystone

Simplifying robust end-to-end machine learning on Apache Spark.
http://keystone-ml.org/
Apache License 2.0

TF-IDF? #299

Closed Refefer closed 6 years ago

Refefer commented 6 years ago

I see the TermFrequency transformer and have found some ghosts of IDFCommonSparseFeatures around the web, but that transform no longer appears to exist in KeystoneML.

Are there any approaches folks would suggest for tackling TF-IDF transforms?

etrain commented 6 years ago

Here's the implementation of IDFCommonSparseFeatures that you're referring to: https://github.com/tomerk/keystone/blob/ampcamp-6/src/main/scala/nodes/nlp/IDFCommonSparseFeatures.scala

I haven't tried compiling it against the latest version of KeystoneML, but I would expect that it basically works. Feel free to send a PR if you use it and find it useful.
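
For anyone who only needs the computation itself, the following is a minimal sketch of TF-IDF in plain Scala over in-memory collections. It is not the linked IDFCommonSparseFeatures code and does not use any KeystoneML or Spark API. The names `TfIdfSketch`, `termFrequency`, `inverseDocumentFrequency`, and `tfIdf` are hypothetical, and the unsmoothed weighting `ln(N / df)` is just one common variant, assumed here for illustration.

```scala
// Minimal TF-IDF sketch (hypothetical names; plain Scala, no Spark or KeystoneML).
object TfIdfSketch {
  // A document is assumed to be already tokenized.
  type Doc = Seq[String]

  // Raw term counts for a single document.
  def termFrequency(doc: Doc): Map[String, Double] =
    doc.groupBy(identity).map { case (term, hits) => term -> hits.size.toDouble }

  // Unsmoothed inverse document frequency: ln(N / df),
  // where N is the number of documents and df is the number
  // of documents that contain the term at least once.
  def inverseDocumentFrequency(docs: Seq[Doc]): Map[String, Double] = {
    val n = docs.size.toDouble
    val docFreq = docs
      .flatMap(_.distinct) // count each term once per document
      .groupBy(identity)
      .map { case (term, hits) => term -> hits.size.toDouble }
    docFreq.map { case (term, df) => term -> math.log(n / df) }
  }

  // Per-document sparse TF-IDF weights, keyed by term.
  def tfIdf(docs: Seq[Doc]): Seq[Map[String, Double]] = {
    val idf = inverseDocumentFrequency(docs)
    docs.map { doc =>
      termFrequency(doc).map { case (term, tf) => term -> tf * idf(term) }
    }
  }

  def main(args: Array[String]): Unit = {
    val docs = Seq(
      Seq("spark", "ml", "spark"),
      Seq("ml", "pipeline")
    )
    tfIdf(docs).foreach(println)
  }
}
```

With this weighting, a term that occurs in every document (here `ml`) gets weight zero. On a cluster the same two passes would presumably run over an RDD: one to compute document frequencies, one to rescale each document's term counts.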

etrain commented 6 years ago

Closing this for now - feel free to reopen if you need help with the above.