kevinmackie closed 5 years ago
IIRC @noya has been looking into whether we can use TensorFlow instead of the scikit-learn that the current code uses. So we go with either TensorFlow or Spark + MLlib. I'd prefer whichever is easier and more convenient to use (and is well supported by whatever service (AWS, GCP, etc.) we end up using, if any).
Also FYI: https://spark.apache.org/docs/2.3.0/ml-features.html It looks like it has all the stuff we need. I'm wondering what TensorFlow has and whether it's scalable enough compared to Spark.
Spark + MLlib looks pretty easy to use, but I'm not familiar with TensorFlow, so I'm looking forward to hearing from Cindy. Maybe TensorFlow is the better choice if we want to consider more cutting-edge classification approaches (deeper contextual analysis)?
Yeah, I think deep learning would be nice to have, but it would add a lot of dev time for whoever ends up working on it. Well... if we're not staying with the current scikit-learn, we're going to spend some time porting our implementation to Spark or TensorFlow anyway.
Let's wait a bit until we get some feedback from @noya (she already has TensorFlow experience from her work at the office), and then decide what to do.
Worst case, we can still just use the existing implementation. Just deploying the result to k8s would still be better than the current version.
This one has been addressed. We're going with GCP, and @noya will replace the existing scikit-learn classifier with Spark + MLlib.
We can feed the chat stream to Spark through its streaming API and use the Multinomial Naive Bayes classifier built into MLlib.
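For anyone not familiar with what that classifier does, here is a rough plain-Python sketch of multinomial Naive Bayes applied to chat messages one at a time. This is not the MLlib API and not our actual code: the `MultinomialNB` class, the `classify_stream` helper, the whitespace tokenizer, and the tiny `spam`/`ok` training set are all made up for illustration, and in practice MLlib's implementation and Spark's streaming API would take their place.

```python
import math
from collections import Counter, defaultdict


class MultinomialNB:
    """Multinomial Naive Bayes over bag-of-words token counts (illustrative only)."""

    def __init__(self, smoothing=1.0):
        self.smoothing = smoothing  # additive (Laplace) smoothing

    def fit(self, messages, labels):
        self.class_counts = Counter(labels)       # messages seen per class
        self.token_counts = defaultdict(Counter)  # token frequencies per class
        self.vocab = set()
        for text, label in zip(messages, labels):
            tokens = text.lower().split()         # naive whitespace tokenizer
            self.token_counts[label].update(tokens)
            self.vocab.update(tokens)
        return self

    def predict(self, text):
        # Tokens never seen in training are ignored.
        tokens = [t for t in text.lower().split() if t in self.vocab]
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            counts = self.token_counts[label]
            denom = sum(counts.values()) + self.smoothing * len(self.vocab)
            # log prior + sum of smoothed log likelihoods
            score = math.log(n_docs / total_docs)
            for t in tokens:
                score += math.log((counts[t] + self.smoothing) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical training data and labels, invented for this sketch.
train = [
    ("buy cheap coins now", "spam"),
    ("free coins click here", "spam"),
    ("nice play that round", "ok"),
    ("good game everyone", "ok"),
]
model = MultinomialNB().fit([m for m, _ in train], [l for _, l in train])


def classify_stream(stream):
    """Stand-in for the chat stream: label each message as it arrives."""
    for message in stream:
        yield message, model.predict(message)
```

The per-message loop in `classify_stream` is only a placeholder for the streaming side. With Spark, the same fit-then-predict split should still apply: train the model on labeled history, then apply it to incoming batches of messages.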