freesoft / detox_bot2

University of Illinois@Urbana-Champaign MCS-DS CS498 Cloud Computing Applications project
GNU General Public License v3.0

[High] Research whether we can use Spark (+ MLlib) to replace the current scikit-learn based toxic chat classifier. Where should we run it (AWS, GCP, etc.)? #10

Closed kevinmackie closed 5 years ago

kevinmackie commented 5 years ago

We can feed the chat stream to Spark with its streaming API and use the Multinomial Naive Bayes classifier built into MLlib.

freesoft commented 5 years ago

IIRC @noya has been looking into whether we can use TensorFlow instead of the scikit-learn that the current code uses. So we'd go with either TensorFlow or Spark + MLlib; whichever is easier and more convenient to use is preferred to me (and also whichever is well supported by the service we end up on, AWS, GCP, etc.).

Also FYI https://spark.apache.org/docs/2.3.0/ml-features.html Looks like it has all the stuff we need. Wondering what TensorFlow has and whether it's scalable enough compared to Spark.
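For comparison, the scikit-learn approach the current code uses is roughly this shape (a minimal sketch under my assumptions; the toy data, labels, and pipeline choices are illustrative, not taken from the repo):

```python
# Hypothetical baseline: the kind of TF-IDF + Multinomial Naive Bayes
# pipeline a scikit-learn toxic-chat classifier typically uses.
from sklearn.pipeline import make_pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB

texts = ["you are an idiot", "have a nice day",
         "shut up loser", "thanks for the help"]
labels = [1, 0, 1, 0]  # 1 = toxic, 0 = clean

clf = make_pipeline(TfidfVectorizer(), MultinomialNB(alpha=1.0))
clf.fit(texts, labels)

print(clf.predict(["shut up loser"]))
```

Every stage here (tokenization, TF-IDF, NB) has a counterpart on that ml-features page, so a port to Spark looks mostly mechanical.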

kevinmackie commented 5 years ago

Spark + MLlib looks pretty easy to use, but I'm not familiar with TensorFlow, so I'm looking forward to hearing from Cindy. Maybe TensorFlow is also better if we consider more cutting-edge classification approaches (deeper contextual analysis)?

freesoft commented 5 years ago

Yeah, I think it'd be nice to use the deep learning stuff, but that would add a lot of dev time for whoever works on it. Then again, if we're not staying with the current scikit-learn, we're going to spend time moving our implementation to Spark or TensorFlow anyway...

Let's wait a bit until we get some feedback from @noya (she already has TensorFlow experience from work at the office), and then decide what to do.

Worst case, we can still just use an existing implementation; just deploying the result to k8s is still better than the current version.

freesoft commented 5 years ago

This one has been addressed. We're going with GCP, and @noya will replace the existing scikit-learn classifier with Spark + MLlib.