Closed DavisTownsend closed 1 year ago
https://github.com/dask/dask-ml/issues/5 May be of interest.
Yeah, I've seen that; it didn't seem like a final answer was shown there. It'd be nice to have it natively supported, though, so I can make my business process depend on it without worrying too much about supporting it myself going forward.
It would be state of the art and very useful if Dask could natively handle distributed TF-IDF matrices as input to a multinomial naive Bayes model. I know this is a difficult problem because most TF-IDF implementations need the entire term-document matrix in memory, so I'm not sure how to solve it, tbh. Problem referenced here: https://stackoverflow.com/questions/25145552/tfidf-for-large-dataset
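For what it's worth, the memory constraint can be sidestepped in principle with two passes over the data: one that only accumulates per-term document frequencies, and one that weights each chunk independently using those global counts. Below is a minimal sketch of that idea in plain Python. It is not Dask or dask-ml code; the helper names (`tokenize`, `document_frequencies`, `tfidf_chunk`), the whitespace tokenizer, and the smoothed IDF formula `log((1 + n) / (1 + df)) + 1` are all assumptions chosen for illustration.

```python
import math
from collections import Counter


def tokenize(doc):
    # Hypothetical tokenizer: lowercase and split on whitespace.
    return doc.lower().split()


def document_frequencies(chunks):
    """Pass 1: for each term, count how many documents contain it.

    Only the running counter is kept in memory, never the full matrix.
    """
    df = Counter()
    n_docs = 0
    for chunk in chunks:
        for doc in chunk:
            df.update(set(tokenize(doc)))  # each term counted once per document
            n_docs += 1
    return df, n_docs


def tfidf_chunk(chunk, df, n_docs):
    """Pass 2: weight a single chunk using the global document frequencies."""
    rows = []
    for doc in chunk:
        tf = Counter(tokenize(doc))
        # Assumed smoothed IDF: log((1 + n_docs) / (1 + df)) + 1
        rows.append({
            term: count * (math.log((1 + n_docs) / (1 + df[term])) + 1)
            for term, count in tf.items()
        })
    return rows


# Two small in-memory "partitions" standing in for chunks read from disk.
chunks = [["the cat sat", "the dog sat"], ["the bird flew"]]
df, n_docs = document_frequencies(chunks)
weighted = [tfidf_chunk(chunk, df, n_docs) for chunk in chunks]
```

Since each chunk in the second pass depends only on `df` and `n_docs`, the chunks could be processed in parallel, and the per-chunk counters from the first pass could be summed across workers. Whether this maps cleanly onto Dask collections is something I haven't verified.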