elixir-nx / scholar

Traditional machine learning on top of Nx
Apache License 2.0
433 stars 45 forks

Add the Latent Dirichlet Allocation algorithm for topic modelling #196

Closed sutgeorge closed 11 months ago

sutgeorge commented 1 year ago

I would actually like to use this for a customer feedback project.

josevalim commented 1 year ago

Pull requests are welcome!

sutgeorge commented 1 year ago

In addition to this, I'm not sure whether graph centrality algorithms fit the goal of the repo (you may consider them closer to statistical algorithms than to generic machine learning). Algorithms like PageRank or HITS would help. For example, a PageRank variation called "TextRank" is a relatively well-known graph-based text summarization algorithm, albeit relatively weak compared to state-of-the-art techniques. I will probably implement this as soon as I have the time.

I appreciate your openness 💯

josevalim commented 1 year ago

The biggest question is whether those algorithms can be implemented well using tensor operations and Nx semantics. We want most of Scholar's code to be written inside defn, so that it runs on both CPU and GPU, and I am not certain that graph algorithms are a good fit. We also don't support sparse tensors yet, which can be problematic when you are interested in computing and storing edge properties.
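To make the trade-off concrete, here is a minimal sketch of PageRank written as power iteration over a dense adjacency matrix. It is plain Python for illustration only; it is not Scholar or Nx code, and the names `pagerank`, `adj`, `damping`, and `iterations` are hypothetical. The fixed iteration count and the matrix-vector update are the kind of shape-static computation that maps onto tensor operations, while the dense n × n matrix shows the cost of not having sparse tensors: every possible edge is stored, present or not.

```python
def pagerank(adj, damping=0.85, iterations=50):
    """Power-iteration PageRank on a dense adjacency matrix (illustrative sketch).

    adj[i][j] == 1.0 means there is an edge from node i to node j.
    """
    n = len(adj)

    # Out-degree of each node is the sum of its row.
    out_degree = [sum(row) for row in adj]

    # Row-normalise into a transition matrix. Nodes without outgoing
    # edges spread their rank uniformly over all nodes.
    transition = [
        [adj[i][j] / out_degree[i] if out_degree[i] > 0 else 1.0 / n
         for j in range(n)]
        for i in range(n)
    ]

    # Start from a uniform distribution.
    rank = [1.0 / n] * n

    for _ in range(iterations):
        # Dense update: rank = (1 - d) / n + d * (transition^T @ rank)
        rank = [
            (1.0 - damping) / n
            + damping * sum(transition[i][j] * rank[i] for i in range(n))
            for j in range(n)
        ]
    return rank


# Small example graph (an assumption for illustration):
# 0 -> 1, 0 -> 2, 1 -> 2, 2 -> 0
example_graph = [
    [0.0, 1.0, 1.0],
    [0.0, 0.0, 1.0],
    [1.0, 0.0, 0.0],
]
example_ranks = pagerank(example_graph)
```

In this example node 2 receives links from both other nodes and ends up with the highest rank. For a graph with many nodes and few edges, most entries of the matrix are zero, which is where a sparse representation would matter.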

josevalim commented 11 months ago

I will go ahead and close this for now. We are still open to conversations if someone wants to move these ideas forward. :)