elixir-nx / scholar

Traditional machine learning on top of Nx
Apache License 2.0
409 stars, 44 forks

Add the Latent Dirichlet Allocation algorithm for topic modelling #196

Closed sutgeorge closed 8 months ago

sutgeorge commented 10 months ago

I would actually like to use this for a customer feedback project.

josevalim commented 10 months ago

Pull requests are welcome!

sutgeorge commented 10 months ago

In addition to this, I'm not sure whether graph centrality algorithms fit the goal of the repo (I'm not sure if you consider them to be statistical algorithms rather than generic machine learning). Algorithms like PageRank or HITS would help: for example, a PageRank variation called "TextRank" is a relatively well-known graph-based text summarization algorithm, albeit relatively weak compared to state-of-the-art techniques. I will probably implement this as soon as I have the time.

I appreciate your openness 💯

josevalim commented 9 months ago

The biggest question is whether those algorithms can be implemented well using tensor operations and Nx semantics. We want most of Scholar's code to be written inside defn, so that it runs on both CPU and GPU, and I am not certain that graph algorithms are a good fit. We also don't support sparse tensors yet, which can be problematic when you need to compute and store edge properties.
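
To make the trade-off above concrete, here is a minimal sketch of PageRank written as dense power iteration. It is plain Python using only the standard library, not Nx and not Scholar code: the function name `pagerank`, its `damping` and `iterations` parameters, and the three-node example graph are all hypothetical and chosen only for illustration.

```python
def pagerank(adjacency, damping=0.85, iterations=50):
    """Dense power iteration; adjacency[i][j] is 1 if node i links to node j.

    Hypothetical helper for illustration only, not part of Scholar or Nx.
    """
    n = len(adjacency)
    out_degree = [sum(row) for row in adjacency]

    # Build the full n x n transition matrix: transition[j][i] is the
    # probability of moving from node i to node j. Nodes without outgoing
    # links spread their rank uniformly over all nodes.
    transition = [
        [adjacency[i][j] / out_degree[i] if out_degree[i] else 1.0 / n
         for i in range(n)]
        for j in range(n)
    ]

    # Start from a uniform distribution and apply the same update repeatedly.
    rank = [1.0 / n] * n
    for _ in range(iterations):
        # One iteration is one matrix-vector product plus a constant term.
        rank = [
            (1.0 - damping) / n
            + damping * sum(t * r for t, r in zip(row, rank))
            for row in transition
        ]
    return rank


# Assumed example graph: 0 -> 1, 0 -> 2, 1 -> 2, 2 -> 0.
links = [
    [0, 1, 1],
    [0, 0, 1],
    [1, 0, 0],
]
scores = pagerank(links)
```

Each iteration is a single matrix-vector product with a fixed shape, which is the kind of operation that tends to map onto tensor code. The cost shows up in `transition`: it stores all n x n entries even when most edges are absent, which is where the lack of sparse tensors would start to hurt on larger graphs.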

josevalim commented 8 months ago

I will go ahead and close this for now. We are still open to conversations if someone wants to move these ideas forward. :)