andrewtavis / wikirec

Recommendation engine framework based on Wikipedia data
BSD 3-Clause "New" or "Revised" License

Add t-SNE to wikirec #35

Open andrewtavis opened 3 years ago

andrewtavis commented 3 years ago

It would be helpful to be able to visualize the embeddings created by wikirec models, and one way to achieve this is t-SNE. This would allow the results of different models to be compared visually to see how relationships are being derived.
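For anyone picking this up, here is a rough sketch of the idea, assuming scikit-learn's `TSNE` and matplotlib. This is not the kwx implementation: the helper names `project_embeddings_2d` and `plot_embeddings_2d` are hypothetical, and random vectors stand in for real wikirec embeddings.

```python
import numpy as np
from sklearn.manifold import TSNE


def project_embeddings_2d(embeddings, perplexity=5.0, random_state=42):
    """Reduce (n_samples, n_features) embeddings to 2D with t-SNE.

    Hypothetical helper for illustration; not part of wikirec or kwx.
    """
    embeddings = np.asarray(embeddings, dtype=float)
    # t-SNE needs perplexity to be smaller than the number of samples.
    if perplexity >= embeddings.shape[0]:
        raise ValueError("perplexity must be smaller than the number of samples")
    tsne = TSNE(
        n_components=2,
        perplexity=perplexity,
        init="pca",
        random_state=random_state,
    )
    return tsne.fit_transform(embeddings)


def plot_embeddings_2d(coords, labels=None):
    """Scatter-plot 2D coordinates and return the matplotlib Axes."""
    # Imported lazily so the projection step works without matplotlib.
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots(figsize=(8, 6))
    ax.scatter(coords[:, 0], coords[:, 1], s=20)
    if labels is not None:
        for (x, y), label in zip(coords, labels):
            ax.annotate(label, (x, y), fontsize=8)
    ax.set_title("t-SNE projection of embeddings")
    return ax


# Stand-in data: random vectors in place of real wikirec embeddings.
rng = np.random.default_rng(0)
demo_embeddings = rng.normal(size=(30, 16))
demo_coords = project_embeddings_2d(demo_embeddings)
```

Keeping the projection separate from the plotting would make it easier to compare several models side by side, since each model's embeddings can be projected once and then drawn on its own axes.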

The Python package kwx has an implementation of t-SNE that could be adopted for this package. Another reference is the blog post that this package was originally based on, which is found here. Ideally this would go into a visuals.py module, which would then be added to the documentation and tested using pytest's monkeypatch feature (see the tests for kwx for an example). Partial implementations are more than welcome though!

Please first indicate your interest in working on this, as it is a feature implementation :)

Thanks for your interest in contributing!

andrewtavis commented 3 years ago

@victle, will write to you about this here :) You'd be welcome to work on this as well if you're interested! I honestly think that the kwx implementation of t-SNE would be good enough to copy over, but then you'd be welcome to look into other resources and make improvements (I'd then use them in kwx no doubt).

A question would be how the package structure would change. In kwx I have kwx.visuals, so would we want to do a wikirec.visuals module? And if so, should graph_lda_topic_evals be moved into there from wikirec.utils?

victle commented 3 years ago

I'll try to go through that implementation and get a better understanding first. Any resources you have besides kwx would be appreciated! I like the idea of having a wikirec.visuals module—keeps everything nice and tidy.

andrewtavis commented 3 years ago

Sounds great :) I just updated the wiki with some useful resources on t-SNE 😊

Feel free to split off the module yourself in the PR. Thanks for this!