Hard coded queries for nearest neighbor

adjidieng / ETM

Topic Modeling in Embedding Spaces

MIT License

Hard coded queries for nearest neighbor #14

Open ajw-42 opened 4 years ago

ajw-42 commented 4 years ago

Hello,

Thanks for an interesting paper and for sharing the code.

I've been trying this method on some non-English datasets, and a small stumbling block is the hard-coded nearest-neighbor query words in main.py (lines 209 and 374). This might be a problem for some English datasets as well.

I've fixed this in my own experiments by querying ten random words from the vocab, but I thought I'd flag it here in case you want to address this. I'd be happy to submit a PR myself if that's alright.
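A minimal sketch of the workaround described above: sample query words at random from the vocabulary instead of hard-coding them, then look up each query's nearest neighbors. It assumes the word embeddings are a NumPy array with one row per vocabulary word (a PyTorch tensor could be converted with `.detach().cpu().numpy()`) and the vocabulary is a Python list. The helper names `sample_queries` and `cosine_neighbors` are hypothetical, and cosine similarity is an assumed choice of metric, not a statement of what main.py does.

```python
import numpy as np

def sample_queries(vocab, k=10, seed=None):
    """Hypothetical helper: pick k distinct query words uniformly at random."""
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(vocab), size=min(k, len(vocab)), replace=False)
    return [vocab[i] for i in idx]

def cosine_neighbors(word, embeddings, vocab, n=5):
    """Hypothetical helper: the n words closest to `word` by cosine similarity
    (the metric is an assumption made for this sketch)."""
    index = vocab.index(word)
    norms = np.linalg.norm(embeddings, axis=1)
    norms[norms == 0] = 1.0  # guard against all-zero embedding rows
    unit = embeddings / norms[:, None]  # L2-normalize each row
    sims = unit @ unit[index]  # cosine similarity to the query word
    order = np.argsort(-sims)  # most similar first
    return [vocab[i] for i in order if i != index][:n]

# Toy usage with made-up 2-d embeddings.
vocab = ["cat", "dog", "car", "truck", "apple"]
emb = np.array([[1.0, 0.1], [0.9, 0.2], [0.1, 1.0], [0.2, 0.9], [0.5, 0.5]])
for q in sample_queries(vocab, k=3, seed=0):
    print(q, cosine_neighbors(q, emb, vocab, n=2))
```

Sampling without replacement keeps the ten queries distinct; passing a fixed `seed` makes the printed neighbors reproducible across runs.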

yilunzhao commented 4 years ago

Yes, thanks for the kind reminder!

I am now applying ETM to a Chinese dataset, but there still seem to be some bugs in my code, and the results are not as impressive as those on the provided 20NewsGroup data.

Does ETM work well in your experiments on non-English datasets? Thanks!

mona-timmermann commented 4 years ago

Is it simply a hard-coded cosine distance?