adjidieng / ETM

Topic Modeling in Embedding Spaces
MIT License

a bug in test dataset splitting #33

Open nobrowning opened 3 years ago

nobrowning commented 3 years ago

I noticed that there is a bug in the preprocessing code for 20ng (scripts/

missing the idx_permute index conversion

Littleele commented 1 year ago

In line 91, idx_permute = np.random.permutation(num_docs_tr).astype(int), so idx_permute has size num_docs_tr. It therefore has no entries for the test documents, and the test set does not need the index conversion.
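To make the point concrete, here is a minimal sketch of the splitting logic, not the repository's actual script. It assumes the corpus is one concatenated list in which the first num_docs_tr positions hold the train/validation pool and the remaining positions hold the test documents. The function name split_indices and the parameters num_docs_ts and va_size are hypothetical, and random.sample stands in for np.random.permutation so that the example needs only the standard library.

```python
import random


def split_indices(num_docs_tr, num_docs_ts, va_size, seed=0):
    """Return (train, valid, test) index lists into one concatenated corpus.

    Assumed layout (hypothetical): positions 0..num_docs_tr-1 are the
    train/validation pool, positions num_docs_tr.. are the test documents.
    """
    rng = random.Random(seed)
    # Stand-in for np.random.permutation(num_docs_tr): it shuffles only
    # the pool, so it has exactly num_docs_tr entries.
    idx_permute = rng.sample(range(num_docs_tr), num_docs_tr)

    tr_size = num_docs_tr - va_size
    # Train and validation documents are looked up through idx_permute.
    train = [idx_permute[i] for i in range(tr_size)]
    valid = [idx_permute[i + tr_size] for i in range(va_size)]
    # Test documents are addressed by a plain offset. idx_permute has no
    # entry at position i + num_docs_tr, so indexing it there would fail.
    test = [i + num_docs_tr for i in range(num_docs_ts)]
    return train, valid, test


train, valid, test = split_indices(num_docs_tr=10, num_docs_ts=4, va_size=2)
```

Under these assumptions, train and valid together cover exactly the first num_docs_tr positions, while test is the untouched tail of the corpus. Routing the test offsets through idx_permute would raise an IndexError, because the permutation has only num_docs_tr entries.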