bab2min / tomotopy

Python package of Tomoto, the Topic Modeling Tool
https://bab2min.github.io/tomotopy
MIT License

Inference of paths for unseen documents using HLDA #91

Closed: masastat closed this issue 3 years ago

masastat commented 3 years ago

Hi, thank you for a great library! I am using the HLDA model, and I would like to get the path for an unseen document (one not used for training), doc:

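doc = model.make_doc(text)  # document that was not part of the training set
model.infer(doc)            # infer topics for the unseen document
print(doc.path)             # path of the document through the topic hierarchy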

However, doc.path for an unseen document seems to be wrong, although doc.path for a document used in training seems to be correct.

For an unseen document, doc.path contains the same value several times, for example array([ 0, 0, 0, 0, 216], dtype=int32).

How can I get the correct path? Thanks in advance!

bab2min commented 3 years ago

Hi @masastat, thank you for reporting. The problem you described is a bug, and it is currently being fixed. The fix will ship in the next minor update, so please wait a bit. Thank you!

bab2min commented 3 years ago

@masastat This has been fixed as of v0.10.1. I also added a new sample for HLDA (https://github.com/bab2min/tomotopy/blob/main/examples/hlda_basic.py). I hope it helps. Thank you again for reporting!
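
For anyone who wants to check whether their installed version still shows the symptom reported above, here is a minimal sketch in plain Python. It does not call tomotopy; `has_repeated_nodes` is a hypothetical helper written for this thread, and it assumes, as the report suggests, that a correct path does not list the same node more than once.

```python
def has_repeated_nodes(path):
    """Return True if any node id occurs more than once in the path.

    Hypothetical helper, not part of tomotopy. Assumes a correct HLDA path
    lists each node at most once.
    """
    nodes = [int(n) for n in path]  # also accepts a NumPy integer array
    return len(set(nodes)) != len(nodes)


# Path reported in this issue for an unseen document.
reported_path = [0, 0, 0, 0, 216]
print(has_repeated_nodes(reported_path))

# Made-up path with one distinct node per level, for contrast only.
illustrative_path = [0, 3, 17, 42, 216]
print(has_repeated_nodes(illustrative_path))
```

In practice the argument would be the `doc.path` value obtained after `model.infer(doc)`, as in the snippet from the original report. A `True` result on an unseen document matches the behavior described in this issue.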