bab2min / tomotopy

Python package of Tomoto, the Topic Modeling Tool
https://bab2min.github.io/tomotopy
MIT License

Segmentation fault (core dumped) about PLDA #166

Closed: daidaiduoduo closed this issue 2 years ago

daidaiduoduo commented 2 years ago

Hello, I'm trying to run PLDA. When I set latent_topics to a value greater than 1 (for example 5) and then call make_doc(), it crashes with "Segmentation fault (core dumped)". With latent_topics set to 0 or 1, it works fine. I think this is similar to issue #30: https://github.com/bab2min/tomotopy/issues/30

bab2min commented 2 years ago

Hi @daidaiduoduo, could you share the code that reproduces the error? It would help in finding the cause of the bug. Thank you in advance!

daidaiduoduo commented 2 years ago

@bab2min Hello, thank you for your reply. Running the following code triggers the crash:

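import tomotopy as tp

# PLDA model with 5 latent topics (the crash appears when latent_topics > 1)
mdl = tp.PLDAModel(tw=tp.TermWeight.ONE, latent_topics=5)
d = ['a', 'b']  # words of the document
l = ['c', 'd']  # labels of the document
mdl.add_doc(d, l)
mdl.make_doc(d, l)  # segmentation fault occurs here

bab2min commented 2 years ago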

@daidaiduoduo Thank you for sharing the code. I'll examine it!

daidaiduoduo commented 2 years ago

Hello, have you had a chance to check? Is the problem in the way I'm using the model?

bab2min commented 2 years ago

Hi @daidaiduoduo, sorry for the late reply. Your code should work as written, but there turns out to be a bug in tomotopy's make_doc() implementation. You can avoid it by calling train() before make_doc(), like this:

import tomotopy as tp

mdl = tp.PLDAModel(tw=tp.TermWeight.ONE, latent_topics=5)
d = ['a', 'b']  # words of the document
l = ['c', 'd']  # labels of the document
mdl.add_doc(d, l)
mdl.train(0)  # the model is prepared at this point
mdl.make_doc(d, l)

daidaiduoduo commented 2 years ago

@bab2min Thank you! It works!! 555
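
For reference, the workaround from this thread can be wrapped in a small helper so the train(0) call is not forgotten. This is only a sketch, not part of tomotopy: the name safe_make_doc is hypothetical, and it assumes that calling train(0) more than once is harmless and that the model exposes train() and make_doc() as used above.

```python
def safe_make_doc(mdl, words, labels=()):
    """Build an unseen document after preparing the model with train(0).

    Hypothetical helper (not a tomotopy API). It only relies on the
    model having train() and make_doc(), as in the thread's workaround.
    """
    # Zero iterations: per the thread, this call prepares the model so
    # that make_doc() no longer crashes.
    mdl.train(0)
    return mdl.make_doc(list(words), list(labels))


# Usage with the model from this thread (requires tomotopy):
#   import tomotopy as tp
#   mdl = tp.PLDAModel(tw=tp.TermWeight.ONE, latent_topics=5)
#   mdl.add_doc(['a', 'b'], ['c', 'd'])
#   doc = safe_make_doc(mdl, ['a', 'b'], ['c', 'd'])
```

As far as I know, tomotopy does not accept further add_doc() calls once train() has been called, so all training documents should be added before using a helper like this.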