cemoody / lda2vec

MIT License
3.15k stars 629 forks

the result of preprocess #71

Open MelvinZang opened 6 years ago

MelvinZang commented 6 years ago

When I run preprocess.py in twenty_newsgroup, I get results like these:

2 --> SKIP
4 , --> ÉÏ
5 . --> ÉÏ
13 - --> ÉÏ
15 ) --> ÉÏ
16 " --> ÉÏ
17 ( --> ÉÏ
19 : --> ÉÏ
24 ? --> ÉÏ
36 ' --> ÉÏ
43 / --> ÉÏ
49 ! --> ÉÏ
51 ; --> ÉÏ
61 < --> ÉÏ
76 ... --> §£.§£.
79 -- --> -4
90 ] --> ÉÏ
100 max>'ax>'ax>'ax>'ax>'ax>'ax>'ax>'ax>'ax>'ax>'ax>'ax>'ax>'ax --> MalavikaJagannathan?_mjaganna@greenbaypressgazette.com
108 [ --> ÉÏ
126 | --> ÉÏ
226 } --> ÉÏ
231 10 --> -0

I don't know how to fix this, or whether these are actually the correct results.

MelvinZang commented 6 years ago

I used the results to run lda2vec_run.py.

First I got this result:

Top words in topic 0 x11 sci.crypt pixels copyright pixel meg siggraph moncton phil rpm
Top words in topic 1 muslims steam christians communist filter playoffs terrorists indians filters macintosh
Top words in topic 2 nuclear revolver housing mike galley cabin ulf sf braking argic
Top words in topic 3 rbi reno ss canada bath apartment housing martin obey lindros
Top words in topic 4 mph pitchers hitter modems cubs braking telescope velocity blues brakes
Top words in topic 5 login dept militia customers 105 bombing abortion minorities workers americans
Top words in topic 6 ill tip puck jersey updates tips reply offensive archives guard
Top words in topic 7 sponsored rating inherently mode modes recommend voted p.o. p.m. participated
Top words in topic 8 0.333 manager logo subscribe stats secretary dec detector archives saves
Top words in topic 9 olwm dec gentiles azerbaijanis homosexuals liberal gays corps libertarians armenians
Top words in topic 10 firearm revolver knife atrocities bullock suicide accidents snow flyers handgun
Top words in topic 11 patents vs coverage v xv patent deals due warranty industry
Top words in topic 12 los nowhere shift distinguish gulf direction movement massacre channel slaughter
Top words in topic 13 microsoft msdos macintosh injection startup ken unix chinese cell pilot
Top words in topic 14 homicides madison dec iraq murders msdos wolverine refugees archive obfuscated
Top words in topic 15 edit login writers nejm moderator comics expressed msg copyright conclusions
Top words in topic 16 whalers syndrome gods note gotten orbiter subscription rf march cds
Top words in topic 17 chi subscribe noise digest ears section horn iron flow criteria
Top words in topic 18 pm p.m. p.o. ss deletion microsoft edm verse powerpc disable
Top words in topic 19 became ran rose grew stood pulled relations jumped fell remained
Traceback (most recent call last):
  File "examples/twenty_newsgroups/lda2vec/lda2vec_run.py", line 107, in <module>
    optimizer.zero_grads()
AttributeError: 'Adam' object has no attribute 'zero_grads'

This is because my Chainer version is 3.5.0, and the zero_grads attribute only exists in versions below 2.0.0. I then changed the call to optimizer.use_cleargrads() (I'm not sure whether that is right), and got this:

J:00561 E:00000 L:nan P:nan R:2.184e+04
J:00562 E:00000 L:nan P:nan R:1.826e+04
J:00563 E:00000 L:nan P:nan R:1.489e+04
Traceback (most recent call last):
  File "examples/twenty_newsgroups/lda2vec/lda2vec_run.py", line 94, in <module>
    words)
  File "/media/data/users/master/2018/zangmingzhe/lda2vec/lda2vec/topics.py", line 76, in prepare_topics
    assert np.allclose(np.sum(topic_to_word, axis=1), 1), msg
AssertionError: Not all rows in topic_to_word sum to 1

Does anybody know where the problem is?
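
One way the two symptoms may be connected: the log lines report L:nan, and if NaN values reach the topic-word weights, every normalized row sums to NaN rather than 1, so a check like the one in prepare_topics fails. The following is a dependency-free sketch of that effect, not the lda2vec code itself. The names softmax and rows_sum_to_one are hypothetical helpers standing in for the normalization step and for the np.allclose check.

```python
import math


def softmax(row):
    """Normalize one row of scores into probabilities (hypothetical helper)."""
    shift = max(row)  # subtract the max for numerical stability
    exps = [math.exp(x - shift) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def rows_sum_to_one(matrix, tol=1e-8):
    """Pure-Python stand-in for np.allclose(np.sum(matrix, axis=1), 1)."""
    return all(math.isclose(sum(row), 1.0, abs_tol=tol) for row in matrix)


# Finite scores normalize cleanly; a single NaN poisons the whole row.
healthy = [softmax([0.1, 0.2, 0.3]), softmax([2.0, -1.0, 0.5])]
broken = [softmax([float("nan"), 0.2, 0.3])]

print(rows_sum_to_one(healthy))  # True
print(rows_sum_to_one(broken))   # False
```

If this is what is happening, the assertion is a downstream symptom and the thing to chase is where the loss first becomes NaN.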

ali3assi commented 6 years ago

@MelvinZang The problem is due to the Chainer version. Change optimizer.use_cleargrads.

MelvinZang commented 6 years ago

@TamouzeAssi thanks, it is right.

anupamme commented 6 years ago

@MelvinZang I am also running into this assert error:

Traceback (most recent call last):
  File "examples/hacker_news/lda2vec/lda2vec_run.py", line 87, in <module>
    words)
  File "build/bdist.linux-x86_64/egg/lda2vec/topics.py", line 76, in prepare_topics
AssertionError: Not all rows in topic_to_word sum to 1

I also had to switch to use_cleargrads() instead of zero_grads() because of my Chainer version.

Were you able to fix the assertion error (AssertionError: Not all rows in topic_to_word sum to 1)?

ali3assi commented 6 years ago

@anupamme replace use_cleargrads() with model.cleargrads()
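
A note on why the two replacements differ, as far as I understand the Chainer API: optimizer.use_cleargrads() only sets a flag that optimizer.update(lossfun) consults, so calling it in place of zero_grads() does not itself reset any gradients, whereas model.cleargrads() does. Below is a minimal sketch of a helper that picks whichever call is available. The name clear_gradients is hypothetical, and the stub classes exist only so the snippet runs without Chainer installed.

```python
def clear_gradients(model, optimizer):
    """Reset gradients using whichever call this Chainer version offers."""
    if hasattr(model, "cleargrads"):
        # Newer Chainer: gradients are cleared on the model (a Link).
        model.cleargrads()
        return "model.cleargrads"
    if hasattr(optimizer, "zero_grads"):
        # Old Chainer 1.x: the optimizer still provides zero_grads().
        optimizer.zero_grads()
        return "optimizer.zero_grads"
    raise AttributeError("no known way to clear gradients")


# Stubs standing in for real Chainer objects (assumptions for this demo).
class NewModel:
    def __init__(self):
        self.cleared = False

    def cleargrads(self):
        self.cleared = True


class OldModel:
    pass


class NewOptimizer:
    pass


class OldOptimizer:
    def __init__(self):
        self.zeroed = False

    def zero_grads(self):
        self.zeroed = True


new_model, old_opt = NewModel(), OldOptimizer()
print(clear_gradients(new_model, NewOptimizer()))  # model.cleargrads
print(clear_gradients(OldModel(), old_opt))        # optimizer.zero_grads
```

In a training loop, the line optimizer.zero_grads() would become clear_gradients(model, optimizer), called once per batch before the backward pass.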

lovedatatiff commented 6 years ago

@MelvinZang Hi! I'm wondering how long it took you to run preprocess.py and run.py. Thanks!

MelvinZang commented 6 years ago

@lovedatatiff It took me nearly a whole day to run preprocess.py, but only a few hours to run lda2vec_run.py with a GPU.

MelvinZang commented 6 years ago

@anupamme Another simple way is to change the Chainer version to 1.9.0.

stalhaa commented 5 years ago

hello @MelvinZang When I run lda2vec.py on my dataset, I get results like these

;²õ;²õ;²õ;²õ;²õ;²õ;²õ;²õ;²õ;²õ;²õ;²õ;²õ;²õ;²õ;²õ;²õ;²õ;²õ;²õ;²õ;²õ;²õ;²õ;²õ;²õ;²õ;²õ;²õ;²õ;²õ;²õ;²õ;²õ;²õ;²õ;²õ;²õ;²õ;²õ;²õ;²õ;²õ;²õ;²õ;²õ;²õ;²õ;²õ;²õ;²õ;²õ;²õ;²õ;²õ;²õ;²õ;²õ;²õ;²õ;²õ;²õ;²õ;²õ;²õ;²õ;²õ;²õ;²õ;²õ;²õ;²õ;²õ;²õ;²õ;²õ;²õ;²õ;²õ;²õ;²õ;²õ;²õ;²õ;²õ;²õ;²õ;²õ;²õ;²õ;²õ;²õ;²õ;²õ;²õ;²õ;²õ;²õ;²õ;²õ;²õ;²õ;²õ;²õ;²õ;²õ;²õ;²õ;²õ;²õ;²õ;²õ;²õ;²õ;²õ;²õ;²õ;²õ;²õ;²õ;²õ;²õ;²õ;²õ;²õ;²õ;²õ;²õ;²õ;²õ;²õ;²õ;²õ;²õ;²õ;²õ;²õ;²õ;²õ;²õ;²õ;²õ;²õ;²õ;²õ;²õ;²õ;²õ;²õ;²õž&»ÖLI»Íóõ»Yø<ª Œà‚<ß9;|•È»ÀšÓ»lë:œSs;ã@ºIæÖº“u»× ¹±Ÿ°»ã$1:˜V"Œ.[£»dc Œj-!Œ¥KŒK<€òîºdqý; 8x».蔻Ua,»p†ºm»†Ç;þeÀ;¶¡E»ý¶E¹(*׻ۇ›º‘ŠÊ:k±ŒÏ £;H÷’º[Ä;bbž:x:

Please tell me what is going wrong here. I am stuck. My dataset is an abstract.txt file (research paper abstracts).

ranjeetkgupta commented 5 years ago

I am also getting the same error: AttributeError: 'Adam' object has no attribute 'zero_grads'

Has anyone been able to resolve this lately?

pip show chainer
Name: chainer
Version: 6.0.0b1
Summary: A flexible framework of neural networks
Home-page: https://chainer.org/

Edit: Solved by installing chainer==1.9.0
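
Going by the earlier comment that zero_grads is only available below Chainer 2.0.0, a script could check the installed version up front and fail with a clearer message. This is a rough sketch only: has_zero_grads and major_version are hypothetical helpers, the 2.0.0 cutoff is taken from this thread rather than verified here, and in a real script the string would come from chainer.__version__.

```python
import re


def major_version(version):
    """Extract the leading major number from a version string like '6.0.0b1'."""
    match = re.match(r"(\d+)", version)
    if match is None:
        raise ValueError("unrecognized version string: %r" % version)
    return int(match.group(1))


def has_zero_grads(version):
    """True if the version is below 2.0.0 (cutoff as reported in this thread)."""
    return major_version(version) < 2


# The three versions mentioned in this thread.
for v in ("1.9.0", "3.5.0", "6.0.0b1"):
    print(v, has_zero_grads(v))
```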

MelvinZang commented 5 years ago

@stalhaa The results I posted in the question are not wrong; they show a conversion process. The tokens that appear most often in the articles are punctuation marks, and the model maps them to something else. As the process continues, the output becomes normal: you can see plurals turned into singulars, among other changes.

I don't understand the results you pasted. Maybe you could format them so that I can see when and why the model produces output like that.

Hope it helps.

stalhaa commented 5 years ago

Please send me your email address, @MelvinZang.

ranjeetkgupta commented 5 years ago

@MelvinZang Did you get around the issue by installing Chainer version 1.9.0? For me, that does solve the issue on my Mac. But I am trying to set this up in a Colab notebook (for GPU support) and am unable to install Chainer 1.9.0:

/tmp/tmpQdaB_J/a.cpp:1:10: fatal error: cudnn.h: No such file or directory
 #include <cudnn.h>
          ^~~~~~~~~
compilation terminated.
**************************************************
*** WARNING: Include files not found: ['cudnn.h']
*** WARNING: Skip installing cudnn support
*** WARNING: Check your CPATH environment variable
**************************************************
cython path:/usr/local/lib/python2.7/dist-packages
error: Command '/usr/bin/python2' failed:

  command: /usr/bin/python2 /usr/local/lib/python2.7/dist-packages/cython.py --fast-fail --verbose --cplus --directive profile=False --directive linetrace=False cupy/core/core.pyx
  return code: 1
  output:

Compiling /tmp/pip-build-tkCtKZ/chainer/cupy/core/core.pyx
/usr/local/lib/python2.7/dist-packages/Cython/Compiler/Main.py:367: FutureWarning: Cython directive 'language_level' not set, using 2 for now (Py2). This will change in a later release! File: /tmp/pip-build-tkCtKZ/chainer/cupy/core/core.pyx
  tree = Parsing.p_module(s, pxd, full_module_name)

Error compiling Cython file:
------------------------------------------------------------
...
    void* data
    int size
    int shape_and_strides[MAX_NDIM * 2]

cdef class CArray(cupy.cuda.function.CPointer):
                                   ^
------------------------------------------------------------

cupy/core/carray.pxi:14:36: First base of 'CArray' is not an extension type

And with the latest version of Chainer I get this error: AssertionError: Not all rows in topic_to_word sum to 1

I would really appreciate any insights here!

stalhaa commented 5 years ago

@MelvinZang ??

MelvinZang commented 5 years ago

@stalhaa Sorry, I forgot. 752087739@qq.com

stalhaa commented 5 years ago

@MelvinZang Could you please run your lda2vec.py code on my dataset file instead of twenty_newsgroup and share the results afterwards? Would you do that for me? I want the top words from 100 topics. Kindly help me with this. Thanks.

MelvinZang commented 5 years ago

@stalhaa let me have a try

ghost commented 5 years ago

My problem is that every time I install Chainer 1.9.0 in place of a later version, my code can't

import cupy.cudnn

and this causes the warning

UserWarning: cuDNN is not enabled.

But if I don't switch to 1.9.0 and use a later version instead, I get

AttributeError: 'Adam' object has no attribute 'zero_grads'

If zero_grads is replaced with any of use_cleargrads(use=False), use_cleargrads(use=True), use_cleargrads(), or model.cleargrads(), I get

AssertionError: Not all rows in topic_to_word sum to 1