ericproffitt / TopicModelsVB.jl

A Julia package for variational Bayesian topic modeling.

Can't find corpus file #1

Closed funnell closed 7 years ago

funnell commented 7 years ago

Trying to load the citeu corpus like so:

corp = readcorp(:citeu)

Results in this error:

ERROR: SystemError: opening file /home/ubuntu/.julia/v0.5/topicmodelsvb/datasets/citeu/citeudocs.txt: No such file or directory
 in #systemerror#51 at ./error.jl:34 [inlined]
 in systemerror(::String, ::Bool) at ./error.jl:34
 in open(::String, ::Bool, ::Bool, ::Bool, ::Bool, ::Bool) at ./iostream.jl:89
 in #readcorp#17(::String, ::String, ::String, ::String, ::Char, ::Bool, ::Bool, ::Bool, ::Bool, ::TopicModelsVB.#readcorp) at /home/ubuntu/.julia/v0.5/TopicModelsVB/src/Corpus.jl:121
 in (::TopicModelsVB.#kw##readcorp)(::Array{Any,1}, ::TopicModelsVB.#readcorp) at ./<missing>:0
 in readcorp(::Symbol) at /home/ubuntu/.julia/v0.5/TopicModelsVB/src/Corpus.jl:436

This seems to be caused by a capitalization inconsistency in the package directory name (topicmodelsvb in the error vs. TopicModelsVB on disk):

ubuntu@tyler ~/p/ctm ❯❯❯ ls /home/ubuntu/.julia/v0.5/topicmodelsvb/datasets/citeu/citeudocs.txt
ls: cannot access '/home/ubuntu/.julia/v0.5/topicmodelsvb/datasets/citeu/citeudocs.txt': No such file or directory
ubuntu@tyler ~/p/ctm ❯❯❯ ls /home/ubuntu/.julia/v0.5/TopicModelsVB/datasets/citeu/citeudocs.txt
/home/ubuntu/.julia/v0.5/TopicModelsVB/datasets/citeu/citeudocs.txt

I also tried loading the mac corpus with a similar result.
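
For anyone wanting to confirm this kind of mismatch from within Julia, here is a minimal sketch. The helper name `case_mismatches` is hypothetical and not part of TopicModelsVB; it only uses `readdir` and `lowercase` from Base to find directory entries that differ from the expected name in letter case alone.

```julia
# Hypothetical diagnostic helper (not part of TopicModelsVB):
# return the entries of `dir` whose names equal `name` only when
# letter case is ignored, i.e. likely capitalization mismatches.
function case_mismatches(dir::AbstractString, name::AbstractString)
    return [entry for entry in readdir(dir)
            if entry != name && lowercase(entry) == lowercase(name)]
end

# Example call, using the directory from the report above:
# case_mismatches("/home/ubuntu/.julia/v0.5", "topicmodelsvb")
```

A non-empty result suggests that the path was written with the wrong capitalization, which only fails on a case-sensitive filesystem.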

ericproffitt commented 7 years ago

Hi Tyler, thanks for taking the time to post this issue. Since I don't have access to other operating systems, I was always a bit worried that my pre-packaged dataset shortcuts might not work on Windows and Linux.

Hopefully this is just a case of my filesystem being case-insensitive while yours is case-sensitive. Just to make sure that's what's going on, can you open the file

/home/ubuntu/.julia/v0.5/TopicModelsVB/src/Corpus.jl

and scroll to the very bottom, where you should see the code for the shortcuts? Try changing topicmodelsvb to TopicModelsVB in all the paths, and see if that fixes the problem.

If so, I'll go ahead and push the appropriate changes to master.
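
As an aside, one way to avoid this class of bug altogether is to build dataset paths relative to the source file rather than spelling out the package directory name. The sketch below is only an illustration, not the code in Corpus.jl: `DATASETS_DIR` and `dataset_path` are hypothetical names, it assumes the file sits in the package's `src/` directory next to a sibling `datasets/` folder (as the paths in the report suggest), and `@__DIR__` requires Julia 0.6 or later, so it would not run on the v0.5 install shown above.

```julia
# Hypothetical sketch, not the package's actual code.
# Assumption: this file lives in <package root>/src/ and the datasets
# live in <package root>/datasets/. @__DIR__ needs Julia 0.6+.
const DATASETS_DIR = normpath(joinpath(@__DIR__, "..", "datasets"))

# Build the path to `filename` inside the dataset folder `name`.
# The package directory name is never typed by hand, so its
# capitalization cannot drift from what is on disk.
dataset_path(name::Symbol, filename::AbstractString) =
    joinpath(DATASETS_DIR, string(name), filename)

# Example call: dataset_path(:citeu, "citeudocs.txt")
```

Because the directory is derived from where the file actually is, the same code behaves identically on case-sensitive and case-insensitive filesystems.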

funnell commented 7 years ago

That seemed to work. I also noticed a stopwords file path that might need to be changed as well.

ericproffitt commented 7 years ago

Ah yes! Forgot about that one, thanks.