dice-group / Palmetto

Palmetto is a quality measuring tool for topics
GNU Affero General Public License v3.0

No segments* file Found Error #87

rudra0713 closed 1 year ago

rudra0713 commented 1 year ago

Hi, I am trying to use Palmetto with another GitHub project (https://github.com/aghie/lam). I want to use it as a Java program, so I downloaded the palmetto-0.1.0-jar-with-dependencies.jar file and the Wikipedia_bd.zip file. After extraction, I got the wikipedia_bd directory and wikipedia_histogram file. Then, I tried to follow the commands as specified in https://github.com/aghie/lam, and I got the following error:

org.apache.lucene.index.IndexNotFoundException: no segments* file found in org.apache.lucene.store.SimpleFSDirectory@/Users/rudra/PycharmProjects/lam-master/eval/Wikipedia_bd/wikipedia_bd lockFactory

Can you kindly help me solve this?

MichaelRoeder commented 1 year ago

I am not aware of the Latent Argument Model project. However, the error message is pretty clear. Palmetto expects the wikipedia_bd directory at

/Users/rudra/PycharmProjects/lam-master/eval/Wikipedia_bd/wikipedia_bd

This doesn't seem to be the case.

Apart from that, I would like to point out that the version used in the Latent Argument Model is pretty old. I would suggest using the latest version of Palmetto.
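To check whether the directory at that path actually holds a Lucene index, one can look for files whose names start with `segments`, which is what the exception reports as missing. The following is a minimal sketch, not part of Palmetto; the helper name `find_segments_files` is hypothetical.

```python
from pathlib import Path


def find_segments_files(index_dir):
    """Return the sorted names of files in index_dir whose name starts with 'segments'.

    Hypothetical helper for illustration; it is not part of Palmetto or Lucene.
    """
    path = Path(index_dir)
    if not path.is_dir():
        # The path is missing or is not a directory, so there is nothing to list.
        return []
    return sorted(p.name for p in path.glob("segments*") if p.is_file())


if __name__ == "__main__":
    # Path taken from the error message in this thread.
    index_dir = "/Users/rudra/PycharmProjects/lam-master/eval/Wikipedia_bd/wikipedia_bd"
    found = find_segments_files(index_dir)
    if found:
        print("segments files found:", found)
    else:
        print("no segments* file in", index_dir)
```

An empty result means either the path does not exist or the index files in it are incomplete.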

rudra0713 commented 1 year ago

Hi @MichaelRoeder, thanks for your reply. The wikipedia_bd directory is actually located at the path you just mentioned, namely /Users/rudra/PycharmProjects/lam-master/eval/Wikipedia_bd/wikipedia_bd

Given the error description, I was wondering whether some segment files are missing in the downloaded file.

Also, I am a new user of Palmetto. Can I replace palmetto-0.1.0-jar-with-dependencies.jar with palmetto-0.1.5-exec.jar? And do I have to download the Wikipedia_bd.zip again, or does it remain unchanged with the version update?

MichaelRoeder commented 1 year ago

So your file structure looks as follows (in the form of a tree, starting at the root directory /)?

/
\-Users
  \-rudra
    \-PycharmProjects
      \-lam-master
        \-eval
          \-Wikipedia_bd/
            +-wikipedia_bd
            | +-_3f5.cfe
            | +-_3f5.cfs
            | +-_3f5.si
            | ...
            |
            \-wikipedia_bd.histogram

Your wikipedia_bd directory should contain 110 files. All together, they should have a size of ~6.5GB.
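Those two numbers can be compared against a local copy with a short script. This is a rough sketch under the assumption that all index files sit directly inside the directory; the helper name `index_dir_stats` is hypothetical, and stray files such as `.DS_Store` on macOS would be counted too.

```python
import os


def index_dir_stats(index_dir):
    """Return (file_count, total_bytes) for the regular files directly inside index_dir.

    Hypothetical helper for illustration; subdirectories are not descended into.
    """
    count = 0
    total = 0
    with os.scandir(index_dir) as entries:
        for entry in entries:
            if entry.is_file():
                count += 1
                total += entry.stat().st_size
    return count, total


if __name__ == "__main__":
    # Path taken from the tree above.
    index_dir = "/Users/rudra/PycharmProjects/lam-master/eval/Wikipedia_bd/wikipedia_bd"
    if os.path.isdir(index_dir):
        count, total = index_dir_stats(index_dir)
        # Expected per this thread: 110 files, roughly 6.5GB in total.
        print(f"{count} files, {total / 1e9:.2f} GB")
    else:
        print("directory not found:", index_dir)
```

A file count or total size far below the expected values points to an incomplete extraction.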

Yes, you can replace the jar file; the newer version should be compatible. No, you do not have to download the zip file again.

rudra0713 commented 1 year ago

Yes, I have the same hierarchy you mentioned. However, instead of 110 files, I only have 31 files. When I tried to extract the Wikipedia_bd.zip file, I got the message "There was a problem while reading the contents of the file Wikipedia_bd.zip. The archive file is incomplete."

I downloaded the zip file from "https://github.com/dice-group/Palmetto/wiki/How-Palmetto-can-be-used". (section: As Java Program). Is that the correct path?

MichaelRoeder commented 1 year ago

Well, in that case, I would assume that your download was not successful, which explains the issue you have :wink:

Yes, the page points to the right file. If you open the following link https://hobbitdata.informatik.uni-leipzig.de/homes/mroeder/palmetto/ you will see that the zip file should be 5GB in size. Please try downloading the file again and let me know whether it worked or not.
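Before extracting, a downloaded archive can be checked for completeness with Python's standard `zipfile` module. The sketch below is only an illustration; the helper name `zip_is_complete` is hypothetical, and it treats a missing or unreadable file the same as a broken archive. Note that `testzip()` reads every member, so it will take a while on a file of this size.

```python
import zipfile


def zip_is_complete(zip_path):
    """Return True if zip_path opens as a zip archive and every member passes its CRC check.

    Hypothetical helper for illustration. A truncated download usually lacks the
    central directory at the end of the file, so opening it raises BadZipFile.
    """
    try:
        with zipfile.ZipFile(zip_path) as archive:
            # testzip() returns the name of the first corrupt member, or None if all are fine.
            return archive.testzip() is None
    except (zipfile.BadZipFile, OSError):
        # Not a valid zip archive, or the file could not be read at all.
        return False
```

For example, `zip_is_complete("Wikipedia_bd.zip")` returning `False` would suggest downloading the file again.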

rudra0713 commented 1 year ago

Thanks, downloading from the link you specified solved the issue.