poulain-tim closed this issue 4 years ago
hi @Oulaolay - welcome!
To be clear, you'd want a variant of get_postings_list
that takes an already analyzed term, right?
There's actually already an outstanding issue: https://github.com/castorini/anserini/issues/990
I'm not sure when we'll get to it... but you're welcome to send a pull request...
haha, got to it!
Thanks for all these modifications! I tried to create a new branch to contribute to this project, but it seems I don't have permission to open pull requests. Can you grant me that permission?
The errors I found are in pyclass.py:
JEnglishStemmingAnalyzer = autoclass('io.anserini.analysis.EnglishStemmingAnalyzer')
will become
JEnglishStemmingAnalyzer = autoclass('io.anserini.analysis.DefaultEnglishAnalyzer')
and I get an error on this line:
JTokenizeOnlyAnalyzer = autoclass('io.anserini.analysis.TokenizeOnlyAnalyzer')
File "jnius/jnius_export_func.pxi", line 28, in jnius.find_javaclass
jnius.JavaException: Class not found b'io/anserini/analysis/TokenizeOnlyAnalyzer'
This class isn't present in anserini-0.7.3-fatjar.jar.
Thanks !
Best Regards !
Hi @Oulaolay,
The errors are because of a recent change in Anserini. Pyserini needs to be changed accordingly. I already submitted a PR for this. In order to make a PR you can fork the repository and push to the fork. Then you can create a PR with your fork.
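As an illustration of the class rename described above, here is a minimal sketch of a lookup that tries several candidate class names in order. The helper `load_first_available` and the stand-in `fake_loader` are hypothetical names introduced for this example and are not part of Pyserini. In real use the loader would be `jnius.autoclass`, which raises `jnius.JavaException` for a missing class, as the traceback above shows.

```python
def load_first_available(class_names, loader):
    """Return (name, cls) for the first class name that `loader` can resolve."""
    errors = []
    for name in class_names:
        try:
            return name, loader(name)
        except Exception as exc:  # with pyjnius this would be jnius.JavaException
            errors.append(f"{name}: {exc}")
    raise LookupError("no candidate class could be loaded: " + "; ".join(errors))


def fake_loader(name):
    """Hypothetical stand-in for jnius.autoclass so the sketch runs without a JVM."""
    available = {"io.anserini.analysis.DefaultEnglishAnalyzer"}
    if name not in available:
        raise RuntimeError(f"Class not found {name}")
    # Build an empty placeholder class named after the Java class.
    return type(name.rsplit(".", 1)[-1], (), {})


# Old name first, new name second, as mentioned in the thread.
CANDIDATES = [
    "io.anserini.analysis.EnglishStemmingAnalyzer",
    "io.anserini.analysis.DefaultEnglishAnalyzer",
]

resolved_name, JAnalyzer = load_first_available(CANDIDATES, fake_loader)
print(resolved_name)
```

With a real JVM on the classpath, passing `autoclass` from `jnius` instead of `fake_loader` should behave the same way, assuming the fat jar contains at least one of the candidate classes.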
Perfect! I'll know for next time.
Have a good day @Chriskamphuis
Hi @lintool! I have a new issue: I created a new index from the "DUC-2001" dataset by means of this command:
I also installed the Luke toolbox to understand how the index works.
When I run this code:
it works for some terms but not for all of them...
I think there are two different views of the index: the first one applies stemming (the word "Cherokee" becomes "cheroke") and the second one keeps the word unstemmed.
So, how can I apply stemming when I look up the postings list?
Best regards
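To make the mismatch described above concrete, here is a minimal, self-contained sketch of an inverted index whose keys are analyzed terms. It is not Pyserini code. `toy_analyze`, `build_index` and `get_postings` are hypothetical helpers, and the "stemmer" is a deliberately crude stand-in (lowercase, drop one trailing "e") that only mimics the "Cherokee" to "cheroke" effect; it is not the Porter stemmer that Lucene applies.

```python
from collections import defaultdict


def toy_analyze(token):
    """Toy stand-in for an index-time analyzer (NOT a real stemmer)."""
    token = token.lower()
    if len(token) > 4 and token.endswith("e"):
        token = token[:-1]
    return token


def build_index(docs):
    """Map each analyzed term to a list of (doc_id, position) postings."""
    index = defaultdict(list)
    for doc_id, text in docs.items():
        for position, token in enumerate(text.split()):
            index[toy_analyze(token)].append((doc_id, position))
    return dict(index)


def get_postings(index, term, analyze=True):
    """Look up postings; set analyze=False for an already analyzed term."""
    key = toy_analyze(term) if analyze else term
    return index.get(key, [])


docs = {"d1": "Cherokee nation history", "d2": "the Cherokee language"}
index = build_index(docs)

print(get_postings(index, "Cherokee"))                 # term is analyzed before lookup
print(get_postings(index, "Cherokee", analyze=False))  # raw term is not a key in the index
print(get_postings(index, "cheroke", analyze=False))   # already analyzed term matches directly
```

The point of the sketch is that the index only ever stores analyzed terms, so a lookup has to either apply the same analysis to the query term or be given a term that was already analyzed, which is the variant of get_postings_list discussed at the top of the thread.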