castorini / anserini

Anserini is a Lucene toolkit for reproducible information retrieval research
http://anserini.io/
Apache License 2.0

Tutorial/Documentation on Java API #1517

Closed (bevankoopman closed this issue 3 years ago)

bevankoopman commented 3 years ago

First, thanks for all the work on Anserini.

I might have missed it, but is there any documentation or tutorial for using Anserini via the Java API? There are plenty of examples in Python and plenty that use the command-line tools, but is there a resource on how to use Anserini programmatically from Java?

Thanks very much.

lintool commented 3 years ago

Check out, for example: https://github.com/castorini/anserini/blob/master/docs/experiments-msmarco-passage.md

Linked off the main README, there are lots of guides for different collections: https://github.com/castorini/anserini#regression-experiments https://github.com/castorini/anserini#replication-guides

bevankoopman commented 3 years ago

Thanks for the quick reply. All those examples are great, but they use the command-line shell scripts. I was trying to find an example of actual Java code: I have a Java web app that needs to call Anserini for retrieval via the Java API.

In the end I looked through the SearchCollection.java file and extracted the necessary parts. This is the basic working example I came up with:

package io.anserini.search;

import io.anserini.analysis.DefaultEnglishAnalyzer;
import io.anserini.index.IndexArgs;
import io.anserini.rerank.ScoredDocuments;
import io.anserini.search.query.BagOfWordsQueryGenerator;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.store.MMapDirectory;

import java.io.IOException;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Locale;

public class SearchCollectionAgAsk {

    public static void main(String[] args) throws IOException {

        String queryString = "some query";
        String index = "/path/to/index_dir";

        Path indexPath = Paths.get(index);

        // try-with-resources closes the index reader once the search is done.
        try (DirectoryReader reader = DirectoryReader.open(MMapDirectory.open(indexPath))) {

            // Build a bag-of-words query over the contents field. The analyzer
            // settings should match the ones used when the index was built.
            Query query = new BagOfWordsQueryGenerator().buildQuery(IndexArgs.CONTENTS,
                    DefaultEnglishAnalyzer.fromArguments("krovetz", true, null), queryString);

            IndexSearcher searcher = new IndexSearcher(reader);

            // Retrieve the top 10 hits and load their stored documents.
            TopDocs td = searcher.search(query, 10);
            ScoredDocuments docs = ScoredDocuments.fromTopDocs(td, searcher);

            // Print one line per hit: docid, 1-based rank, score.
            for (int i = 0; i < docs.documents.length; i++) {
                String docid = docs.documents[i].get(IndexArgs.ID);

                System.out.println(String.format(Locale.US, "%s %d %f",
                        docid, i + 1, docs.scores[i]));
            }
        }
    }
}

Maybe this snippet is useful for others as a basic Java example for future documentation.
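
In case it helps anyone adapting the example: the loop above prints only docid, rank and score, whereas the command-line tools write full TREC run lines (qid Q0 docid rank score tag). Below is a minimal, standard-library-only sketch of that formatting step. The class name TrecRunFormatter, the query id "q1", the run tag "Anserini" and the sample docids and scores are all placeholders I made up for illustration; they are not part of the Anserini API.

```java
import java.util.Locale;

// Hypothetical helper (not an Anserini class): formats ranked hits as TREC run lines.
final class TrecRunFormatter {

    // One TREC run line: "<qid> Q0 <docid> <rank> <score> <tag>".
    // Locale.US keeps the decimal separator a '.' regardless of the JVM default locale.
    static String formatLine(String qid, String docid, int rank, float score, String tag) {
        return String.format(Locale.US, "%s Q0 %s %d %f %s", qid, docid, rank, score, tag);
    }

    public static void main(String[] args) {
        // Placeholder results standing in for docs.documents / docs.scores.
        String[] docids = {"doc7", "doc2", "doc9"};
        float[] scores = {12.5f, 9.25f, 3.0f};

        for (int i = 0; i < docids.length; i++) {
            // Ranks are 1-based, as in the example above.
            System.out.println(formatLine("q1", docids[i], i + 1, scores[i], "Anserini"));
        }
    }
}
```

With the placeholder data, the first line printed is `q1 Q0 doc7 1 12.500000 Anserini`. In the search example, the same call could replace the String.format line inside the loop, passing the real docid and docs.scores[i].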

Thanks.

lintool commented 3 years ago

Ah, I see! Glad you found what you're looking for!