castorini / anserini

Anserini is a Lucene toolkit for reproducible information retrieval research
http://anserini.io/
Apache License 2.0
1.01k stars 433 forks source link

Discussion: REST API routes for different corpus/model combinations - how do we name? #2502

Open lintool opened 2 months ago

lintool commented 2 months ago

We have different corpus/model combinations. For example, we have msmarco-v1-passage, msmarco-v1-passage.splade-pp-ed, and msmarco-v1-passage.cos-dpr-distil. They are all on MS MARCO V1 Passage, but different models - the first for BM25, SPLADE, and cosDPR.

For querying different corpus/model combinations, how do we want to configure the REST API route?

Options I see are:

(1) corpus in route, model as query parameter, e.g.,

(2) corpus/model (=index) combination directly in route, e.g.,

The first feels conceptually cleaner, but requires an extra layer of indirection to map into our final indexes.

If we go with option (2), then "collection" is misnamed, but we can change to "index"?

16BitNarwhal commented 2 months ago

I initially named the route to match the PrebuiltIndexHandler since all the indexes were from there so in that case option (2) changed to "index" would be an improvement (I'm not sure why I named it "collection" at the start). I also agree that (1) is cleaner, but I'm not sure if it works for BEIR (and correct me if I'm wrong) since I don't think .flat is a model

ronakice commented 2 months ago

Using the first option, you can maintain clarity and separation between the corpus and the model, making it easier to add or change models in the future. It also simplifies API usage, as users will understand that the primary resource is the collection, and the model is a configurable parameter. So I side with it, but agreed too that it is a bit of work.

lintool commented 2 months ago

I also agree that (1) is cleaner, but I'm not sure if it works for BEIR (and correct me if I'm wrong) since I don't think .flat is a model

Yea, you'll need another layer of indirection.

I'll make an executive decision of going with (2) for now, with perhaps moving to (1) further down the road... it's probably too much work right now.

@16BitNarwhal can you rename /api/collection/ to /api/index/ and let's wrap this up for now?

16BitNarwhal commented 1 month ago

https://restfulapi.net/resource-naming/ https://stackoverflow.blog/2020/03/02/best-practices-for-rest-api-design/ Based on the above resources, here are some considerations for renaming the api routes:

lintool commented 1 month ago

Adding /api/v1/... is a breaking change, but I like the idea. Let's do v1.0 instead of just v1.

I'm okay with all the other aspects of your proposal. Let's do it!

16BitNarwhal commented 1 month ago

Sounds good, will draft and work on #2525

lintool commented 1 month ago

BTW, I don't think we need to preserve the old API, so a breaking change here is fine.