castorini / rank_llm

RankLLM is a Python toolkit for reproducible information retrieval research using rerankers, with a focus on listwise reranking.
http://rankllm.ai
Apache License 2.0

Thoughts about API design #110

Open lintool opened 7 months ago

lintool commented 7 months ago

Just thoughts... @ronakice @sahel-sh reactions welcome.

Two methods: rerank and rerank_batch.

Method signature:

    def rerank(self, query: Query, candidates: Candidates, k: int = 10) -> Candidates:

Commentary:

Then, rerank_batch would be:

    def rerank_batch(self, query: List[Query], candidates: List[Candidates], k: int = 10) -> List[Candidates]:

rerank can just route to rerank_batch with a list of one object.
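
A minimal sketch of that routing (assuming the rerank_batch signature above; just for illustration):

    # Sketch only: rerank as a thin wrapper that delegates to rerank_batch.
    def rerank(self, query: Query, candidates: Candidates, k: int = 10) -> Candidates:
        # Wrap the single query and its candidates in one-element lists,
        # then unwrap the single result from the batch call.
        return self.rerank_batch([query], [candidates], k=k)[0]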

sahel-sh commented 7 months ago

This looks clean; below are my thoughts: if we want rerank to be compatible with many retrievers, we should add creators to the Query and Candidates classes that convert str and json/dict into instances of these classes, and expect the user to call them before calling rerank.

I think more important than adding metadata to Query and its Candidates separately is having metadata for a reranking run: PTAL at https://github.com/castorini/rank_llm/blob/main/src/rank_llm/result.py, where we store each prompt, response, and input/output token counts in an execution info object. Each rerank for a given query will have a list of this information. This doesn't belong to the query nor its candidates; it is an execution summary for the query + its candidates.

lintool commented 7 months ago

This looks clean; below are my thoughts: if we want rerank to be compatible with many retrievers, we should add creators to the Query and Candidates classes that convert str and json/dict into instances of these classes, and expect the user to call them before calling rerank.

Agreed. There should be a pretty straightforward mapping from json/dict to Python objects. I envision from_json "importers" and to_json "exporters". (We can quibble about method naming later.)
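
Something along these lines (just a sketch with placeholder fields; nothing here is final):

    # Hypothetical sketch of the from_json/to_json idea; field names are placeholders.
    class Query:
        def __init__(self, qid: str, text: str):
            self.qid = qid
            self.text = text

        @classmethod
        def from_json(cls, obj: dict) -> "Query":
            # "Importer": build a Query from a plain dict parsed from JSON.
            return cls(qid=obj["qid"], text=obj["text"])

        def to_json(self) -> dict:
            # "Exporter": dump back to a plain dict for serialization.
            return {"qid": self.qid, "text": self.text}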

I think more important than adding metadata to Query and its Candidates separately is having metadata for a reranking run: PTAL at https://github.com/castorini/rank_llm/blob/main/src/rank_llm/result.py, where we store each prompt, response, and input/output token counts in an execution info object. Each rerank for a given query will have a list of this information. This doesn't belong to the query nor its candidates; it is an execution summary for the query + its candidates.

Agreed. An example of metadata attached to a Query might be its id; in the case of TREC topics, different fields (title, narrative, description), etc.
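
Concretely, the kind of thing I mean (purely illustrative):

    # Purely illustrative: a TREC-style topic carried as query metadata.
    query_metadata = {
        "qid": "some-topic-id",
        "title": "the short keyword query",
        "description": "a sentence-length statement of the information need",
        "narrative": "a paragraph spelling out what counts as relevant",
    }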

sahel-sh commented 7 months ago

What should be the name of the high-level class containing Query and Candidates, and the metadata? Currently it is Results, but maybe we can find a better alternative. (The batch call can then take a list of these objects, rather than two lists for queries and candidates separately.)

sahel-sh commented 7 months ago

This is what we agreed on with @ronakice.

New Query and Candidate classes:

from typing import TypedDict, NotRequired  # NotRequired needs Python 3.11+ or typing_extensions

class Query(TypedDict):
    text: str
    qid: str

class Candidate(TypedDict):
    docid: str
    score: str
    title: NotRequired[str]
    content: str
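
(For reference, since these are TypedDicts, instances are just plain dicts, e.g.:)

    # Illustrative only: TypedDict instances are plain dicts checked by the type checker.
    q: Query = {"text": "what is listwise reranking?", "qid": "q1"}
    c: Candidate = {"docid": "d1", "score": "12.5", "content": "some passage text"}  # title is optional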

Existing Classes:

class RankingExecInfo:
    def __init__(
        self, prompt, response: str, input_token_count: int, output_token_count: int
    ):
        self.prompt = prompt
        self.response = response
        self.input_token_count = input_token_count
        self.output_token_count = output_token_count

    def __repr__(self):
        return str(self.__dict__)

class Result:
    def __init__(
        self,
        query: Query,
        candidates: List[Candidate],
        ranking_exec_summary: List[RankingExecInfo] = None,
    ):
        self.query = query
        self.candidates = candidates
        self.ranking_exec_summary = ranking_exec_summary

    def __repr__(self):
        return str(self.__dict__)

Then rerank_batch would look like this:

def rerank_batch(self, results: List[Result], k: int = 10) -> List[Result]:
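
For illustration, usage would look roughly like this (reranker and first_stage_candidates are placeholders, not existing names):

    # Rough sketch of the proposed flow; `reranker` and `first_stage_candidates` are placeholders.
    results = [
        Result(
            query={"qid": "q1", "text": "what is listwise reranking?"},
            candidates=first_stage_candidates,  # List[Candidate] from a first-stage retriever
        )
    ]
    reranked = reranker.rerank_batch(results, k=10)
    for result in reranked:
        print(result.query["qid"], [c["docid"] for c in result.candidates])
        print(result.ranking_exec_summary)  # prompts, responses, token counts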

@lintool PTAL and let me know what you think. The remaining issue is the name Result for the class that contains a query, a list of candidates, and the execution summary (after reranking).

lintool commented 7 months ago

Captured discussion over Slack w/ @sahel-sh ... still unresolved points.

This seems uncontroversial:

 def rerank(self, query: Query, candidates: Candidates, k: int = 10) -> Candidates:

Note that Candidates is a container object around something like List[Candidate], plus metadata.
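
(For concreteness, such a container might look roughly like this; it doesn't exist yet, just a sketch:)

    from dataclasses import dataclass
    from typing import List, Optional

    # Hypothetical sketch of the Candidates container: a list of Candidate objects plus metadata.
    @dataclass
    class Candidates:
        candidates: List[Candidate]  # Candidate as defined earlier in this thread
        metadata: Optional[dict] = None  # e.g., retriever settings, execution info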

However, what should rerank_batch look like?

The obvious extension would be:

S1: def rerank_batch(self, query: List[Query], candidates: List[Candidates], k: int = 10) -> List[Candidates]:

But @sahel-sh points out: ...not batch friendly: when you always need to zip two lists to iterate through them together (the list of queries and the list of candidates), it is a strong signal that you should have a list of pairs. I also think the execution info belongs to both the query and its candidates, not just the candidates.

An alternative would be:

S2: def rerank_batch(self, input: Dict[Query, Candidates], k: int = 10) -> Dict[Query, Candidates]:

But at this point, you might as well wrap Dict[Query, Candidates] in a container object, e.g., Results, so we have:

S3: def rerank_batch(self, input: Results, k: int = 10) -> Results:

So, we have S1, S2, S3 as options...

sahel-sh commented 7 months ago

S3 doesn't capture what I had in mind accurately:

def rerank(self, input: Result, k: int = 10) -> Result
def rerank_batch(self, input: List[Result], k: int = 10) -> List[Result]

where each Result (for lack of a better name) has a Query , List[Candidate], and optional execution metadata.

sahel-sh commented 7 months ago

There are a couple of minor issues with s1 naming as well that we should resolve if we decide to adopt it: ideally, rerank should have singular args and rerank_batch plural ones.

lintool commented 7 months ago

I guess I'm hung up over naming of Result... it's really a Tuple[Query, List[Candidates]]:

Let me propose S4 then?

def rerank(self, query: Query, candidates: Candidates, k: int = 10) -> Candidates
def rerank_batch(self, input: List[Tuple[Query, Candidates]], k: int=10) -> List[Tuple[Query, Candidates]]

Then the issue is that we don't have anywhere to stuff the metadata.

sahel-sh commented 7 months ago

I guess I'm hung up over naming of Result... it's really a Tuple[Query, List[Candidates]]:

Let me propose S4 then?

def rerank(self, query: Query, candidates: Candidates, k: int = 10) -> Candidates
def rerank_batch(self, input: List[Tuple[Query, Candidates]], k: int=10) -> List[Tuple[Query, Candidates]]

Then the issue is that we don't have anywhere to stuff the metadata.

I like s4 better than s1. Similar to s1, the metadata can go into the Candidates class (not the best fit imho). I think the only advantages that s3 has over s4 are:

I think my preference ranking is s3 > s4 > s1 > s2 (s2 is my least favorite since chunking a list into batches to be handled by multiple threads is more natural than chunking a dictionary).

lintool commented 7 months ago

S5: another strategy - use Python's ability to return multiple values:

def rerank(self, query: Query, candidates: List[Candidate], k: int = 10) -> (List[Candidate], ExecutionTraceObject)
def rerank_batch(self, input: List[Tuple[Query, List[Candidate]]], k: int=10) -> (List[Tuple[Query, List[Candidate]]], List[ExecutionTraceObject])

Note, in this design, we don't have Candidates anymore.

Or S5a

def rerank(self, query: Query, candidates: List[Candidate], k: int = 10) -> (List[Candidate], ExecutionTraceObject)
def rerank_batch(self, input: List[Tuple[Query, List[Candidate]]], k: int=10) -> List[Tuple[Query, List[Candidate], ExecutionTraceObject]]

Or S6, making more prominent use of Tuple

def rerank(self, query: Query, candidates: List[Candidate], k: int = 10) -> Tuple[Query, List[Candidate], ExecutionTraceObject]
def rerank_batch(self, input: List[Tuple[Query, List[Candidate]]], k: int=10) -> List[Tuple[Query, List[Candidate], ExecutionTraceObject]]

lintool commented 7 months ago

Okay, after further thoughts, S7:

def rerank(self, query: Query, candidates: List[Candidate], k: int = 10) -> Result
def rerank(self, request: Request, k: int = 10) -> Result
def rerank_batch(self, input: List[Request], k: int=10) -> List[Result]

Commentary:

So I would envision a snippet like:

q = Query('myqid', 'the actual query')
candidates = ...

results = rank_llm.rerank(query=q, candidates=candidates, k=10)

print(results.query) # you just get the query back
print(results.execution_trace) # or whatever we want to call it.
for hit in results.candidates:
    # iterate over candidates, these are Candidate objects
    ...

And:

request = Request(Query(qid, query), candidates)
# iterate to build up a list of requests
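
i.e., roughly (sketch only; candidate_source is a placeholder for whatever retriever output we iterate over):

    # Sketch only: build up Requests from some candidate source, then rerank the batch.
    requests = [
        Request(Query(qid, query_text), candidates)
        for qid, query_text, candidates in candidate_source  # placeholder iterable
    ]
    results = rank_llm.rerank_batch(requests, k=10)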

ronakice commented 7 months ago

Having slept through most of this, I like S7 the most. Polymorphism is perfectly fine. I think we should discuss in more detail whether we want rerank to call rerank_batch (vLLM introduces some differences), but that's a discussion for another thread.

sahel-sh commented 7 months ago

I also like s7. It follows many best practices like separating the input param type from the output param type (even if they are very similar now, they might diverge in the future).
My only comment is on Request as the query + candidates wrapper. I think we should reserve Request and Response for REST API input and output. We could call the wrapper class something like RankingInput or RankingInstance. Later on, a List[RankingInput] can be included in the REST API's request as is. Similarly, the output wrapper could be named RankingResult.

re rerank_batch: I agree that batch can be misleading in the sense that we don't launch a batch job to handle the requests; our calls to the LLM backend are still online and blocking, with just a for loop iterating over them. @ronakice did you have a CL to do actual batch inference with the LLM?

lintool commented 7 months ago

I also like s7. It follows many best practices like separating the input param type from the output param type (even if they are very similar now, they might diverge in the future). My only comment is on Request as the query + candidates wrapper. I think we should reserve Request and Response for REST API input and output. We could call the wrapper class something like RankingInput or RankingInstance. Later on, a List[RankingInput] can be included in the REST API's request as is. Similarly, the output wrapper could be named RankingResult.

Not sure how much I agree with this. At a high level, shouldn't methods take conceptual versions of the objects that abstract away from the implementation? That is, we should have a Request "concept" independent of whether it's implemented via REST, protobufs, or whatever. The method should hide invocation details, object marshaling, etc. behind the scenes? And if we really need, we'll have concrete subclasses like HttpRequest, ProtobufRequest, etc.
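
Hand-wavy sketch of what I mean (none of these subclasses exist; just illustrating the layering):

    from typing import List

    # The reranker programs against the abstract Request; transport-specific
    # subclasses would hide marshaling/invocation details behind the scenes.
    class Request:
        def __init__(self, query: Query, candidates: List[Candidate]):
            self.query = query
            self.candidates = candidates

    class HttpRequest(Request):
        ...  # would add HTTP serialization details

    class ProtobufRequest(Request):
        ...  # would add protobuf marshaling details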

re rerank_batch: I agree that batch can be misleading in the sense that we don't launch a batch job to handle the requests; our calls to the LLM backend are still online and blocking, with just a for loop iterating over them. @ronakice did you have a CL to do actual batch inference with the LLM?

I really mean rerank_batch as in "rerank a batch of requests". To me at least, it doesn't specify an implementation... could be a for loop, a distributed scatter/gather call, or map (in MapReduce)?
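
e.g., the most naive implementation would just be (sketch):

    # Sketch: nothing about the name rerank_batch commits us to this implementation;
    # the simplest version is just a loop over rerank.
    def rerank_batch(self, requests: List[Request], k: int = 10) -> List[Result]:
        return [self.rerank(request, k=k) for request in requests]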

My $0.02... happy to discuss further, though.

sahel-sh commented 7 months ago

Thanks for explaining. I will implement s7 as is, then.

lintool commented 7 months ago

Two more issues to sort out:

  1. JSON format.
  2. Anserini/Pyserini command-line options.

For (2), in Anserini, SearchCollection currently takes -output to generate a standard TREC runfile. So I propose adding a new option -outputRerankerRequests to generate whatever we decide for (1) to feed into RankLLM. I also propose adding an option -outputRerankerRequests.format, taking either json or jsonl. I can see either being useful; in particular, it would make sense to output json pretty-printed (i.e., with proper human-readable indentation).
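
For concreteness, a single jsonl request line might look something like this (purely hypothetical; the actual schema is exactly what we need to settle in (1)):

    {"query": {"qid": "q1", "text": "what is listwise reranking?"},
     "candidates": [{"docid": "d1", "score": 12.5, "content": "some passage text"}]}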

In Pyserini, the options would be --output-reranker-requests and --output-reranker-requests-format, following standard Python conventions.

Thoughts? Happy to consider alternate names for the options.

lintool commented 7 months ago

Instructions here! https://github.com/castorini/anserini/blob/master/docs/fatjar-regressions-v0.35.1.md