lintool opened this issue 7 months ago
This looks clean, below are my thoughts: If we want rerank to be compatible with many retrievers, we should add creators to the Query and Candidates classes to convert str and json/dict to instances of these classes, and expect the user to call them before calling rerank.

More important than adding metadata to Query and its Candidates separately, it is important to have metadata for a reranking run: PTAL at https://github.com/castorini/rank_llm/blob/main/src/rank_llm/result.py, where we store each prompt, response, and input/output token counts in an execution info object. Each rerank for a given query will have a list of this information. This doesn't belong to a query nor its candidates; it is an execution summary for a query + its candidates.
> This looks clean, below are my thoughts: If we want rerank to be compatible with many retrievers, we should add creators to the Query and Candidates classes to convert str and json/dict to instances of these classes, and expect the user to call them before calling rerank.
Agreed. There should be a pretty straightforward mapping from json/dict to Python objects. I envision from_json "importers" and to_json "exporters". (We can quibble about method naming later.)
> More important than adding metadata to Query and its Candidates separately, it is important to have metadata for a reranking run: PTAL at https://github.com/castorini/rank_llm/blob/main/src/rank_llm/result.py, where we store each prompt, response, and input/output token counts in an execution info object. Each rerank for a given query will have a list of this information. This doesn't belong to a query nor its candidates; it is an execution summary for a query + its candidates.
Agreed. An example of metadata attached to a Query might be its id; in the case of TREC topics, different fields (title, narrative, description), etc.
What should be the name of the high-level class containing Query and Candidates, and the metadata? Currently it is Results, but maybe we can find a better alternative. (The batch call, then, can take a list of these objects, rather than two lists for queries and candidates separately.)
This is what we agreed on with @ronakice.
New Query and Candidate classes:

```python
class Query(TypedDict):
    text: str
    qid: str

class Candidate(TypedDict):
    docid: str
    score: str
    title: NotRequired[str]
    content: str
```
Existing Classes:

```python
class RankingExecInfo:
    def __init__(
        self, prompt, response: str, input_token_count: int, output_token_count: int
    ):
        self.prompt = prompt
        self.response = response
        self.input_token_count = input_token_count
        self.output_token_count = output_token_count

    def __repr__(self):
        return str(self.__dict__)


class Result:
    def __init__(
        self,
        query: Query,
        candidates: List[Candidate],
        ranking_exec_summary: List[RankingExecInfo] = None,
    ):
        self.query = query
        self.candidates = candidates
        self.ranking_exec_summary = ranking_exec_summary

    def __repr__(self):
        return str(self.__dict__)
```
Then rerank_batch would look like this:

```python
def rerank_batch(self, results: List[Result], k: int = 10) -> List[Result]:
```

@lintool PTAL and let me know what you think. The remaining issue is the name Result for the class that contains a query, a list of candidates, and an execution summary (after reranking).
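As a rough usage sketch of these classes with the proposed signature (the reranker object and the example values are hypothetical; field types follow the TypedDicts above):

```python
from typing import List

# A query and its first-stage candidates, shaped like the TypedDicts above.
query: Query = {"qid": "q1", "text": "how do neural rerankers work"}
candidates: List[Candidate] = [
    {"docid": "d7", "score": "12.3", "content": "..."},
    {"docid": "d2", "score": "11.9", "content": "..."},
]

results = reranker.rerank_batch([Result(query, candidates)], k=10)
for result in results:
    print(result.query["qid"])
    for candidate in result.candidates:
        print(candidate["docid"], candidate["score"])
    # Prompts, responses, and token counts live in the per-query exec summary.
    print(result.ranking_exec_summary)
```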
Captured discussion over Slack w/ @sahel-sh ... still unresolved points.

This seems uncontroversial:

```python
def rerank(self, query: Query, candidates: Candidates, k: int = 10) -> Candidates:
```

Note that Candidates is a container object around something like List[Candidate], plus metadata.

However, what should rerank_batch look like? The obvious extension would be:

S1:

```python
def rerank_batch(self, query: List[Query], candidates: List[Candidates], k: int = 10) -> List[Candidates]:
```

But @sahel-sh points out: ...not batch friendly: when you always need to zip two lists to iterate through them together (list of queries and list of candidates), it is a strong signal that you should have a list of pairs. I also think the execution info belongs to both query and its candidates, not just candidates.

An alternative would be:

S2:

```python
def rerank_batch(self, input: Dict[Query, Candidates], k: int = 10) -> Dict[Query, Candidates]:
```

But at this point, you might as well wrap Dict[Query, Candidates] in a container object, e.g., Results, so we have:

S3:

```python
def rerank_batch(self, input: Results, k: int = 10) -> Results:
```

So, we have S1, S2, S3 as options...
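To make the trade-off concrete, the call sites would differ roughly as follows (variable and attribute names are illustrative only, not part of any proposal):

```python
# S1: the caller maintains two parallel lists and has to zip them back together.
reranked = reranker.rerank_batch(queries, candidates_per_query, k=10)
for query, candidates in zip(queries, reranked):
    print(query, candidates)

# S3: each query stays paired with its candidates inside one container,
# which also gives per-query execution metadata a natural home.
reranked = reranker.rerank_batch(results, k=10)
for entry in reranked:
    print(entry.query, entry.candidates)
```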
S3 doesn't capture what I had in mind accurately:

```python
def rerank(self, input: Result, k: int = 10) -> Result
def rerank_batch(self, input: List[Result], k: int = 10) -> List[Result]
```

where each Result (for lack of a better name) has a Query, a List[Candidate], and optional execution metadata.
There are a couple of minor issues with s1 naming as well that we should resolve if we decide to adopt it: ideally rerank should have singular args and rerank_batch plural ones: queries (vs query in rerank), but it cannot have "candidatess" (vs candidates in rerank) since the arg is already plural in rerank. Also, the Candidates class has a List[Candidate] field. What would be the name of this field? Maybe candidates or candidate_list? Then accessing it looks like candidates.candidates (where the outer candidates is of type Candidates) or candidates.candidate_list, neither of which is very readable.

I guess I'm hung up over the naming of Result ... it's really a Tuple[Query, List[Candidates]]:

Let me propose S4 then?

```python
def rerank(self, query: Query, candidates: Candidates, k: int = 10) -> Candidates
def rerank_batch(self, input: List[Tuple[Query, Candidates]], k: int = 10) -> List[Tuple[Query, Candidates]]
```

Then the issue is that we don't have anywhere to stuff the metadata.
> I guess I'm hung up over the naming of Result ... it's really a Tuple[Query, List[Candidates]]: Let me propose S4 then?
>
> def rerank(self, query: Query, candidates: Candidates, k: int = 10) -> Candidates
> def rerank_batch(self, input: List[Tuple[Query, Candidates]], k: int = 10) -> List[Tuple[Query, Candidates]]
>
> Then the issue is that we don't have anywhere to stuff the metadata.
I like s4 better than s1. Similar to s1, the metadata can go into the Candidates class (not the best fit imho). I think the only advantages that s3 has over s4 are:

I think my preference ranking is s3 > s4 > s1 > s2 (s2 is my least favorite, since chunking a list in batch mode to be handled by multiple threads is more natural than chunking a dictionary).
S5: another strategy - use Python's ability to return multiple values:

```python
def rerank(self, query: Query, candidates: List[Candidate], k: int = 10) -> (List[Candidate], ExecutionTraceObject)
def rerank_batch(self, input: List[Tuple[Query, List[Candidate]]], k: int = 10) -> (List[Tuple[Query, List[Candidate]]], List[ExecutionTraceObject])
```

Note, in this design, we don't have Candidates anymore.

Or S5a:

```python
def rerank(self, query: Query, candidates: List[Candidate], k: int = 10) -> (List[Candidate], ExecutionTraceObject)
def rerank_batch(self, input: List[Tuple[Query, List[Candidate]]], k: int = 10) -> List[Tuple[Query, List[Candidate], ExecutionTraceObject]]
```

Or S6, making more prominent use of Tuple:

```python
def rerank(self, query: Query, candidates: List[Candidate], k: int = 10) -> Tuple[Query, List[Candidate], ExecutionTraceObject]
def rerank_batch(self, input: List[Tuple[Query, List[Candidate]]], k: int = 10) -> List[Tuple[Query, List[Candidate], ExecutionTraceObject]]
```
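For comparison, the S5 call sites with multiple return values would look roughly like this (ExecutionTraceObject remains a placeholder name); note that on the batch side the caller is back to zipping two parallel sequences:

```python
# Single request: unpack the reranked candidates and the execution trace.
reranked_candidates, trace = reranker.rerank(query, candidates, k=10)

# Batch (S5): reranked pairs and traces come back as two parallel lists.
reranked_pairs, traces = reranker.rerank_batch(pairs, k=10)
for (query, candidates), trace in zip(reranked_pairs, traces):
    print(query, trace)
```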
Okay, after further thoughts, S7:

```python
def rerank(self, query: Query, candidates: List[Candidate], k: int = 10) -> Result
def rerank(self, request: Request, k: int = 10) -> Result
def rerank_batch(self, input: List[Request], k: int = 10) -> List[Result]
```

Commentary:
- The two rerank variants: just syntactic sugar.
- A Result object rather than a bare Tuple provides more flexibility.

So I would envision a snippet like:
```python
q = Query('myqid', 'the actual query')
candidates = ...
results = rank_llm.rerank(query=q, documents=candidates, k=10)

print(results.query)            # you just get the query back
print(results.execution_trace)  # or whatever we want to call it

for hit in results.candidates:
    # iterate over candidates, these are Candidate objects
    ...
```

And:

```python
request = Request(Query(qid, query), candidates)
# iterate to build up a list of requests
```
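As a non-binding sketch of how S7's wrapper classes could be laid out (names are up for debate, per the discussion below; Query, Candidate, and RankingExecInfo are the classes from earlier in this thread):

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Request:
    # Input wrapper: a query plus its first-stage candidates.
    query: Query
    candidates: List[Candidate]


@dataclass
class Result:
    # Output wrapper: the same query, the reranked candidates, and an
    # optional per-query execution trace (prompts, responses, token counts).
    query: Query
    candidates: List[Candidate]
    ranking_exec_summary: Optional[List[RankingExecInfo]] = None
```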
Having slept through most of this, I like S7 the most. Polymorphism is perfectly fine. I think we should discuss in more detail whether we want rerank to call rerank_batch (vllm introduces some differences), but that is a discussion not for this thread.
I also like s7. It follows many best practices, like separation of the input param type and the output param type (even if they are very similar now, they might diverge in the future).

My only comment is on Request as the query + candidates wrapper. I think we should reserve Request and Response for RestAPI input and output. I think we can call the wrapper class something like RankingInput or RankingInstance or something like this. Later on, a List[RankingInput] can be included in the RestAPI's request as is. Similarly, the output wrapper name could be RankingResult.

re rerank_batch: I agree that batch can be misleading in the sense that we don't launch a batch job to handle the requests; our calls to the llm backend are still online and blocking, we just have a for loop to iterate over them. @ronakice did you have a cl to do an actual batch inference with the llm?
> I also like s7. It follows many best practices, like separation of the input param type and the output param type (even if they are very similar now, they might diverge in the future). My only comment is on Request as the query + candidates wrapper. I think we should reserve Request and Response for RestAPI input and output. I think we can call the wrapper class something like RankingInput or RankingInstance or something like this. Later on, a List[RankingInput] can be included in the RestAPI's request as is. Similarly, the output wrapper name could be RankingResult.
Not sure how much I agree with this. At a high level, shouldn't methods take conceptual versions of the objects that abstract away from the implementation? That is, we should have a Request "concept" independent of whether it's implemented via REST, protobufs, or whatever. The method should hide invocation details, object marshaling, etc. behind the scenes? And if we really need to, we'll have concrete subclasses like HttpRequest, ProtobufRequest, etc.
> re rerank_batch: I agree that batch can be misleading in the sense that we don't launch a batch job to handle the requests; our calls to the llm backend are still online and blocking, we just have a for loop to iterate over them. @ronakice did you have a cl to do an actual batch inference with the llm?
I really mean rerank_batch as in "rerank a batch of requests". To me at least, it doesn't specify an implementation... could be a for loop, a distributed scatter/gather call, or map (in MapReduce)?

My $0.02... but happy to discuss further though.
Thanks for explaining; I will implement s7 as is, then.
Two more issues to sort out:

For (2), in Anserini, SearchCollection currently takes -output to generate a standard TREC runfile. So I propose to add a new option -outputRerankerRequests to generate whatever we decide for (1) to feed into RankLLM. I propose to also add the option -outputRerankerRequests.format, taking either json or jsonl. I can see either being useful. In particular, it would make sense to output json pretty-printed (i.e., with proper human-readable indentation).

In Pyserini, the options would be --output-reranker-requests and --output-reranker-requests-format, following standard Python conventions.

Thoughts? Happy to consider alternate names for the options.
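To illustrate, a single line of the jsonl output might look something like the following; the exact schema is whatever comes out of (1), and the field names here simply mirror the Query/Candidate fields discussed above:

```python
import json

# Hypothetical reranker request: one JSON object per line in the jsonl variant.
request = {
    "query": {"qid": "23", "text": "information retrieval with LLMs"},
    "candidates": [
        {"docid": "doc7", "score": 12.3, "content": "..."},
        {"docid": "doc2", "score": 11.9, "content": "..."},
    ],
}
print(json.dumps(request))            # jsonl: one compact object per line
print(json.dumps(request, indent=2))  # json: pretty-printed, human readable
```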
Just thoughts... @ronakice @sahel-sh reactions welcome.

Two methods: rerank and rerank_batch.

Method signature:

```python
def rerank(self, query: Query, candidates: Candidates, k: int = 10) -> Candidates:
```

Commentary:
- A Query object as opposed to just a str. This will allow us to attach additional fields to include metadata, etc.
- A Candidates object as opposed to just a list of json objects. Again, this will allow us to attach other metadata, such as execution traces, etc.
- The returned Candidates is a fresh object, i.e., the method is non-destructive and does not mess with the input.
- Then, rerank_batch would be the batch variant.
- rerank can just route to rerank_batch with a list of one object.
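A minimal sketch of that routing, assuming S7-style Request/Result wrappers from the discussion above; the per-request helper is a stand-in for whatever a concrete reranker actually does:

```python
from typing import List


class Reranker:
    def rerank_batch(self, requests: List[Request], k: int = 10) -> List[Result]:
        # "Batch" here just means "a batch of requests": the default is a plain
        # loop, and subclasses can override this with truly batched inference
        # (e.g., vllm) without changing the interface.
        return [self._rerank_one(request, k) for request in requests]

    def rerank(self, request: Request, k: int = 10) -> Result:
        # Syntactic sugar: route a single request through the batch path.
        return self.rerank_batch([request], k=k)[0]

    def _rerank_one(self, request: Request, k: int) -> Result:
        # Placeholder for the actual model call; hypothetical helper.
        raise NotImplementedError
```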