D-Star-AI / dsRAG

High-performance retrieval engine for unstructured data
MIT License

rank to value formula #14

Closed: xiyang-aads-lilly closed this issue 4 months ago

xiyang-aads-lilly commented 4 months ago

The rank-to-value conversion is defined below. What is the reasoning (or reference) behind this equation, and what is the benefit of this method over RRF (reciprocal rank fusion)?

import numpy as np

def convert_rank_to_value(rank: int, irrelevant_chunk_penalty: float, decay_rate: int = 20):
    """
    The irrelevant_chunk_penalty term has the effect of controlling how large of segments are created:
    - 0.05 gives very long segments of 20-50 chunks
    - 0.1 gives long segments of 10-20 chunks
    - 0.2 gives medium segments of 4-10 chunks
    - 0.3 gives short segments of 2-6 chunks
    - 0.4 gives very short segments of 1-3 chunks
    """
    # Exponential decay in rank: rank 0 maps to 1.0, rank == decay_rate maps to e^-1 (about 0.37).
    # Subtracting the penalty makes low-ranked chunks negative, so they reduce a segment's total value.
    return np.exp(-rank / decay_rate) - irrelevant_chunk_penalty
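To make the docstring's trend concrete, here is a small worked example of the formula. A chunk at rank r gets value e^(-r/d) - p, where d is decay_rate and p is irrelevant_chunk_penalty; the value reaches zero at r = d * ln(1/p), so chunks ranked worse than that subtract from a segment's total. The helper break_even_rank is a hypothetical name introduced for illustration, and the sketch uses math.exp in place of np.exp so it runs without NumPy (for a scalar rank the result is the same).

```python
import math


def convert_rank_to_value(rank: float, irrelevant_chunk_penalty: float, decay_rate: int = 20) -> float:
    # Same formula as the snippet above, with math.exp standing in for np.exp.
    return math.exp(-rank / decay_rate) - irrelevant_chunk_penalty


def break_even_rank(irrelevant_chunk_penalty: float, decay_rate: int = 20) -> float:
    # Hypothetical helper: solve exp(-r / d) - p = 0 for r, giving r = d * ln(1 / p).
    # Chunks ranked better (lower r) than this have positive value.
    return decay_rate * math.log(1.0 / irrelevant_chunk_penalty)


if __name__ == "__main__":
    # Penalty values taken from the docstring above.
    for penalty in (0.05, 0.1, 0.2, 0.3, 0.4):
        print(
            f"penalty={penalty}: value at rank 0 = {convert_rank_to_value(0, penalty):.2f}, "
            f"value turns negative after rank ~{break_even_rank(penalty):.1f}"
        )
```

With the default decay_rate of 20, the break-even rank falls from roughly 60 at a penalty of 0.05 to roughly 18 at 0.4. That is consistent with the docstring's observation that a larger penalty yields shorter segments: fewer chunks carry positive value, so there is less to string together.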
zmccormick7 commented 4 months ago

The goal is to define chunk value in such a way that segment value can be defined as just the sum of the chunk values, as that makes the optimization (i.e. the search for the best segments) easy. This exponential decay function is the simplest way I could think of to achieve that goal.
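A minimal sketch of why additivity makes the search easy, assuming the task is "find the single contiguous run of chunks with the highest total value": once segment value is the sum of chunk values, this is the classic maximum-sum contiguous subarray problem, solvable in one linear pass with Kadane's algorithm. This is not necessarily how dsRAG performs its segment search (the real implementation may add constraints such as a maximum segment length or multiple segments). The names chunk_values and best_segment, and the convention that an unretrieved chunk is passed as None and valued at -penalty, are hypothetical.

```python
import math
from typing import List, Optional, Tuple


def chunk_values(ranks: List[Optional[int]], irrelevant_chunk_penalty: float,
                 decay_rate: int = 20) -> List[float]:
    # Hypothetical convention: None marks a chunk that was not retrieved at all.
    # It is treated as infinitely far down the ranking, so exp(...) -> 0 and its value is -penalty.
    values = []
    for rank in ranks:
        if rank is None:
            values.append(-irrelevant_chunk_penalty)
        else:
            values.append(math.exp(-rank / decay_rate) - irrelevant_chunk_penalty)
    return values


def best_segment(values: List[float]) -> Tuple[int, int, float]:
    # Maximum-sum contiguous subarray (Kadane's algorithm).
    # Returns (start, end_exclusive, total); an empty segment (0, 0, 0.0) if every value is negative.
    best_start, best_end, best_total = 0, 0, 0.0
    cur_start, cur_total = 0, 0.0
    for i, v in enumerate(values):
        if cur_total <= 0:
            # A non-positive running sum can only hurt, so start a new segment here.
            cur_start, cur_total = i, v
        else:
            cur_total += v
        if cur_total > best_total:
            best_start, best_end, best_total = cur_start, i + 1, cur_total
    return best_start, best_end, best_total


if __name__ == "__main__":
    # Chunks in document order; each entry is the chunk's retrieval rank, or None if not retrieved.
    ranks = [None, 3, 0, None, 5, None, None, 40]
    vals = chunk_values(ranks, irrelevant_chunk_penalty=0.2)
    print(best_segment(vals))
```

In this example the best segment spans chunks 1 through 4: the unretrieved chunk 3 costs only 0.2 and is bridged because the relevant chunks on either side outweigh it, while the trailing unretrieved chunks are dropped. If every chunk value were non-negative (raw RRF scores of the form 1/(k + rank), for instance), adding a chunk could never lower a segment's sum, so nothing would stop a segment from growing; the subtracted penalty is what gives the search a reason to stop.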

xiyang-aads-lilly commented 4 months ago

Thanks for the reply.