deepset-ai / haystack-core-integrations

Additional packages (components, document stores and the likes) to extend the capabilities of Haystack version 2.0 and onwards
https://haystack.deepset.ai
Apache License 2.0
121 stars 119 forks source link

feat: Fastembed - add FastembedRanker #1178

Closed paulmartrencharpro closed 1 week ago

paulmartrencharpro commented 1 week ago

Proposed Changes:

New feature: integrate with the new ranker from fastembed. (https://qdrant.tech/articles/cross-encoder-integration-gsoc/, https://github.com/qdrant/fastembed/tree/main?tab=readme-ov-file#-rerankers)

I based my code on CohereRanker and Fastembed existing code

How did you test it?

I wrote all the tests in test_fastembed_ranker.py, based on the other tests for fastembed.

Checklist

paulmartrencharpro commented 1 week ago

In general, this PR is almost ready to be merged.

TODO:

* modify this line in README to mention ranker: https://github.com/deepset-ai/haystack-core-integrations/blob/c3437763e7c80e11e9ed8678f1b235353a316781/README.md?plain=1#L37

* add more tests if you can/want (uncovered lines: 66-67, 109, 115->exit, 162-163, 166, 170-171, 174-175)

Then, if you also want to open a PR on haystack-integrations to update this page ( https://github.com/deepset-ai/haystack-integrations/blob/main/integrations/fastembed.md), it would be appreciated - otherwise, I'll take care of it.

I updated the Readme and added tests. The only test I cannot write is the one that check that we don't create 2 TextCrossEncoder when we call warm_up multiple times. I don't understand the patch/mockup code to replicate it. I wrote this but it's not working:

@patch('fastembed.rerank.cross_encoder.TextCrossEncoder')
def test_warmup_twice(self, mock_encoder):
    """
    Test for checking that warm up only creates one TextCrossEncoder
    """
    ranker = FastembedRanker(model_name="Xenova/ms-marco-MiniLM-L-12-v2")
    ranker.warm_up()
    ranker.warm_up()

    # Assert that TextCrossEncoder is called only once
    mock_encoder.assert_called_once()

I'll make the PR on the other repo

anakin87 commented 1 week ago

something is failing with merging this PR. I will try to close it and reopen it.