Option to Use GPU, CUDA

hieuhthh commented 9 months ago

I really appreciate this repository. I hope the rerank model can optionally use a GPU to fully utilize the performance increase, potentially even with multi-GPU support.

Thank you.

PrithivirajDamodaran commented 9 months ago

Thanks for raising this. We have it on our list.

prashantg445 commented 7 months ago

Hey @PrithivirajDamodaran, can you publish this list of next action items somewhere, so that people interested in contributing can get started?

P.S.: I am interested in contributing.

PrithivirajDamodaran commented 6 months ago

Thanks for reaching out, @prashantg445

@prabhkaran is working on a few optimisations. He will share those.

Besides that, we are going to work on extending FlashRank to support listwise rerankers. Today we support pointwise / pairwise rerankers, which frame reranking as a classification task. Given a query q and a passage p, a pointwise reranker produces a real-valued score indicating the relevance of the passage to the query. The model is optimized using cross-entropy or contrastive loss based on binary relevance judgments from human annotators. At inference time, the top-k passages returned by the first-stage retriever are each scored independently. The final passages are then ranked in decreasing order of their relevance scores. Listwise rerankers, by contrast, consider all the candidate passages together.
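
To make the pointwise flow concrete, here is a minimal sketch in Python. It is not FlashRank's code: `toy_score` is a hypothetical stand-in for a trained cross-encoder (it only measures token overlap), and `pointwise_rerank` is an illustrative name. The part that matters is the shape of the loop: each (query, passage) pair is scored on its own, then the passages are sorted by descending score.

```python
from typing import Callable, List, Tuple


def toy_score(query: str, passage: str) -> float:
    """Hypothetical stand-in for a cross-encoder.

    Returns the fraction of query tokens that also appear in the passage.
    A real pointwise reranker would run a trained model here instead.
    """
    q_tokens = set(query.lower().split())
    p_tokens = set(passage.lower().split())
    if not q_tokens:
        return 0.0
    return len(q_tokens & p_tokens) / len(q_tokens)


def pointwise_rerank(
    query: str,
    passages: List[str],
    score_fn: Callable[[str, str], float] = toy_score,
) -> List[Tuple[str, float]]:
    """Score every passage independently, then sort by descending score."""
    scored = [(passage, score_fn(query, passage)) for passage in passages]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Candidates as they might come back from a first-stage retriever.
candidates = [
    "bananas are rich in potassium",
    "a gpu can speed up reranking models",
    "reranking models score passages against a query",
]
ranked = pointwise_rerank("gpu reranking models", candidates)

for passage, score in ranked:
    print(f"{score:.2f}  {passage}")
```

Because each pair is scored independently, the scoring calls can be batched or parallelized freely, which is why a pointwise reranker maps naturally onto a GPU. A listwise reranker would instead need all candidates in a single model input.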

YVMVN commented 3 months ago

Good day! I really appreciate this repo. However, listwise is too slow on CPUs with llama-cpp. Is there any update on GPU support?

PrithivirajDamodaran / FlashRank

Option to Use GPU, CUDA #8