UKPLab / gpl

Powerful unsupervised domain adaptation method for dense retrieval. Requires only an unlabeled corpus and yields massive improvements: "GPL: Generative Pseudo Labeling for Unsupervised Domain Adaptation of Dense Retrieval" https://arxiv.org/abs/2112.07577
Apache License 2.0
315 stars, 39 forks

Recommended GPU memory? CUDA out of memory during query generation #40

Open · ymurong opened this issue 8 months ago

ymurong commented 8 months ago

Hi,

Thank you for the amazing library.

I am trying GPL on another dataset but run into problems during query generation. I am using a Google Colab V100 with 16 GB of GPU memory and the config below. CUDA runs out of memory after about 10 percent of the iterations.

How much GPU memory did you need for your experiments? Do I need to split the corpus into 10 parts to run it?
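On the splitting question: `gpl.toolkit.qgen` reads the corpus from `data_path`, which (through BEIR's `GenericDataLoader`) is expected to contain a `corpus.jsonl` file with one JSON document per line. Assuming that layout, a minimal sketch like the following could split the corpus into shard folders so that `qgen` can be run on each folder separately, for example across several Colab sessions. The function name `shard_corpus` and the `shard_{i}` folder names are hypothetical and not part of GPL.

```python
import json
import os


def shard_corpus(corpus_file, out_dir, num_shards):
    """Hypothetical helper: split a BEIR-style corpus.jsonl into num_shards
    sub-folders (out_dir/shard_0, out_dir/shard_1, ...), each containing its
    own corpus.jsonl, and return the list of shard folders."""
    if num_shards < 1:
        raise ValueError("num_shards must be at least 1")

    # Read all non-empty lines once; each line is one JSON document.
    with open(corpus_file, encoding="utf-8") as f:
        lines = [line for line in f if line.strip()]

    shard_dirs = []
    for i in range(num_shards):
        shard_dir = os.path.join(out_dir, f"shard_{i}")
        os.makedirs(shard_dir, exist_ok=True)
        # Contiguous slices keep the original document order across shards.
        start = i * len(lines) // num_shards
        end = (i + 1) * len(lines) // num_shards
        out_path = os.path.join(shard_dir, "corpus.jsonl")
        with open(out_path, "w", encoding="utf-8") as out:
            for line in lines[start:end]:
                # Round-trip through json to validate each document and
                # normalise the trailing newline.
                out.write(json.dumps(json.loads(line), ensure_ascii=False) + "\n")
        shard_dirs.append(shard_dir)
    return shard_dirs
```

Each shard folder would then be passed as `data_path` to its own `qgen` call with its own `output_dir`. Note that sharding limits how much work a single run does, but it does not by itself lower the peak memory of one generation batch; before merging the outputs, it is also worth checking whether the generated query IDs collide across shards.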

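```python
import gpl

# Generate synthetic queries for the passages in the corpus under data_path
gpl.toolkit.qgen(
    data_path="xxxxxx",  # folder containing corpus.jsonl
    output_dir="xxxxxxxx",  # where the generated queries and qrels are written
    generator_name_or_path="doc2query/msmarco-french-mt5-base-v1",  # French mT5 doc2query model
    ques_per_passage=1,  # one synthetic query per passage
    bsz=1,  # generation batch size
    qgen_prefix="qgen",  # prefix for the generated output files
)
```

```
RuntimeError: CUDA out of memory during query generation (queries_per_passage: 1, batch_size_generation: 1). Please try smaller `queries_per_passage` and/or `batch_size_generation`.
```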
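Because the error appears with `bsz=1` and only after roughly 10 percent of the iterations, one plausible (unconfirmed) explanation is a small number of unusually long passages, since generation memory grows with input length even at batch size 1. Assuming the same `corpus.jsonl` layout with `_id`, `title` and `text` fields, the hypothetical helper `longest_passages` below lists the longest passages, using whitespace-separated words as a rough proxy for tokenizer tokens.

```python
import json


def longest_passages(corpus_file, top_k=5):
    """Hypothetical helper: return (doc_id, approx_length) pairs for the
    top_k longest passages in a BEIR-style corpus.jsonl. Length is counted
    in whitespace-separated words, only a rough stand-in for model tokens."""
    lengths = []
    with open(corpus_file, encoding="utf-8") as f:
        for line in f:
            if not line.strip():
                continue
            doc = json.loads(line)
            # Count title and text together; missing or null fields become "".
            title = doc.get("title") or ""
            text = doc.get("text") or ""
            words = (title + " " + text).split()
            lengths.append((doc["_id"], len(words)))
    # Longest first; sorted() is stable, so ties keep file order.
    return sorted(lengths, key=lambda item: item[1], reverse=True)[:top_k]
```

If a few passages turn out to be far longer than the rest, truncating or splitting them before query generation would be a natural next experiment.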

Thank you for your help!