facebookresearch / dpr-scale

Scalable training for dense retrieval models.
262 stars 25 forks

Embeddings generation without Trainer #6

Closed roddar92 closed 1 year ago

roddar92 commented 1 year ago

Dear colleagues,

Do you know how to generate embeddings for all contexts without using the Trainer? At the initialization step, I only want to load a model from my checkpoint, and then compute query embeddings and retrieve documents as an inference step.

Thanks, Daria

roddar92 commented 1 year ago

Well, when I call the model with PyTorch-style code, this approach doesn't work :(

self.task.eval()
with torch.no_grad():
    query_repr = self.task(query)

Do you know another way to do this?

ccsasuke commented 1 year ago

Using our provided script is probably the easiest way to load a model and generate query and document embeddings. If you'd really like to do it without relying on the PyTorch Lightning framework, you'll need to manually set up the datamodule (which converts your queries into the tensor format expected by the model). After that, you should be able to call the forward() function on the model to get embeddings.
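As a rough illustration of those steps, here is a minimal sketch of the pattern (load weights, convert text to tensors, run forward() under no_grad). Note that `SimpleEncoder` and the whitespace "tokenizer" below are hypothetical stand-ins, not dpr-scale's actual model or transform classes; the checkpoint-loading lines are commented out and also only indicative.

```python
import torch
import torch.nn as nn

class SimpleEncoder(nn.Module):
    """Hypothetical stand-in for a dense retrieval encoder: token ids -> pooled embedding."""
    def __init__(self, vocab_size=1000, dim=16):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, dim)

    def forward(self, token_ids):
        # Mean-pool token embeddings into one vector per query
        return self.emb(token_ids).mean(dim=1)

def to_tensor(query, vocab_size=1000):
    # Minimal "datamodule" step: convert raw text into the tensor
    # format the model expects (here, hashed whitespace tokens).
    ids = [hash(tok) % vocab_size for tok in query.split()]
    return torch.tensor([ids])  # shape: (1, num_tokens)

model = SimpleEncoder()
# In a real setup you would restore the checkpoint weights first, e.g.:
# state = torch.load("model.ckpt", map_location="cpu")["state_dict"]
# model.load_state_dict(state)  # Lightning checkpoints may need key prefixes stripped

model.eval()
with torch.no_grad():
    query_repr = model(to_tensor("dense retrieval with dpr-scale"))
# query_repr now holds one embedding vector per query
```

The key difference from the snippet in the question is that the model's forward() receives tensors produced by the datamodule's transform, not raw query strings.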