HazyResearch / tabi

Code release for Type-Aware Bi-Encoders for Open-Domain Entity Retrieval
Apache License 2.0

Standard retrieval mode has problems with reshaping #2

Open SergeyPetrakov opened 2 years ago

SergeyPetrakov commented 2 years ago

(tabi) petrakov@nlp2:~/tabi$ python3 scripts/demo.py --model_checkpoint best_model.pth --entity_emb_path embs.npy --entity_file entity.pkl
2022-09-20 12:13:09,194 [INFO] Loading model...
2022-09-20 12:13:09,194 [INFO] Using encoder model: bert-base-uncased
Traceback (most recent call last):
  File "scripts/demo.py", line 67, in <module>
    model = Biencoder(
  File "/home/petrakov/tabi/tabi/models/biencoder.py", line 47, in __init__
    entity_embs = np.memmap(entity_emb_path, dtype="float32", mode="r").reshape(
ValueError: cannot reshape array of size 1950552064 into shape (768)

Maybe you can help me with it?
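The traceback shows `np.memmap` reading embs.npy as raw float32 values and then reshaping them into rows of 768. That can only succeed if the total value count is a whole multiple of 768, and here it is not: 1950552064 = 768 × 2539781 + 256, leaving 256 stray values. This usually points to an incomplete or corrupted file rather than a code bug. Below is a minimal sketch of a pre-flight size check, assuming (from the traceback, not from the tabi source) a headerless float32 file with 768 values per entity; the function name `count_embedding_rows` is hypothetical.

```python
import os

import numpy as np

# Assumed layout (taken from the traceback, not from the tabi source): raw
# float32 values with no header, 768 values per entity embedding.
EMB_DIM = 768
DTYPE = np.float32


def count_embedding_rows(path: str, dim: int = EMB_DIM) -> int:
    """Return how many `dim`-sized rows a raw float32 file holds.

    Raises ValueError when the file does not contain a whole number of rows,
    which is exactly when reshaping the memmap into rows of `dim` would fail.
    """
    itemsize = np.dtype(DTYPE).itemsize  # 4 bytes per float32 value
    n_bytes = os.path.getsize(path)
    n_values, stray_bytes = divmod(n_bytes, itemsize)
    n_rows, leftover = divmod(n_values, dim)
    if stray_bytes or leftover:
        raise ValueError(
            f"{path}: {n_values} float32 values do not split into rows of {dim} "
            f"({leftover} values, {stray_bytes} bytes left over); "
            "the file is probably truncated or corrupted"
        )
    return n_rows


# The array size reported in the traceback leaves 256 values over:
print(divmod(1950552064, EMB_DIM))  # (2539781, 256)
```

Running a check like this before loading the model turns the opaque reshape error into a direct "bad file" message.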

mleszczy commented 2 years ago

Hi Sergey! Thanks for raising the issue! Which model checkpoint and embedding path are you using? Are these ones you trained or one of the four listed on the README?

SergeyPetrakov commented 2 years ago

Hi! I followed the instructions in your README. My steps were the following (as stated in the README):
1) Created a new conda environment with python=3.7
2) Cloned the repo
3) pip install -r requirements.txt
4) pip install -e .
5) pip install torch==1.10.1+cu111 torchvision==0.11.2+cu111 torchaudio==0.10.1 -f https://download.pytorch.org/whl/torch_stable.html
6) Downloaded best_model.pth, embs.npy, entity.jsonl, entity.pkl
7) Tried to launch TABi interactively using python scripts/demo.py --model_checkpoint best_model.pth --entity_emb_path embs.npy --entity_file entity.pkl

After that I received the error mentioned above.

mleszczy commented 2 years ago

Hi Sergey, apologies for the slow reply. Can you check the md5 of the embedding file? These are the md5s for each of the embedding files (embs.npy):

If the md5 doesn't match one of the above, you may need to re-download the file. Please comment if that still doesn't work, thanks!
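On Linux, `md5sum embs.npy` prints the digest directly. As a cross-platform alternative, here is a minimal sketch using only Python's standard `hashlib`; the function name `file_md5` and the 1 MiB chunk size are illustrative choices, not part of tabi. Reading in chunks keeps memory use flat even for a multi-gigabyte embedding file.

```python
import hashlib


def file_md5(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of the file at `path`.

    The file is read in `chunk_size`-byte pieces (1 MiB by default) so a
    large file such as embs.npy is never loaded into memory all at once.
    """
    digest = hashlib.md5()
    with open(path, "rb") as f:
        # iter() with a sentinel keeps calling f.read() until it returns b"" (EOF)
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Calling `file_md5("embs.npy")` and comparing the result against the md5 listed for the matching checkpoint tells you whether the download is intact; any mismatch means the file should be fetched again.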

SergeyPetrakov commented 2 years ago

Hi Megan! No problem, and thank you for the md5s you sent. I checked the file, and now it works as described in your repo. By the way, do I understand correctly that your model only handles English input? Could it be fine-tuned on multilingual datasets to retrieve entities in other languages?