dmis-lab / biobert-pytorch

PyTorch Implementation of BioBERT
http://doi.org/10.1093/bioinformatics/btz682
Other
308 stars 107 forks

How to get embedding without deleting duplicates? #33

Open WangyuchenCS opened 2 years ago

WangyuchenCS commented 2 years ago

Hi, I wonder how to get embeddings without deleting duplicates? I found that the output .h5 file does not match the length of the input .txt file: duplicate entries were dropped.

mjeensung commented 2 years ago

Hi @WangyuchenCS

Could you try --keep_text_order True when running the script?

WangyuchenCS commented 2 years ago

Thanks a lot, but it causes an error that did not occur before:

[image: screenshot of the error]

mjeensung commented 2 years ago

Thanks for reporting the error.

Could you replace lines 13-16 as follows?

entity_id = str(i)
entity_name = f[entity_id].attrs['text']
embedding = f[entity_id]['embedding'][:]
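
For context, here is a minimal sketch of how those three lines might sit inside a loop that reads the .h5 file back in input order. It assumes the file written with --keep_text_order True has one top-level entry per input line, keyed "0", "1", ... (which is what str(i) above suggests). The function names, the use of len(f) as the entry count, and the example path are hypothetical and not taken from the repository's script.

```python
def read_in_order(f):
    """Collect (text, embedding) pairs from an open h5py.File-like mapping.

    Assumption: entries are keyed by their input line index as a string
    ("0", "1", ...), so duplicates in the input text are kept as
    separate entries.
    """
    names, embeddings = [], []
    for i in range(len(f)):  # assumes one top-level entry per input line
        entity_id = str(i)
        entity_name = f[entity_id].attrs['text']
        embedding = f[entity_id]['embedding'][:]
        names.append(entity_name)
        embeddings.append(embedding)
    return names, embeddings


def read_file_in_order(path):
    """Open an .h5 file (path is supplied by the caller) and read it in order."""
    import h5py  # imported here so read_in_order can be used without h5py
    with h5py.File(path, "r") as f:
        return read_in_order(f)
```

With this layout, len(names) should equal the number of lines in the input .txt file, for example after calling read_file_in_order("embeddings.h5") on your own output file.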