anhaidgroup / deepmatcher

Python package for performing Entity and Text Matching using Deep Learning.
BSD 3-Clause "New" or "Revised" License

Error in running dm.data.process #57

Open kamakshi-malhotra opened 4 years ago

kamakshi-malhotra commented 4 years ago

I am getting this error while running the following code:

    train, validation, test = dm.data.process(
        path='sample_data/itunes-amazon',
        train='train.csv',
        validation='validation.csv',
        test='test.csv')

Error:

    Reading and processing data from "sample_data/itunes-amazon/train.csv"
    0% [############################# ] 100% | ETA: 00:00:00
    Reading and processing data from "sample_data/itunes-amazon/validation.csv"
    0% [############################# ] 100% | ETA: 00:00:00
    Reading and processing data from "sample_data/itunes-amazon/test.csv"
    0% [############################# ] 100% | ETA: 00:00:00

    ValueError                                Traceback (most recent call last)
    <ipython-input> in <module>()
          3     train='train.csv',
          4     validation='validation.csv',
    ----> 5     test='test.csv')

    7 frames
    /usr/local/lib/python3.6/dist-packages/fastText/FastText.py in __init__(self, model)
         35         self.f = fasttext.fasttext()
         36         if model is not None:
    ---> 37             self.f.loadModel(model)
         38
         39     def is_quantized(self):

    ValueError: /root/.vector_cache/wiki.en.bin has wrong file format!
goldjacob29 commented 4 years ago

I am getting the same issue -- was just about to post this!

sidharthms commented 4 years ago

It appears that the fastText file format may have changed recently. For now, could you try using an earlier version of fastText (https://pypi.org/project/fasttext/#history), perhaps 0.9.1?
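For example, assuming a pip-based setup such as Colab (a sketch of the suggestion above, not a verified fix), something like:

    !pip install fasttext==0.9.1

You may need to restart the runtime after reinstalling so the pinned version actually gets imported.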

kamakshi-malhotra commented 4 years ago

I tried using fasttext 0.9.1 but I am getting the same error with it. Also, with the earlier version 0.8.4 there was an error when installing deepmatcher.

XingkaiLiu commented 4 years ago

Maybe just try reinstalling deepmatcher with:

    pip install git+https://github.com/anhaidgroup/deepmatcher.git

because PyPI has not yet been updated with the recent modifications.

SLane35 commented 4 years ago

I'm having the same error, and reinstalling deepmatcher didn't help. Was anyone able to solve this?

sidharthms commented 4 years ago

Apparently this happens because the word embedding model fails to download correctly from Google Drive. Let me look into this. For now, you can get it working by adding these lines before calling dm.data.process, as in this Colab: https://colab.research.google.com/drive/1Qqx4FCj3JKt1oGHslsO3M8BXgyhXLWfp#scrollTo=os5kG_92eMwT

    !wget https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.en.zip --directory-prefix=/root/.vector_cache
    !unzip /root/.vector_cache/wiki.en.zip -d /root/.vector_cache/
    !rm /root/.vector_cache/wiki.en.vec

This will fetch the model zip directly from Facebook AI, but it is slower and takes more space, since the zip also contains the plain-text wiki.en.vec vectors (removed by the last line above) in addition to the wiki.en.bin model.
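If you are not in a notebook (so the ! shell escapes are unavailable), a rough Python equivalent of the commands above would be something like this sketch; the /root/.vector_cache path is taken from the error message and may differ in your environment:

    # Sketch: download and unpack the fastText wiki.en model into the
    # vector cache, mirroring the !wget / !unzip / !rm commands above.
    import os
    import urllib.request
    import zipfile

    cache_dir = "/root/.vector_cache"  # assumption: cache location from the error message
    url = "https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.en.zip"
    zip_path = os.path.join(cache_dir, "wiki.en.zip")

    os.makedirs(cache_dir, exist_ok=True)

    # Download the archive (several GB; this can take a while).
    urllib.request.urlretrieve(url, zip_path)

    # Extract wiki.en.bin and wiki.en.vec into the cache directory.
    with zipfile.ZipFile(zip_path) as zf:
        zf.extractall(cache_dir)

    # The plain-text .vec vectors are not needed here; remove them to save space.
    os.remove(os.path.join(cache_dir, "wiki.en.vec"))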

SLane35 commented 4 years ago

Excellent, thanks @sidharthms! In the meantime, my friend got this working by increasing the Colab RAM from 12 GB to 25 GB. But this workaround is good to know as well. Thanks again!