NTMC-Community / MatchZoo

Facilitating the design, comparison and sharing of deep text matching models.
Apache License 2.0

Loading word2vec embedding exceeds the memory limit #807

Open danielwonght opened 4 years ago

danielwonght commented 4 years ago

Describe the bug

Loading a word2vec embedding file exceeds the memory limit. When the embedding vectors end up loaded as strings instead of floats, they require much more memory.
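To give a rough sense of the size difference, here is a minimal sketch that compares the same values stored as floats and as strings. The table shape (1000 rows, 50 columns) and the random data are made up for illustration and are not taken from any real embedding file.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for an embedding table: 1000 "words" x 50 dimensions.
rng = np.random.default_rng(0)
as_float = pd.DataFrame(rng.random((1000, 50)))

# The same values, but held as text rather than as 8-byte floats.
as_text = as_float.astype(str)

# deep=True counts the string payloads, not just the pointers to them.
float_bytes = int(as_float.memory_usage(deep=True).sum())
text_bytes = int(as_text.memory_usage(deep=True).sum())

print(f"float columns: {float_bytes} bytes")
print(f"text columns:  {text_bytes} bytes")
```

The exact ratio depends on the pandas version and on how many digits each value has, so treat the printed numbers as indicative only.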

Solution

Modify the function matchzoo.embedding.load_from_file from:

data = pd.read_csv(file_path, sep=" ", index_col=0, header=None, skiprows=1)

to (csv is the standard-library module, so the file also needs import csv):

data = pd.read_csv(file_path, sep=" ", index_col=0, header=None, skiprows=1, quoting=csv.QUOTE_NONE)
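One plausible way the default quoting interferes is when the vocabulary contains a bare double-quote token: pandas then treats it as the start of a quoted field and reads across line breaks until the next quote. The sketch below reproduces that on a tiny in-memory file. The four-word vocabulary and its vectors are invented for illustration, and io.StringIO stands in for file_path.

```python
import csv
import io

import pandas as pd

# Hypothetical word2vec text file: a "count dim" header line, then one
# token per line. Two of the tokens are a bare double-quote character.
W2V_TEXT = (
    "4 3\n"
    "the 0.1 0.2 0.3\n"
    '" 0.4 0.5 0.6\n'
    "cat 0.7 0.8 0.9\n"
    '" 1.0 1.1 1.2\n'
)

# Original call: default quoting, so '"' opens a quoted field.
default = pd.read_csv(
    io.StringIO(W2V_TEXT), sep=" ", index_col=0, header=None, skiprows=1
)

# Proposed call: quote characters are treated as ordinary text.
fixed = pd.read_csv(
    io.StringIO(W2V_TEXT), sep=" ", index_col=0, header=None, skiprows=1,
    quoting=csv.QUOTE_NONE,
)

print("rows with default quoting:", len(default))
print("rows with QUOTE_NONE:", len(fixed))
print(fixed.dtypes)
```

With the default quoting, the lines between the two quote tokens are swallowed into a single index value, so fewer rows come back than the file contains. With QUOTE_NONE each line is parsed as its own row and the vector columns stay numeric.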

bwanglzu commented 4 years ago

Would you like to send a PR to fix this issue? @danielwonght

danielwonght commented 4 years ago

Sure.