ChrisRackauckas opened this issue 6 years ago
Update word2vec and dimensionality reduction JLD
Actually, just replace the file loading with Embeddings.jl.
It doesn't look like it's a drop-in replacement?
using Embeddings
embeddings = load_embeddings(Word2Vec)
Embeddings.EmbeddingTable{Array{Float32,2},Array{String,1}}(Float32[0.0673199 0.0529562 … -0.21143 0.0136373; -0.0534466 0.0654598 … -0.0087888 -0.0742876; … ; -0.00733469 0.0108946 … -0.00405157 0.0156112; -0.00514565 -0.0470722 … -0.0341579 0.0396559], ["</s>", "in", "for", "that", "is", "on", "##", "The", "with", "said" … "#-###-PA-PARKS", "Lackmeyer", "PERVEZ", "KUNDI", "Budhadeb", "Nautsch", "Antuane", "tricorne", "VISIONPAD", "RAFFAELE"])
all_words = collect(keys(embeddings))
display(all_words)
embeddings_mat = hcat(getindex.([embeddings], all_words)...)
MethodError: no method matching keys(::Embeddings.EmbeddingTable{Array{Float32,2},Array{String,1}})
Closest candidates are:
keys(!Matched::Core.SimpleVector) at essentials.jl:580
keys(!Matched::Cmd) at process.jl:837
keys(!Matched::DataFrames.Index) at /home/chrisrackauckas/.julia/packages/DataFrames/utxEh/src/other/index.jl:66
...
Stacktrace:
[1] top-level scope at In[15]:1
It isn't a drop-in replacement, but it's really close.
Those lines aren't needed, because the word list and the embedding matrix are already fields (vocab and embeddings) of the EmbeddingTable returned by load_embeddings.
I think, though, that a restricted list of words should be passed in to limit the vocabulary; otherwise it loads hundreds of thousands of embeddings.
all_words = embeddings.vocab
display(all_words)
embeddings_mat = embeddings.embeddings
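Since the table isn't a Dict, keys and getindex on it fail, as the MethodError above shows. Looking up a single word means mapping the word to its column index in vocab first. The sketch below follows the word-index pattern shown in the Embeddings.jl README, but runs against a hypothetical ToyTable stand-in with the same two field names (embeddings with one column per word, vocab) so it works without downloading word2vec. The names ToyTable, word_index, and get_embedding are illustrative only.

```julia
# Hypothetical stand-in for Embeddings.EmbeddingTable: same field names,
# so this runs without downloading the word2vec data.
struct ToyTable
    embeddings::Matrix{Float32}   # one column per word
    vocab::Vector{String}         # vocab[i] labels column i
end

table = ToyTable(Float32[1 2 3; 4 5 6], ["in", "for", "that"])

# The table is not a Dict, so build a word => column-index lookup once.
word_index = Dict(word => i for (i, word) in enumerate(table.vocab))

# Return the embedding vector (one column of the matrix) for a word.
get_embedding(t, idx, word) = t.embeddings[:, idx[word]]

get_embedding(table, word_index, "for")  # → Float32[2.0, 5.0]
```

Building the Dict once keeps each lookup O(1) instead of scanning vocab on every call.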
I get an out-of-memory error after that.
Oh, that's why it should be restricted.
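For reference, here is a minimal sketch of what restricting the vocabulary produces: only the columns for the requested words are kept, in vocab order. It filters an already-loaded table, using the same hypothetical ToyTable stand-in and an illustrative restrict helper, so it shows the shape of the result but does not lower peak memory. To avoid building the full matrix at all, the restriction has to happen at load time. Embeddings.jl's load_embeddings documents keep_words and max_vocab_size keyword arguments for this, assuming the installed version supports them; check the package docs before relying on that.

```julia
# Hypothetical stand-in for Embeddings.EmbeddingTable (same field names).
struct ToyTable
    embeddings::Matrix{Float32}   # one column per word
    vocab::Vector{String}
end

# Keep only the columns whose word appears in `keep`, preserving vocab order.
# Words in `keep` that are missing from the vocabulary are silently dropped.
function restrict(t::ToyTable, keep)
    keepset = Set(keep)
    cols = findall(w -> w in keepset, t.vocab)
    return ToyTable(t.embeddings[:, cols], t.vocab[cols])
end

full = ToyTable(Float32[1 2 3 4; 5 6 7 8], ["in", "for", "that", "is"])
small = restrict(full, ["that", "in", "absent"])
small.vocab        # → ["in", "that"]
small.embeddings   # → Float32[1.0 3.0; 5.0 7.0]
```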
Traceur works on 1.0 now, btw.
Here are the v1.0 issues that occurred: