UCIDataScienceInitiative / IntroToJulia

A Deep Introduction to Julia for Data Science and Scientific Computing
http://ucidatascienceinitiative.github.io/IntroToJulia/
MIT License
252 stars 87 forks

v1.0 issues #37

Open ChrisRackauckas opened 6 years ago

ChrisRackauckas commented 6 years ago

Here are the v1.0 issues that occurred:

oxinabox commented 6 years ago

Update word2vec and dimensionality reduction JLD

Actually, just replace the file loading with Embeddings.jl.

ChrisRackauckas commented 6 years ago

It doesn't look like it's a drop-in replacement?



using Embeddings 
embeddings = load_embeddings(Word2Vec)
Embeddings.EmbeddingTable{Array{Float32,2},Array{String,1}}(Float32[0.0673199 0.0529562 … -0.21143 0.0136373; -0.0534466 0.0654598 … -0.0087888 -0.0742876; … ; -0.00733469 0.0108946 … -0.00405157 0.0156112; -0.00514565 -0.0470722 … -0.0341579 0.0396559], ["</s>", "in", "for", "that", "is", "on", "##", "The", "with", "said"  …  "#-###-PA-PARKS", "Lackmeyer", "PERVEZ", "KUNDI", "Budhadeb", "Nautsch", "Antuane", "tricorne", "VISIONPAD", "RAFFAELE"])

all_words = collect(keys(embeddings))
display(all_words)
embeddings_mat = hcat(getindex.([embeddings], all_words)...)
MethodError: no method matching keys(::Embeddings.EmbeddingTable{Array{Float32,2},Array{String,1}})
Closest candidates are:
  keys(!Matched::Core.SimpleVector) at essentials.jl:580
  keys(!Matched::Cmd) at process.jl:837
  keys(!Matched::DataFrames.Index) at /home/chrisrackauckas/.julia/packages/DataFrames/utxEh/src/other/index.jl:66
  ...

Stacktrace:
 [1] top-level scope at In[15]:1

oxinabox commented 6 years ago

It isn't a drop-in replacement, but it is really close. Those lines are not required, since the word list and the embedding matrix are already fields of the type returned by load_embeddings.

I think, though, that a restricted list of words should be passed in to the vocab param, to keep it from loading hundreds of thousands of words.

ChrisRackauckas commented 6 years ago

all_words = embeddings.vocab
display(all_words)
embeddings_mat = embeddings.embeddings

Like this? I get an out-of-memory error after that.

ChrisRackauckas commented 6 years ago

Oh, that's why it should be restricted.
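
For reference, a restricted load might look roughly like the sketch below. It assumes the installed Embeddings.jl version accepts a `keep_words` keyword on `load_embeddings`; check the package README for the exact name. The word list is made up for illustration. The first call fetches the pretrained Word2Vec file, which is a large download.

```julia
using Embeddings

# Hypothetical word list for illustration; substitute the words the notebook needs.
wanted = Set(["king", "queen", "man", "woman"])

# Assumption: this Embeddings.jl version supports the `keep_words` keyword.
# Only the listed words are kept, so the full vocabulary is never held in memory.
table = load_embeddings(Word2Vec; keep_words=wanted)

all_words = table.vocab            # the kept words that were found in the file
embeddings_mat = table.embeddings  # one column per word in `all_words`

# Map each word to its column so that vectors can be looked up by name.
word_index = Dict(word => i for (i, word) in enumerate(all_words))
king_vec = embeddings_mat[:, word_index["king"]]
```

With only a handful of columns, `embeddings_mat` should be small enough to pass to the dimensionality reduction step without running out of memory.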

pfitzseb commented 6 years ago

Traceur works on 1.0 now, btw.