chengchingwen / Transformers.jl

Julia Implementation of Transformer models
MIT License

bert_model.embed generates different vectors for same inputs #5

Closed rssdev10 closed 4 years ago

rssdev10 commented 5 years ago

Hello, I'm trying to use your BERT implementation to vectorize strings. But I found that bert_embedding = sample |> bert_model.embed generates different vectors on each call. Am I using it incorrectly?

I'm just trying to use it in the manner of this package, https://github.com/JuliaText/Embeddings.jl, but with BERT-specific embeddings.

using Transformers
using Transformers.Basic
using Transformers.Pretrain
using Transformers.Datasets
using Transformers.BidirectionalEncoder

using Flux
using Flux: onehotbatch, gradient
import Flux.Optimise: update!
using WordTokenizers

ENV["DATADEPS_ALWAYS_ACCEPT"] = true
const FromScratch = false

#use wordpiece and tokenizer from pretrain
const wordpiece = pretrain"bert-uncased_L-12_H-768_A-12:wordpiece"
const tokenizer = pretrain"bert-uncased_L-12_H-768_A-12:tokenizer"
const vocab = Vocabulary(wordpiece)

#see model.jl
const bert_model = gpu(
  FromScratch ? create_bert() : pretrain"bert-uncased_L-12_H-768_A-12:bert_model"
)

function vectorize(str::String)
  tokens = str |> tokenizer |> wordpiece
  text = ["[CLS]"; tokens; "[SEP]"]
  token_indices = vocab(text)
  segment_indices = fill(1, length(tokens) + 2)
  sample = (tok = token_indices, segment = segment_indices)
  bert_embedding = sample |> bert_model.embed
  collect(sum(bert_embedding, dims=2)[:])
end

using Distances
x1 = vectorize("Some test text")
x2 = vectorize("Some test text")
cosine_dist(x1, x2) # !!!! >>0

using LinearAlgebra
LinearAlgebra.norm(x1 .- x2)  # !!!! >>0
chengchingwen commented 5 years ago

Hi, I guess the reason is that I set dropout to be active by default. Run Flux.testmode!(bert_model) to deactivate the dropout layers; then it should always give the same embedding.
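For illustration, the nondeterminism can be reproduced with a standalone Dropout layer, independent of the BERT model; testmode! makes it deterministic (a minimal sketch using only Flux):

```julia
using Flux

d = Dropout(0.1)
x = ones(Float32, 8)

Flux.trainmode!(d)   # force dropout on, as during training
y1 = d(x)
y2 = d(x)            # typically differs from y1: units are zeroed at random

Flux.testmode!(d)    # disable dropout for inference
z1 = d(x)
z2 = d(x)
@assert z1 == z2     # deterministic once dropout is off
```

The same call on bert_model switches every dropout layer inside it to inference behavior at once.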

rssdev10 commented 5 years ago

Thanks, Flux.testmode!(bert_model) really does work.

Do you have any thoughts on integrating BERT embeddings into Embeddings.jl? Or, at least, on adding the use case mentioned above to the documentation/samples or to the README.md?

chengchingwen commented 5 years ago

Currently I don't have any idea on how they could fit together, but I think defining a similar API is possible.
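One possible direction, sketched here purely as an assumption: an Embeddings.jl-flavored wrapper around the pipeline from the issue. The names BertEmbedder and embed_text are invented for this sketch and are not part of either package.

```julia
# Hypothetical sketch of an Embeddings.jl-style interface; `BertEmbedder`
# and `embed_text` are made-up names, not a real API in either package.
struct BertEmbedder
  wordpiece
  tokenizer
  vocab
  model
end

function embed_text(e::BertEmbedder, str::AbstractString)
  # same pipeline as `vectorize` in the issue, parameterized by the wrapper
  tokens = str |> e.tokenizer |> e.wordpiece
  text = ["[CLS]"; tokens; "[SEP]"]
  sample = (tok = e.vocab(text), segment = fill(1, length(tokens) + 2))
  collect(sum(e.model.embed(sample), dims=2)[:])
end

# usage, reusing the objects defined earlier in the issue:
# embedder = BertEmbedder(wordpiece, tokenizer, vocab, bert_model)
# embed_text(embedder, "Some test text")
```

Bundling the tokenizer, vocabulary, and model in one struct keeps the lookup call close to Embeddings.jl's one-object-per-embedding-table convention.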

chengchingwen commented 4 years ago

With the new Flux AD backend (Zygote), dropout is inactive by default, so testmode! is no longer needed with the newest version.