farach / huggingfaceR

Hugging Face state-of-the-art models in R

Update sentence-transformers.R #21

Open tomazweiss opened 2 years ago

tomazweiss commented 2 years ago

The previous example didn't work.

samterfa commented 2 years ago

We can definitely change course, but the original example worked and seemed clear to me.

```r
library(tidyverse)

# Compute sentence embeddings
sentences <- c(
  "Baby turtles are so cute!",
  "He walks as slowly as a turtle.",
  "The lake is cold today.",
  "I enjoy swimming in the lake."
)
model <- hf_load_sentence_model('paraphrase-MiniLM-L6-v2')
embeddings <- model$encode(sentences)
embeddings

# Get distances between sentences
embeddings %>%
  dist() %>%
  as.matrix() %>%
  as.data.frame() %>%
  setNames(sentences) %>%
  mutate(sentence 1 = sentences) %>%
  pivot_longer(cols = -sentence 1, names_to = 'sentence 2', values_to = 'distance') %>%
  filter(distance > 0)

# Cluster sentences
embeddings %>%
  t() %>%
  prcomp() %>%
  pluck('rotation') %>%
  as.data.frame() %>%
  mutate(sentence = sentences) %>%
  ggplot(aes(PC1, PC2)) +
  geom_label(aes(PC1, PC2, label = sentence, vjust="inward", hjust="inward")) +
  theme_minimal()
```

farach commented 2 years ago

I ran @samterfa's code above and it succeeded after I added backticks around "sentence 1". This matches what we have in the example, so it should be good to go.

```r
library(tidyverse)

sentences <- c(
  "Baby turtles are so cute!",
  "He walks as slowly as a turtle.",
  "The lake is cold today.",
  "I enjoy swimming in the lake."
)

model <- hf_load_sentence_model('paraphrase-MiniLM-L6-v2')

embeddings <- model$encode(sentences)
embeddings

embeddings %>%
  dist() %>%
  as.matrix() %>%
  as.data.frame() %>%
  setNames(sentences) %>%
  mutate(`sentence 1` = sentences) %>%
  pivot_longer(
    cols = -`sentence 1`,
    names_to = 'sentence 2',
    values_to = 'distance'
  ) %>%
  filter(distance > 0)

embeddings %>%
  t() %>%
  prcomp() %>%
  pluck('rotation') %>%
  as.data.frame() %>%
  mutate(sentence = sentences) %>%
  ggplot(aes(PC1, PC2)) +
  geom_label(aes(PC1, PC2, label = sentence, vjust="inward", hjust="inward")) +
  theme_minimal()
```

@tomazweiss's example is really close to this too. @tomazweiss, could you point us to the error you are getting?
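For anyone reading along, here is a minimal base R sketch of why the backticks around "sentence 1" matter. It is an illustration only, not code from the package: the data frame `df` and its values are made up. A name containing a space is non-syntactic in R, so it has to be quoted with backticks wherever it is used as a bare name.

```r
# Hypothetical toy data, made up for this illustration (base R only).
df <- data.frame(distance = c(0.5, 1.2))

# A column name with a space is non-syntactic, so it needs backticks
# when written as a bare name.
df$`sentence 1` <- c("a", "b")

# Without backticks the expression does not even parse.
bad <- try(parse(text = "list(sentence 1 = 'a')"), silent = TRUE)
inherits(bad, "try-error")  # TRUE

# With backticks the same expression parses and evaluates.
ok <- eval(parse(text = "list(`sentence 1` = 'a')"))
names(ok)  # "sentence 1"
```

The same rule applies inside `mutate()` and `pivot_longer()`, which is why the pipeline fails until the name is quoted.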

samterfa commented 2 years ago

It looks like @jpcompartir changed the example here. Maybe he was seeing an error?

tomazweiss commented 2 years ago

@farach, I was correcting the example in this file: https://github.com/farach/huggingfaceR/blob/main/R/sentence-transformers.R, which is different from what @samterfa pasted above.

There is a typo (`embddings`), and you are updating the embeddings object and then using the previous version in the plot.

jpcompartir commented 2 years ago

> It looks like @jpcompartir changed the example here. Maybe he was seeing an error?

This just looks like me being careless. By the way, the examples across the package were temporarily removed to speed up running `check()` (I also didn't feel like adding them to `.Rbuildignore`). They're likely to be put back in as usage, or to appear in vignettes. So I may do the same here until we're ready for release and tests etc. have been added appropriately.