SciSharp / TensorFlow.NET

.NET Standard bindings for Google's TensorFlow for developing, training and deploying Machine Learning models in C# and F#.
https://scisharp.github.io/tensorflow-net-docs
Apache License 2.0
3.2k stars, 514 forks

[Feature Request]: Is TensorFlow Text supported? #1093

Open RockNHawk opened 1 year ago

RockNHawk commented 1 year ago

Background and Feature Description

The universal-sentence-encoder model can generate text embeddings, and it depends on TensorFlow Text. Is TensorFlow Text supported?

https://tfhub.dev/google/universal-sentence-encoder-multilingual-large/3

API Definition and Usage


  import tensorflow_hub as hub
  import numpy as np
  import tensorflow_text  # Not referenced directly; importing it registers the custom ops the model needs.

  # Some texts of different lengths.
  english_sentences = ["dog", "Puppies are nice.", "I enjoy taking long walks along the beach with my dog."]
  italian_sentences = ["cane", "I cuccioli sono carini.", "Mi piace fare lunghe passeggiate lungo la spiaggia con il mio cane."]
  japanese_sentences = ["犬", "子犬はいいです", "私は犬と一緒にビーチを散歩するのが好きです"]
  chinese_sentences = ["狗", "小狗很好", "我喜欢和我的狗一起沿着海滩散步"]

  embed = hub.load("https://tfhub.dev/google/universal-sentence-encoder-multilingual-large/3")

  # Compute embeddings.
  en_result = embed(english_sentences)
  it_result = embed(italian_sentences)
  ja_result = embed(japanese_sentences)
  zh_result = embed(chinese_sentences)

  # Compute similarity matrices. Higher score indicates greater similarity.
  similarity_matrix_it = np.inner(en_result, it_result)
  similarity_matrix_ja = np.inner(en_result, ja_result)
  similarity_matrix_zh = np.inner(en_result, zh_result)
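
For readers unfamiliar with the similarity step: on two batches of embeddings, np.inner returns a matrix whose entry [i][j] is the dot product of sentence i from the first batch with sentence j from the second. Below is a minimal pure-Python sketch of that computation. The 3-dimensional vectors and the helper names normalize and inner_matrix are made up for illustration; they are not model output or part of any library. The sketch also assumes unit-length vectors, in which case the inner product equals cosine similarity.

```python
import math

def normalize(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def inner_matrix(a, b):
    """Pairwise dot products: result[i][j] = dot(a[i], b[j]).

    Mirrors what np.inner computes for two 2-D inputs.
    """
    return [[sum(x * y for x, y in zip(row_a, row_b)) for row_b in b]
            for row_a in a]

# Hypothetical toy "embeddings", NOT real model output.
en_toy = [normalize(v) for v in ([1.0, 0.0, 0.0], [0.0, 1.0, 0.0])]
it_toy = [normalize(v) for v in ([2.0, 0.0, 0.0], [0.0, 1.0, 1.0])]

similarity = inner_matrix(en_toy, it_toy)
for row in similarity:
    print([round(s, 3) for s in row])
```

Identical directions score 1.0 and orthogonal ones score 0.0, so in the real matrices the largest entry in each row points at the closest sentence in the other language.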

Alternatives

No response

Risks

No response

AsakusaRinne commented 1 year ago

TensorFlow Text is not supported yet. It will be added before v1.0.0, in about two months. In the meantime, LLamaSharp is an alternative: it supports using an LLM to generate embeddings.