dawangbaixiaofu / credit


ENH: add embedding feature, tokenizer from mistral-7B #4

Open dawangbaixiaofu opened 6 months ago

dawangbaixiaofu commented 6 months ago

For a concrete project that converts features into embeddings, see https://github.com/mistralai/mistral-src/blob/main/tutorials/classifier.ipynb

dawangbaixiaofu commented 6 months ago

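Concrete use case:

  1. The feature data contains text-typed fields.
  2. Convert each text string to token ids, then run a forward pass through the model's embedding layer to obtain an embedding vector.
  3. Add the embedding vector to the feature set as input features for a traditional machine learning model.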
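
The three steps above can be sketched roughly as follows. This is a toy illustration only, not the code from the linked notebook: the whitespace tokenizer, the `VOCAB` mapping, and the randomly initialised `EMBEDDING_TABLE` are hypothetical stand-ins for the Mistral-7B tokenizer and the model's embedding layer, and mean pooling is an assumed way of reducing per-token vectors to one fixed-size vector (the issue does not say how they are combined).

```python
import random

# Hypothetical toy vocabulary; in the issue's setting the ids would come
# from the Mistral-7B tokenizer instead.
VOCAB = {"<unk>": 0, "late": 1, "payment": 2, "on": 3,
         "credit": 4, "card": 5, "no": 6, "history": 7}
EMBED_DIM = 4

# Stand-in for the model's embedding layer: one vector per token id.
# Random values (assumption); a real run would use the pretrained weights.
_rng = random.Random(0)
EMBEDDING_TABLE = [
    [_rng.uniform(-1.0, 1.0) for _ in range(EMBED_DIM)] for _ in VOCAB
]


def text_to_ids(text):
    """Step 2a: map a text string to token ids (toy whitespace split)."""
    return [VOCAB.get(tok, VOCAB["<unk>"]) for tok in text.lower().split()]


def ids_to_embedding(ids):
    """Step 2b: look up each id and mean-pool into one fixed-size vector."""
    if not ids:
        return [0.0] * EMBED_DIM
    vectors = [EMBEDDING_TABLE[i] for i in ids]
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def build_feature_row(numeric_features, text):
    """Step 3: append the text embedding to the existing numeric features."""
    return list(numeric_features) + ids_to_embedding(text_to_ids(text))


# Two made-up numeric features plus one text field.
row = build_feature_row([35.0, 52000.0], "late payment on credit card")
print(len(row))
```

Each resulting row has the original numeric columns followed by `EMBED_DIM` embedding columns, so it can be passed to a tabular model like any other feature vector.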