datacommonsorg / website

Code for the Data Commons website
https://datacommons.org
Apache License 2.0

Delete deprecated build embeddings tool: autogen input, finetuning #4321

Closed shifucun closed 1 month ago

shifucun commented 1 month ago

Now that we use new models, some old utilities for building embeddings are no longer needed.

  1. The autogen input is deprecated because we now put more emphasis on the quality of stat vars and topics.

  2. Finetuning was specific to the traditional transformer model all-MiniLM. The process does not apply to larger models like uae-large, or to even larger LLMs.