different-ai / obsidian-ava

Quickly format your notes with ChatGPT in Obsidian
https://app.anotherai.co
MIT License
653 stars, 18 forks

Ava search & fine-tuning #29

Closed by louis030195 1 year ago

louis030195 commented 1 year ago

TODOs:

This is important information: it would influence whether we try to implement inference directly in JS (ONNX, TFJS, etc.), which would make fine-tuning more difficult.

louis030195 commented 1 year ago

Performance Comparison (quoted from the SBERT documentation linked below): "In our paper TSDAE we compare approaches for sentence embedding tasks, and in GPL we compare them for semantic search tasks (given a query, find relevant passages). While the unsupervised approaches achieve acceptable performance on sentence embedding tasks, they perform poorly on semantic search tasks."

https://www.sbert.net/examples/unsupervised_learning/README.html#performance-comparison

louis030195 commented 1 year ago

It is also possible to compute weak labels from existing links, tags, or the closeness of notes, and use them for supervised fine-tuning.
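A minimal sketch of what weak labelling could look like, assuming notes are available as a title-to-body mapping. The function name `weak_label_pairs`, the scores, and the regexes are illustrative assumptions, not anything that exists in Ava: linked notes get a fixed score, and unlinked notes get a score scaled by the Jaccard overlap of their tags. Closeness of notes (for example folder proximity) is not covered here.

```python
import re
from itertools import combinations

# Target of [[Note]], [[Note|alias]] or [[Note#heading]] (assumed wikilink syntax).
WIKILINK = re.compile(r"\[\[([^\]|#]+)")
# Inline tags such as #ml or #project/ava; "# Heading" is not matched.
TAG = re.compile(r"(?<!\S)#([A-Za-z][\w/-]*)")


def weak_label_pairs(notes, link_score=1.0, tag_weight=0.5):
    """Return (title_a, title_b, score) tuples usable as weak similarity labels.

    notes: dict mapping note title -> markdown body (hypothetical input format).
    link_score and tag_weight are arbitrary illustrative values.
    """
    links = {t: {m.strip() for m in WIKILINK.findall(body)} for t, body in notes.items()}
    tags = {t: set(TAG.findall(body)) for t, body in notes.items()}

    pairs = []
    for a, b in combinations(sorted(notes), 2):
        if b in links[a] or a in links[b]:
            # A direct link in either direction is the strongest signal.
            score = link_score
        else:
            # Otherwise fall back to tag overlap (Jaccard similarity).
            union = tags[a] | tags[b]
            jaccard = len(tags[a] & tags[b]) / len(union) if union else 0.0
            score = tag_weight * jaccard
        if score > 0:
            pairs.append((a, b, score))
    return pairs
```

Pairs like these could then be fed to a similarity-based training objective; which objective works best for a vault is an open question in this thread.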

louis030195 commented 1 year ago

We could also publish, on the Hugging Face Hub, a fine-tuned sentence embedding model suited to general-purpose semantic search over Obsidian vaults (possibly multimodal).

louis030195 commented 1 year ago

Idea: write a script that each user can run on their own vault to aggregate its public data into a Hugging Face dataset (with arguments to filter in or out only what is publicly shareable, such as publish: true or a given tag/folder). Then we can fine-tune an Obsidian note embedding model.
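A rough sketch of the export step, assuming notes mark themselves as shareable with publish: true in their YAML frontmatter. The function names and the JSON Lines output format are assumptions made for illustration; tag/folder filters and the actual upload to the Hugging Face Hub are left out.

```python
import json
from pathlib import Path


def is_publishable(text):
    """True if the note starts with frontmatter that contains `publish: true`."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return False  # no frontmatter, so treat the note as private
    for line in lines[1:]:
        if line.strip() == "---":
            break  # end of frontmatter
        key, _, value = line.partition(":")
        if key.strip() == "publish" and value.strip().lower() == "true":
            return True
    return False


def export_public_notes(vault_dir, out_path):
    """Write every publishable note in the vault to a JSON Lines file.

    Returns the number of exported notes.
    """
    vault = Path(vault_dir)
    count = 0
    with open(out_path, "w", encoding="utf-8") as out:
        for path in sorted(vault.rglob("*.md")):
            text = path.read_text(encoding="utf-8")
            if is_publishable(text):
                record = {"path": str(path.relative_to(vault)), "text": text}
                out.write(json.dumps(record, ensure_ascii=False) + "\n")
                count += 1
    return count
```

Defaulting to private when the flag is absent is a deliberate choice here, since accidentally sharing a personal note is worse than missing a public one.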

arminta7 commented 1 year ago

How hard would it be to add something like this: https://twitter.com/rileytomasek/status/1603854647575384067?s=46&t=if935fDFIydWWmNtFn-R4g

louis030195 commented 1 year ago

@arminta7 thanks for the feedback :). It is indeed on the roadmap