Closed: louis030195 closed this issue 1 year ago
Performance Comparison: In our paper TSDAE we compare approaches for sentence embedding tasks, and in GPL we compare them for semantic search tasks (given a query, find relevant passages). While the unsupervised approaches achieve acceptable performance for sentence embedding tasks, they perform poorly for semantic search tasks.
https://www.sbert.net/examples/unsupervised_learning/README.html#performance-comparison
There is also the possibility of computing weak labels from existing links, tags, or the closeness of notes, to use for supervised fine-tuning.
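As a rough sketch of the weak-labeling idea, one could mine positive training pairs from `[[wikilinks]` between notes, on the assumption that linked notes are semantically related. This is an illustrative stdlib-only example; the function name and the simplistic link regex are hypothetical, not part of any existing codebase:

```python
import re

# Matches the target of an Obsidian [[wikilink]], ignoring aliases (|) and headings (#).
WIKILINK = re.compile(r"\[\[([^\]|#]+)")

def weak_pairs(notes: dict[str, str]) -> list[tuple[str, str]]:
    """Build weak positive pairs (note text, linked note text) from wikilinks.

    Linked notes are assumed related, so each link yields one training pair
    usable with a pair-based loss such as MultipleNegativesRankingLoss.
    """
    pairs = []
    for title, text in notes.items():
        for target in WIKILINK.findall(text):
            target = target.strip()
            if target in notes and target != title:
                pairs.append((text, notes[target]))
    return pairs

# Toy vault: only the linked pair should be extracted.
notes = {
    "Transformers": "See [[Attention]] for the core mechanism.",
    "Attention": "Scaled dot-product attention weighs token interactions.",
    "Groceries": "Buy milk and eggs.",
}
print(weak_pairs(notes))
```

The resulting pairs could then feed a standard sentence-transformers training loop; unlinked notes in the same batch act as in-batch negatives.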
We could also publish on the Hugging Face Hub a fine-tuned sentence embedding model suited to general-purpose Obsidian vault semantic search (or multimodal search).
Idea: make a script that each individual can run on their vault to aggregate their public vault data into a Hugging Face dataset (with some args to filter in/out only what is publicly shareable, e.g. `publish: true` or some tag/folder). Then we could fine-tune an Obsidian note embedding model.
How hard would it be to add something like this: https://twitter.com/rileytomasek/status/1603854647575384067?s=46&t=if935fDFIydWWmNtFn-R4g
@arminta7 thanks for the feedback :). It is indeed on the roadmap
TODOs:
This is important information that would influence whether we try to implement inference directly in JS (ONNX, TFJS, etc.), which would make fine-tuning more difficult.
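Whichever JS runtime is chosen, it would have to reproduce the post-processing most sentence embedding models apply after the transformer forward pass. A minimal stdlib sketch of that math (mean pooling over non-padding tokens, then cosine similarity), using toy vectors rather than real model outputs:

```python
import math

def mean_pool(token_embeddings: list[list[float]], mask: list[int]) -> list[float]:
    """Average token embeddings over positions where the attention mask is 1."""
    dim = len(token_embeddings[0])
    total = [0.0] * dim
    count = 0
    for vec, m in zip(token_embeddings, mask):
        if m:
            count += 1
            for i in range(dim):
                total[i] += vec[i]
    return [t / count for t in total]

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b))
    return dot / norm

# Toy 2-token, 3-dim "embeddings"; the second token is padding (mask = 0).
emb = mean_pool([[1.0, 0.0, 1.0], [9.0, 9.0, 9.0]], mask=[1, 0])
print(cosine(emb, [1.0, 0.0, 1.0]))  # close to 1.0: identical direction
```

Porting this part is straightforward; the hard part in JS is running the transformer itself (e.g. via onnxruntime-web), and an exported ONNX graph is frozen, which is why in-browser inference makes later fine-tuning harder.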