adrienpoly / rubyvideo

Indexing all Ruby related videos
https://rubyvideo.dev
196 stars, 30 forks

Display videos on similar topics? #18

Open crohr opened 1 year ago

crohr commented 1 year ago

Hi @adrienpoly, when viewing a video, it might be interesting to have links to other videos on the same topic(s). Would you merge a PR that brings this feature?

adrienpoly commented 1 year ago

Sure, that would be lovely. The current list of videos is random, just to have something. In #3 I mentioned the ability to filter by tags, but there is no tagging system in place yet. Tags could be used to define shared topics, but they would also need to be built! There is a very preliminary system in place where users can edit talks/speakers, so we could get user-generated content. Another option would be to get a transcript of the video and run it through ChatGPT to extract tags.

Anyway, anything that is a step towards better suggestions is warmly welcome.
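
To make the tag idea concrete, here is a minimal sketch of ranking related videos by tag overlap (Jaccard similarity). It is written in plain Python for illustration only; the function name `related_by_tags`, the sample titles, and the tags are all hypothetical, since no tagging system exists in the project yet.

```python
def related_by_tags(current_tags, other_videos, limit=3):
    """Rank other videos by Jaccard overlap between their tags and current_tags.

    other_videos maps a video title to its list of tags (hypothetical data shape).
    Videos sharing no tag with the current one are left out.
    """
    current = set(current_tags)
    scored = []
    for title, tags in other_videos.items():
        tag_set = set(tags)
        union = current | tag_set
        score = len(current & tag_set) / len(union) if union else 0.0
        if score > 0:
            scored.append((score, title))
    # Highest overlap first; ties are broken alphabetically by title.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [title for _, title in scored[:limit]]


# Made-up sample data, not taken from the real site.
other_videos = {
    "Intro to Hotwire": ["rails", "hotwire", "frontend"],
    "Scaling Sidekiq": ["background-jobs", "performance"],
    "Turbo Streams in Depth": ["rails", "hotwire"],
}

print(related_by_tags(["rails", "hotwire"], other_videos))
```

Jaccard overlap favours videos whose tag sets closely match the current one, so a talk with exactly the same tags ranks above one that shares those tags but also covers extra topics.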

crohr commented 1 year ago

I was thinking of implementing similarity search with pgvector based on the descriptions (and possibly the transcripts of the videos, yes), but it seems like you're using SQLite as the database and Meilisearch for search, and I don't think either of those supports vector columns. Would you be open to switching to PostgreSQL instead of SQLite?
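
For readers unfamiliar with the idea, similarity search boils down to comparing embedding vectors, typically with cosine similarity. The sketch below is plain Python with tiny made-up 3-dimensional vectors; it does not use pgvector or any real embedding model, and the names `cosine_similarity`, `most_similar`, and the video ids are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def most_similar(video_id, embeddings, limit=2):
    """Return the ids of the videos whose embeddings are closest to video_id's."""
    query = embeddings[video_id]
    scored = [
        (cosine_similarity(query, vector), other_id)
        for other_id, vector in embeddings.items()
        if other_id != video_id
    ]
    # Most similar first; ties are broken alphabetically by id.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [other_id for _, other_id in scored[:limit]]


# Toy vectors standing in for real description embeddings (assumption).
embeddings = {
    "hotwire-intro": [0.9, 0.1, 0.0],
    "turbo-streams": [0.8, 0.2, 0.1],
    "sidekiq-scaling": [0.0, 0.1, 0.9],
}

print(most_similar("hotwire-intro", embeddings))
```

A vector-aware database does the same ranking, only with an index so that it does not have to compare against every row.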

adrienpoly commented 1 year ago

One of my side goals (for a side project that makes a lot of side things) is to see how far we can go with an SQLite database: what the real blockers are, and what benefits we get from such a simple stack. I am documenting this and will either present a talk on it somewhere or write articles.

Therefore I don't want to switch to PostgreSQL, at least for now.

For vector search, there is this experimental feature from Meilisearch that was just released: https://github.com/meilisearch/product/discussions/621#discussioncomment-6183647

SQLite also has this extension: https://observablehq.com/@asg017/introducing-sqlite-vss
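
As a baseline for the "how far can SQLite go" question, embeddings can also be stored in an ordinary SQLite column and compared by brute force in application code. The sketch below does exactly that with Python's built-in `sqlite3` module. It does not use sqlite-vss or Meilisearch; the table name `talk_embeddings`, the helper names, and the toy vectors are assumptions made for illustration.

```python
import json
import math
import sqlite3


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def nearest_talks(conn, talk_id, limit=2):
    """Brute-force scan: load every other embedding and rank it against talk_id's."""
    row = conn.execute(
        "SELECT embedding FROM talk_embeddings WHERE talk_id = ?", (talk_id,)
    ).fetchone()
    if row is None:
        return []
    target = json.loads(row[0])
    scored = [
        (cosine(target, json.loads(raw)), other_id)
        for other_id, raw in conn.execute(
            "SELECT talk_id, embedding FROM talk_embeddings WHERE talk_id != ?",
            (talk_id,),
        )
    ]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [other_id for _, other_id in scored[:limit]]


# In-memory database with embeddings serialized as JSON text (hypothetical schema).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE talk_embeddings (talk_id TEXT PRIMARY KEY, embedding TEXT NOT NULL)"
)
sample = {
    "hotwire-intro": [0.9, 0.1, 0.0],
    "turbo-streams": [0.8, 0.2, 0.1],
    "sidekiq-scaling": [0.0, 0.1, 0.9],
}
conn.executemany(
    "INSERT INTO talk_embeddings VALUES (?, ?)",
    [(talk_id, json.dumps(vector)) for talk_id, vector in sample.items()],
)

print(nearest_talks(conn, "hotwire-intro"))
```

A full scan like this is linear in the number of talks, which may well be acceptable for a catalogue of a few thousand videos; a dedicated vector index mainly matters at larger scale.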

crohr commented 1 year ago

I can relate: my latest side project also uses SQLite and a simple stack to deploy (no mrsk yet, but simple docker-compose + remote docker context).

I'll have a look at both solutions, thanks for the pointers!

crohr commented 1 year ago

Had a quick stab at it with Meilisearch, but I can't seem to send a vector with 1536 floats (the embedding size of OpenAI's text-embedding-ada-002 model). Waiting for a reply on their side.