biotorrents / gazelle

BioTorrents.de’s version of Gazelle
https://torrents.bio
ISC License

Long term: bring the AI integration in house #80

Open pjc09h opened 1 year ago

pjc09h commented 1 year ago

I've been experimenting with Facebook Galactica, and it seems entirely possible to at least generate quality "recommended research" suggestions with it. The idea is to validate its output against the Semantic Scholar API, resolving a set of relevant open access papers for a torrent group title or an array of keywords.
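To make the validation step concrete, here's a minimal sketch, assuming Galactica's output has already been parsed into a list of candidate paper titles. The function names, the fuzzy-match threshold, and the choice of the Semantic Scholar Graph API paper search endpoint with the `title`, `isOpenAccess`, and `openAccessPdf` fields are my assumptions for illustration, not the actual implementation.

```python
import json
import urllib.parse
import urllib.request
from difflib import SequenceMatcher

# Semantic Scholar Graph API paper search endpoint (assumed to be the one used).
S2_SEARCH_URL = "https://api.semanticscholar.org/graph/v1/paper/search"


def search_semantic_scholar(title, limit=3):
    """Return up to `limit` search hits for a title as a list of dicts."""
    params = urllib.parse.urlencode({
        "query": title,
        "limit": limit,
        "fields": "title,isOpenAccess,openAccessPdf",
    })
    with urllib.request.urlopen(f"{S2_SEARCH_URL}?{params}", timeout=10) as resp:
        return json.load(resp).get("data", [])


def title_similarity(a, b):
    """Case-insensitive similarity ratio between two titles, from 0.0 to 1.0."""
    return SequenceMatcher(None, a.lower().strip(), b.lower().strip()).ratio()


def validate_suggestions(candidates, search=search_semantic_scholar, threshold=0.9):
    """Keep only candidates that resolve to a real, open access paper.

    `candidates` are titles suggested by the model; anything that has no
    close title match among the search hits is treated as hallucinated.
    """
    validated = []
    for candidate in candidates:
        for paper in search(candidate):
            if not paper.get("isOpenAccess"):
                continue
            if title_similarity(candidate, paper.get("title") or "") >= threshold:
                validated.append(paper)
                break  # one confirmed match per candidate is enough
    return validated
```

The `search` parameter is injectable so the filter can be exercised offline with a stub, and so rate limiting or an API key can be added later without touching the matching logic.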

OpenAI will likely remain useful for the foreseeable future: the text-davinci-003 model excels at producing fluent summaries of arbitrary length (our implementation specifies 100 words), while Galactica's TLDR token only seems to produce a single sentence, comparable to a Semantic Scholar entry.

The planned implementation is to expose the standard 6.7b model through a simple Flask API that can enqueue and retrieve jobs. Because running the model requires a significant hardware investment, access may be proxied through the Gazelle API and offered to any donor.
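A rough sketch of what the enqueue/retrieve API could look like, assuming an in-memory queue with a single worker thread (so one GPU job runs at a time). The `JobStore` class, the `/jobs` routes, and the `generate` callable standing in for Galactica inference are all hypothetical names. Flask is imported inside the app factory only so the queue logic can run without it installed, and the `app.post`/`app.get` shortcuts assume Flask 2.0 or newer.

```python
import queue
import threading
import uuid


class JobStore:
    """In-memory job queue with one worker thread; state is lost on restart."""

    def __init__(self, generate):
        self._generate = generate  # callable: prompt -> text (model inference stand-in)
        self._jobs = {}
        self._lock = threading.Lock()
        self._queue = queue.Queue()
        threading.Thread(target=self._work, daemon=True).start()

    def enqueue(self, prompt):
        """Register a job and return its id immediately."""
        job_id = uuid.uuid4().hex
        with self._lock:
            self._jobs[job_id] = {"status": "queued", "result": None}
        self._queue.put((job_id, prompt))
        return job_id

    def get(self, job_id):
        """Return a copy of the job record, or None if the id is unknown."""
        with self._lock:
            job = self._jobs.get(job_id)
            return dict(job) if job else None

    def drain(self):
        """Block until every queued job has been processed."""
        self._queue.join()

    def _work(self):
        while True:
            job_id, prompt = self._queue.get()
            try:
                update = {"status": "done", "result": self._generate(prompt)}
            except Exception as exc:  # record the failure instead of killing the worker
                update = {"status": "failed", "result": str(exc)}
            with self._lock:
                self._jobs[job_id] = update
            self._queue.task_done()


def create_app(store):
    """Build the Flask app around a JobStore (hypothetical routes)."""
    from flask import Flask, jsonify, request

    app = Flask(__name__)

    @app.post("/jobs")
    def submit():
        payload = request.get_json(silent=True) or {}
        prompt = payload.get("prompt")
        if not prompt:
            return jsonify(error="missing 'prompt'"), 400
        return jsonify(id=store.enqueue(prompt)), 202

    @app.get("/jobs/<job_id>")
    def fetch(job_id):
        job = store.get(job_id)
        if job is None:
            return jsonify(error="unknown job"), 404
        return jsonify(job)

    return app
```

A client would POST a prompt to `/jobs`, get back an id with a 202, and poll `/jobs/<id>` until the status is `done` or `failed`. The Gazelle API proxy could sit in front of these two routes and handle the donor check.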

I don't actually own a GPU, but I think that dual RTX A4000s, a single RTX 4000 SFF Ada Generation, or a single GeForce RTX 3090 Turbo would suffice.
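As a back-of-the-envelope check on those options, here's the weights-only memory estimate, assuming the 6.7b model is loaded in half precision (2 bytes per parameter). The VRAM figures are the published specs for each card (16 GB per RTX A4000, 20 GB for the RTX 4000 SFF Ada Generation, 24 GB for the RTX 3090). Activations and framework overhead are not counted, so real headroom will be smaller.

```python
def weights_gib(n_params, bytes_per_param=2):
    """Memory needed for model weights alone, in GiB (fp16 by default)."""
    return n_params * bytes_per_param / 2**30


# Total VRAM per option in GiB; the dual-card setup would need the model
# split across both cards to use the combined figure.
CARDS_GIB = {
    "2x RTX A4000": 2 * 16,
    "RTX 4000 SFF Ada Generation": 20,
    "GeForce RTX 3090 Turbo": 24,
}


def headroom_gib(n_params=6.7e9, bytes_per_param=2):
    """VRAM left over after loading the weights, per hardware option."""
    need = weights_gib(n_params, bytes_per_param)
    return {name: vram - need for name, vram in CARDS_GIB.items()}


if __name__ == "__main__":
    print(f"weights: {weights_gib(6.7e9):.1f} GiB")
    for name, spare in headroom_gib().items():
        print(f"{name}: {spare:.1f} GiB spare")
```

Under those assumptions the weights come to roughly 12.5 GiB, so they'd fit on any of the three options, though a single 16 GB A4000 on its own would be tight.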