augustwester / searchthearxiv

The code powering searchthearxiv.com, a simple semantic search engine for more than 300,000 ML papers on arXiv.
https://searchthearxiv.com
GNU General Public License v3.0
112 stars 10 forks

Option for SPECTER2 embeddings? #3

Open japhba opened 9 months ago

japhba commented 9 months ago

Hi,

I recently started using Semantic Scholar's SPECTER2 model to create visualisations of BibTeX files. The model is specialised for scientific documents, so it seems like a good option for the proximity search that searchthearxiv.com offers. While there are gaps in which papers have an associated embedding in their database, its scope extends beyond arXiv.
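To make the proximity-search idea concrete, here is a minimal sketch of ranking papers by cosine similarity between embedding vectors. It is not how searchthearxiv.com or Semantic Scholar is implemented: the function names (`cosine_similarity`, `nearest_papers`), the paper labels, and the three-dimensional toy vectors are all hypothetical stand-ins, and real SPECTER2 embeddings are much higher-dimensional.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def nearest_papers(query_vec, paper_vecs, k=3):
    """Return the k paper labels whose embeddings are most similar to query_vec."""
    scored = [
        (cosine_similarity(query_vec, vec), label)
        for label, vec in paper_vecs.items()
    ]
    # Highest similarity first.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [label for _, label in scored[:k]]


# Toy embeddings (assumption: stand-ins for precomputed paper embeddings).
toy_embeddings = {
    "paper_a": [0.9, 0.1, 0.0],
    "paper_b": [0.1, 0.9, 0.0],
    "paper_c": [0.7, 0.3, 0.1],
}

# Hypothetical query embedding, e.g. of a known paper or a search phrase.
query = [1.0, 0.0, 0.0]
top = nearest_papers(query, toy_embeddings, k=2)
print(top)
```

A brute-force scan like this is enough to show the idea; a collection of several hundred thousand papers would more plausibly rely on a vector index.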

I was thinking of creating a service, "What was that paper again?", that would

In summary, the advantages would be

It would be exciting to see this functionality integrated into SearchTheArXiv, and I would be very willing to build a prototype!

augustwester commented 8 months ago

Thanks for the suggestion! However, I think this is outside the scope of searchthearxiv.com, which is intentionally low-maintenance, ultra low-cost (i.e. basically free to run), and limited to ML papers. How would searchthearxiv.com differ from Semantic Scholar if we implemented what you describe?