Working on a solution right now. It looks like, since arXiv is a publicly available API, there's a gentleman's agreement that everyone using it spaces out requests with a 3-second delay to "play nice".
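Something along these lines is what I have in mind, just a rough sketch (the helper name `polite_fetch` is a placeholder, not the actual provider code):

```python
import time
import urllib.request

ARXIV_DELAY_SECONDS = 3  # courtesy delay arXiv asks API users to respect

def polite_fetch(urls):
    """Fetch each URL in order, sleeping 3 seconds between consecutive requests."""
    results = []
    for i, url in enumerate(urls):
        if i > 0:
            time.sleep(ARXIV_DELAY_SECONDS)  # space out every request after the first
        with urllib.request.urlopen(url) as resp:
            results.append(resp.read())
    return results
```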
This might be a bit rudimentary, but is there some way to speed up LLM inference times? I'm not sure if it's the model I chose or the fact that I'm running locally, but the program is mighty slow. Any suggestions for running the program efficiently would help speed up the final tests on the new arXiv search provider I built.
Your only options, unfortunately, are to use cloud API credits or get a faster computer.
Check your CPU and GPU usage while running the program. Any local LLM that falls back to CPU (or runs exclusively on CPU) will be many times slower than you expect. If you have a decent GPU, try a smaller Ollama model like Llama3.2, or something even smaller.
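For example, a quick check could look something like this (just a sketch; assumes `ollama serve` is running, the `ollama` Python package is installed, and `llama3.2` has been pulled):

```python
import subprocess
import time

import ollama

# `ollama ps` lists loaded models and whether they are running on GPU or CPU.
print(subprocess.run(["ollama", "ps"], capture_output=True, text=True).stdout)

# Time a single generation with a smaller model to compare against your current one.
start = time.time()
response = ollama.generate(model="llama3.2", prompt="Say hello in one short sentence.")
print(response["response"])
print(f"Generation took {time.time() - start:.1f}s")
```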
I ran it on my other computer, but the content scraping for the relevant pages is not working on arXiv (it retrieves some random filler sentence).
Could you give an example? As of now, wouldn't it be scraping what shows up at the top of the page first? Like this?
Or would it be getting a sentence from in here:
Sorry if the print is small but it repeatedly scrapes:
```
================================================================================
Research Focus: Hardware Optimization Techniques
Source: http://arxiv.org/abs/2209.03807v2
Content:
arXivLabs is a framework that allows collaborators to develop and share new arXiv features directly on our website. Both individuals and organizations that work with arXivLabs have embraced and accepted our values of openness, community, excellence, and user data privacy. arXiv is committed to these values and only works with partners that adhere to them. Have an idea for a project that will add value for arXiv's community? Learn more about arXivLabs.
================================================================================
```
I think it might be some header/footer content and whatnot. I'm already able to get the summary section through their API, so that's not too much of an issue; the goal is side-stepping the relevant-pages section efficiently without downgrading the process. I have my other computer running my fixed version to check it out right now; it will probably take a couple of hours since I'm training a model on the side.
Will let y'all know the results of the test and see if I can farm some suggestions if it doesn't work as expected.
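For reference, here's a minimal sketch of pulling the abstract straight from the arXiv API rather than scraping the abs/ page (standard library only; `fetch_summary` is just an illustrative name, not the provider's code):

```python
import time
import urllib.request
import xml.etree.ElementTree as ET

ATOM = "{http://www.w3.org/2005/Atom}"  # namespace used by the arXiv Atom feed

def fetch_summary(arxiv_id: str) -> str:
    """Return the <summary> (abstract) for a single arXiv id via the export API."""
    url = f"http://export.arxiv.org/api/query?id_list={arxiv_id}"
    with urllib.request.urlopen(url) as resp:
        feed = ET.fromstring(resp.read())
    entry = feed.find(f"{ATOM}entry")
    return entry.find(f"{ATOM}summary").text.strip()

print(fetch_summary("2209.03807"))
time.sleep(3)  # keep the 3-second courtesy delay when looping over several ids
```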
We are partially in business.
```
Research Focus: Algorithm optimization for language models (LLMs)
Source: http://arxiv.org/abs/2407.14112v1
Content:
Large language models (LLMs) have recently demonstrated state-of-the-art
performance across various natural language processing (NLP) tasks, achieving
near-human levels in multiple language understanding challenges and aligning
closely with the core principles of semantic communication. Inspired by LLMs'
advancements in semantic processing, we propose an innovative LLM-enabled
semantic communication system framework, named LLM-SC, that applies LLMs
directly to the physical layer coding and decod
================================================================================
================================================================================
Research Focus: Algorithm optimization for language models (LLMs)
Source: http://arxiv.org/abs/2404.11531v1
Content:
Fusing knowledge from multiple Large Language Models (LLMs) can combine their
diverse strengths to achieve improved performance on a given task. However,
current fusion approaches either rely on learning-based fusers that do not
generalize to new LLMs, or do not take into account how well each LLM
understands the input. In this work, we study LLM fusion at test-time, which
enables leveraging knowledge from arbitrary user-specified LLMs during
inference. We introduce Pack of LLMs (PackLLM), an ef
================================================================================
================================================================================
Research Focus: Algorithm optimization for language models (LLMs)
Source: http://arxiv.org/abs/2410.18136v1
Content:
Designing functional transition metal complexes (TMCs) faces challenges due
to the vast search space of metals and ligands, requiring efficient
optimization strategies. Traditional genetic algorithms (GAs) are commonly
used, employing random mutations and crossovers driven by explicit mathematical
objectives to explore this space. Transferring knowledge between different GA
tasks, however, is difficult. We integrate large language models (LLMs) into
the evolutionary optimization framework (LLM-E
================================================================================
================================================================================
Research Focus: Algorithm optimization for language models (LLMs)
Source: http://arxiv.org/abs/2406.10675v1
Content:
Large Language Models (LLMs) have achieved significant progress across
various fields and have exhibited strong potential in evolutionary computation,
such as generating new solutions and automating algorithm design.
Surrogate-assisted selection is a core step in evolutionary algorithms to solve
expensive optimization problems by reducing the number of real evaluations.
Traditionally, this has relied on conventional machine learning methods,
leveraging historical evaluated evaluations to predict
```
My only concern now is whether I'm fetching relevant results, but that depends on what new search queries the LLM creates.
I'm at work, but I will create a PR when I can. (Also, have we considered incorporating async calls where possible?) I know we have frequent LLM calls and that's the whole point, so we can't change that, but maybe we can speed up inference latency with parallel async calls. The async generate methods are not too difficult to create. I would think the goal is to be able to run the program on any regular computer that can run `ollama serve`.
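Roughly what I have in mind, as a sketch rather than the repo's code (assumes the `ollama` Python package and a local `ollama serve` with `llama3.2` pulled):

```python
import asyncio

from ollama import AsyncClient

async def generate_all(prompts: list[str]) -> list[str]:
    """Fan out several generate calls concurrently instead of issuing them one by one."""
    client = AsyncClient()
    tasks = [client.generate(model="llama3.2", prompt=p) for p in prompts]
    responses = await asyncio.gather(*tasks)
    return [r["response"] for r in responses]

queries = [
    "Rewrite as an arXiv search query: hardware optimization techniques",
    "Rewrite as an arXiv search query: algorithm optimization for LLMs",
]
print(asyncio.run(generate_all(queries)))
```

One caveat: a single local Ollama instance may serialize concurrent requests depending on `OLLAMA_NUM_PARALLEL`, so the actual speedup depends on the setup; the async calls at least let generation overlap with the API fetches.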
Hey, if you want to incorporate this stuff, suggest an update to the code on the feature/multi-api-search branch and implement it! Use a pull request and I'll test it and merge it into that branch, should it seem viable!
This is a great idea