assafelovic / gpt-researcher

LLM based autonomous agent that does online comprehensive research on any given topic
https://gptr.dev
Apache License 2.0
14.27k stars 1.86k forks

llama3 does not return pure json #521

Closed barsuna closed 3 months ago

barsuna commented 4 months ago

Testing gpt-researcher with llama3, I found that three times out of four, llama3 will respond with JSON plus some extra verbiage to the prompt in `generate_search_queries_prompt`.

Not sure it is worth changing the prompt for the sake of llama3 alone, but for documentation purposes, here is the updated prompt that seems to work every time.

before:

f'You must respond with a list of strings in the following format: ["query 1", "query 2", "query 3"].'

after:

f'Your response must include list of the query strings in json format and nothing else. For example: ["query 1", "query 2", "query 3"]'
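Even with the stricter prompt, a tolerant parsing fallback can help when the model still wraps the list in commentary. A minimal sketch of that idea (not gpt-researcher's actual code; the function name is mine):

```python
import json
import re

def extract_json_list(response: str) -> list:
    """Best-effort extraction of a JSON list from an LLM response
    that may surround the list with extra verbiage."""
    # Fast path: the whole response is already valid JSON.
    try:
        parsed = json.loads(response)
        if isinstance(parsed, list):
            return parsed
    except json.JSONDecodeError:
        pass
    # Fallback: grab the first [...] span and try to parse just that.
    match = re.search(r"\[.*?\]", response, re.DOTALL)
    if match:
        return json.loads(match.group(0))
    raise ValueError("no JSON list found in response")
```

This keeps the happy path (pure JSON) cheap and only falls back to a regex scan when the model has added surrounding text.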
Dilip-17 commented 4 months ago

Hey @barsuna. I was searching for how to use llama with gpt-researcher and stumbled upon this post. If possible, could you tell me how to get gpt-researcher to work with llama3?

barsuna commented 4 months ago

@Dilip-17 there was the same question on another issue; I added some pointers there:

https://github.com/assafelovic/gpt-researcher/issues/520

The challenge is mostly not how to run it, but having the GPU memory necessary to run llama3: even the borderline-usable (IMO; opinions are divided on this) 4-bit quantized 70B model takes about 43GB. I'd recommend Q6, which is close to 60GB.
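As a rough sanity check on those numbers, quantized model size scales with the effective bits per weight. A back-of-the-envelope sketch (the bits-per-weight figures below are my approximations for llama.cpp-style K-quants, which keep some tensors at higher precision, so they exceed the nominal 4 or 6):

```python
def quantized_size_gb(params_billion: float, effective_bits: float) -> float:
    """Approximate weight-file size in decimal GB:
    parameters * effective bits per weight / 8 bits per byte."""
    return params_billion * 1e9 * effective_bits / 8 / 1e9

print(round(quantized_size_gb(70, 4.85), 1))  # ~4-bit K-quant of 70B ≈ 42.4 GB
print(round(quantized_size_gb(70, 6.56), 1))  # ~6-bit K-quant of 70B ≈ 57.4 GB
```

Both estimates line up with the ~43GB and ~60GB figures above; actual VRAM use is higher once the KV cache and runtime buffers are added.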

assafelovic commented 4 months ago

Hey, working with different LLMs (other than the default OpenAI) requires extra manual tweaking. Would love to learn from your experience if you find ways to make the code more generic!

barsuna commented 4 months ago

To its credit, llama3 worked pretty much out of the box with gpt-researcher (the only tweak needed was the prompt change above). It also seems possible to stretch the context window to 16k without fine-tuning (though I've done very limited testing of that).
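For reference, if the model is being served with llama.cpp's OpenAI-compatible server (one common way to run llama3 locally; the model filename here is hypothetical), the context window can be set on the command line:

```shell
# Raise the context window to 16k tokens; KV-cache memory grows with it.
./llama-server -m llama3-70b-q4_k_m.gguf --ctx-size 16384 --port 8000
```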

So far, progress with llama3 has been difficult for things requiring function calling and in-prompt memory (i.e., autonomous agents); with single-prompt or one-by-one prompting agents, things seem better.

Of course, the main challenge remains the quality of the reports. I'm currently trying to compare llama3 vs. GPT-4; both seem somewhat challenged, and my belief is that the likely direction to solve this is to balance automation and augmentation: let the user do more if they wish.

I haven't measured the quality of embeddings and their impact on report quality much either.

assafelovic commented 3 months ago

Great, thank you for the feedback @barsuna! Closing for now, but feel free to open new threads if needed.