SaturnCassini / gpt4all_generative_agents

Generative Agents: Interactive Simulacra of Human Behavior using GPT4All free model which runs on CPU
https://saturnseries.com
Apache License 2.0
110 stars 9 forks

Suggest changing the default model to one with a larger context (such as llama 2) #2

Open c4fun opened 1 year ago

c4fun commented 1 year ago

Bug behavior

Following the process described in README.md, I ran into this error:

[(ID:7yTvdp) Monday February 13 -- 09:00 PM] Activity: Isabella is back at Hobbs Cafe working on
[(ID:WL1ZqV) Monday February 13 -- 09:00 PM] Activity: Isabella is ERROR: The prompt size exceeds the context window size and cannot be processed
[(ID:Jv0qLd) Monday February 13 -- 10:00 PM] Activity: Isabella is ERROR: The prompt size exceeds the context window size and cannot be processed

Here the originally intended hourly breakdown of Isabella's schedule today: 1) wake up and complete the morning routine at 6:00 am, 2) go for a walk by the river
[(ID:D0IHgT) Monday February 13 -- 11:00 PM] Activity: Isabella is

~~~ output    ----------------------------------------------------
ERROR: The prompt size exceeds the context window size and cannot be processed

=== END ==========================================================

Analysis

The context window of all the llama models is 2048, as shown by the `n_ctx` parameter at the beginning of the `python reverie.py` log:

llama.cpp: using Metal
llama.cpp: loading model from /Users/mine/.cache/gpt4all/orca-mini-3b.ggmlv3.q4_0.bin
llama_model_load_internal: format     = ggjt v3 (latest)
llama_model_load_internal: n_vocab    = 32000
llama_model_load_internal: n_ctx      = 2048
llama_model_load_internal: n_embd     = 3200
llama_model_load_internal: n_mult     = 240
llama_model_load_internal: n_head     = 32
llama_model_load_internal: n_layer    = 26
llama_model_load_internal: n_rot      = 100
llama_model_load_internal: ftype      = 2 (mostly Q4_0)
llama_model_load_internal: n_ff       = 8640
llama_model_load_internal: n_parts    = 1
llama_model_load_internal: model size = 3B
llama_model_load_internal: ggml ctx size =    0.06 MB
llama_model_load_internal: mem required  = 2862.72 MB (+  682.00 MB per state)
llama_new_context_with_model: kv self size  =  650.00 MB

Of course, that is because LLaMA 1 was trained with a 2k context length.

Suggestion

Change the default model in reverie/backend_server/utils.py from orca-mini to llama-2, since llama-2 has a 4k context window.

# Select the GPT4All Model you'll use for the simulation. See: https://observablehq.com/@simonw/gpt4all-models
# gpt4all_model="orca-mini-3b.ggmlv3.q4_0.bin"
gpt4all_model="llama-2-7b-chat.ggmlv3.q4_0.bin"  # use llama-2 instead
max_tokens = 30
temperature = 0.5
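As a rough pre-flight check (not part of the repo; the helper name and the ~4-characters-per-token heuristic for LLaMA-style tokenizers are assumptions), one could estimate whether a prompt will fit before sending it to the model:

```python
def fits_context(prompt: str, n_ctx: int = 2048, max_tokens: int = 30,
                 chars_per_token: float = 4.0) -> bool:
    """Estimate prompt length in tokens from character count and check that
    the prompt plus the requested completion fits inside the context window.
    ~4 chars/token is only a heuristic; the real tokenizer may differ."""
    est_prompt_tokens = len(prompt) / chars_per_token
    return est_prompt_tokens + max_tokens <= n_ctx

print(fits_context("hello world"))   # a short prompt easily fits
print(fits_context("x" * 20000))     # far too long for a 2k window
```

This would let the simulation detect oversized prompts up front instead of discovering the failure in the model's error string.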
c4fun commented 1 year ago

Supplement: changing the model to llama-2 alone does not solve the problem

When I changed gpt4all_model to llama-2, the context problem persisted; the context window is still 2048:

llama_model_load_internal: format     = ggjt v3 (latest)
llama_model_load_internal: n_vocab    = 32000
llama_model_load_internal: n_ctx      = 2048
llama_model_load_internal: n_embd     = 4096
llama_model_load_internal: n_mult     = 256
llama_model_load_internal: n_head     = 32
llama_model_load_internal: n_layer    = 32
llama_model_load_internal: n_rot      = 128
llama_model_load_internal: ftype      = 2 (mostly Q4_0)
llama_model_load_internal: n_ff       = 11008
llama_model_load_internal: n_parts    = 1
llama_model_load_internal: model size = 7B
llama_model_load_internal: ggml ctx size =    0.07 MB
llama_model_load_internal: mem required  = 5407.71 MB (+ 1026.00 MB per state)
llama_new_context_with_model: kv self size  = 1024.00 MB

Update

This seems to be a problem in ggml, for which I have not yet found a solution. An issue in the gpt4all repo reflects the same unsolved status: https://github.com/nomic-ai/gpt4all/issues/664#issuecomment-1556233279
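Until the bindings expose a larger `n_ctx`, a caller-side workaround is to trim the prompt before each generation so it fits the 2048-token window. A minimal sketch (the helper name and the 4-chars-per-token estimate are assumptions, not project code):

```python
def trim_prompt(prompt: str, n_ctx: int = 2048, max_tokens: int = 30,
                chars_per_token: int = 4) -> str:
    """Keep only the tail of the prompt so that the estimated token count
    (prompt + requested completion) fits inside the model's context window.
    Trimming the head keeps the most recent context, which matters most for
    the agents' hourly-schedule prompts."""
    budget_tokens = n_ctx - max_tokens
    budget_chars = budget_tokens * chars_per_token
    if len(prompt) <= budget_chars:
        return prompt
    return prompt[-budget_chars:]
```

This loses the oldest part of a long prompt, so it is only a stopgap; summarizing older memories before prompting would preserve more information.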