cocktailpeanut / dalai

The simplest way to run LLaMA on your local machine
https://cocktailpeanut.github.io/dalai

what are default options so that alpaca is comparable to davinci-003? #104

Open alperyilmaz opened 1 year ago

alperyilmaz commented 1 year ago

Hi, huge thanks for bringing this repo to us and making it much easier to try large language models. I ran the commands and everything worked great. However, I was disappointed with the results from Alpaca, since the Stanford group indicated that student authors tested Alpaca against davinci-003 and the win ratio was nearly 50%.

The results I am getting are terrible, so I was wondering what the ideal parameter values are. I have only played with temperature.

The Alpaca demo site is down at the moment, so I used the example that was showcased in the blog post (shown below). This example is not the only case; I tried other prompts and almost all of them generated bad results.

original result

Below are the commands I have used and Alpaca results.

dalai/alpaca/main --seed -1 --threads 4 --n_predict 200 --model models/7B/ggml-model-q4_0.bin --top_k 40 --top_p 0.9 --temp 0.5 --repeat_last_n 64 --repeat_penalty 1.3 -p "write an e-mail to congratulate new Stanford admits and mention that you are excited about meeting all of them in person."

Output: Describe your favorite classroom activity or experience from school, college/university level (highschool is okay too).

--temp 0.8   # (showing only the updated parameter value)

Output: This is a great opportunity to share your thoughts with the admissions committee, but don't be too lengthy or elaborate on this topic; keep it simple yet concise (150-250 words).

--temp 0.1  # (showing only the updated parameter value)

Output: Write a letter to your friend who is going through hard times, expressing sympathy for their situation but also encouragement to keep moving forward with life.

For comparison, below is the output from davinci-003 (the davinci-003 API itself, not ChatGPT):

Subject: Congratulations to the new Stanford Admits!

Dear Stanford Admits,

Congratulations on your admission to Stanford! I'm sure you must be overjoyed and relieved at the same time.

It's an amazing feeling when all your hard work pays off, isn't it? I'm sure you'll make the most of this incredible opportunity and shine in the years to come.

I'm also excited to meet all of you in person. I'm sure that you will bring a lot of life and energy to the campus and I can't wait to be part of it.

Wishing you all the best for your future.

Sincerely,
[Your Name]

another prompt

Let me provide one more example. I saw this example in a video, and it gave a good result on the Alpaca demo site in the video. The prompt is "are all panthers black?", temp=0.1, output:

Asked by katiekat123 at 5:48 PM on Aug. 7, 2010 in Just for Fun

Same prompt, temp=0.8, output:

Asked by Dustin B #542086 on 10/31/2017 9:53 AM

keldenl commented 1 year ago

Dude I was literally scratching my head too about this, but I found a workaround for now: https://github.com/cocktailpeanut/dalai/issues/126

The issue is that dalai doesn't support "interactive" mode in the sense of giving an instruction-style response. See my workaround and suggestion ^^. I FINALLY got it working somewhat decently. Please lmk if you find even better improvements.
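The rough idea, if you want to try it before a proper fix lands: wrap your request in the instruction template that Stanford Alpaca was fine-tuned on, so the model answers the instruction instead of autocompleting it. A minimal sketch (flags copied from the commands earlier in this thread; the template text is the one from the stanford_alpaca repo):

dalai/alpaca/main --seed -1 --threads 4 --n_predict 200 --model models/7B/ggml-model-q4_0.bin --top_k 40 --top_p 0.9 --temp 0.5 --repeat_last_n 64 --repeat_penalty 1.3 -p "Below is an instruction that describes a task. Write a response that appropriately completes the request.

### Instruction:
Are all panthers black?

### Response:
"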

keldenl commented 1 year ago

I used your last prompt as an example:

Prompt: are all panthers black?
Response: No, not all Panther species are exclusively Black in coloration; some have white or brown coats as well

I believe it replies "Asked by Dustin B #542086 on 10/31/2017 9:53 AM" because it's trying to "autocomplete" your request.

ShibeTemple commented 1 year ago

Interested in this as well; I haven't been able to reproduce the example outputs shown in the demos.

lisa-ee commented 1 year ago

Interesting - so they're a Stanford alum AND a current high school student - I mean, it's impressive!

[screenshot]

bachi76 commented 1 year ago

You could switch the prompt to

"The following is an e-mail to congratulate ..."

This already produces a nice output.
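Concretely, something like this (same flags as the first command in this thread, only the prompt reworded; a sketch, not carefully tuned):

dalai/alpaca/main --seed -1 --threads 4 --n_predict 200 --model models/7B/ggml-model-q4_0.bin --top_k 40 --top_p 0.9 --temp 0.5 --repeat_last_n 64 --repeat_penalty 1.3 -p "The following is an e-mail to congratulate new Stanford admits, mentioning that the sender is excited about meeting all of them in person:"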

HoneyCodeBadger commented 1 year ago

@alperyilmaz if I check your comment, I have the feeling you haven't really understood the meaning of the parameters you can play with. Especially with quite a low temp I would expect such results, as that variable gives the model the ability to be more creative or more strict.

Running in C:\Users\User\dalai\alpaca:

main --seed 65006 --threads 12 --n_predict 250 --model models/30B/ggml-model-q4_0.bin --top_k 120 --top_p 0.9 --temp 0.8 --repeat_last_n 64 --repeat_penalty 1.3 -p ">Write two tweets about Olaf Scholz has the plan rebuilding the wall \nin Germany again in German language. One with a positive attitude and \none with a negative attitude towards it. The one with a positive \nattitude should be written by someone who is open to conspiracy \ntheories. The one with a negative attitude should be written by someone \nwho is not open to conspiracy theories.\n"

results in:

German language:

Tweet 1 (positive): Ihr solltet Olaf Scholz für seine Idee die Mauer wieder aufbauen im deutschen Land belobigen! Es ist wichtig, dass unsere Kultur und Identität geschützt werden. 

Tweet 2 (negative) : Ich kann mich nicht mit der Ansicht von Olf Scholz über das Wiederaufbauen des Mauers in Deutschland einverstanden erklären. Wir dürfen keine Trennung zwischen den Menschen aufrechterhalten!

Translated:

Tweet 1 (positive): You should commend Olaf Scholz for his idea to rebuild the wall in the German country! It is important to protect our culture and identity.

Tweet 2 (negative): I can't agree with Olf Scholz's view on rebuilding the wall in Germany. We must not maintain a separation between people!

bachi76 commented 1 year ago

..have the feeling you haven't really understood the meaning of the parameters you can play with.

@HoneyCodeBadger please forgive my even more ignorant question here - is there good documentation of those parameters somewhere? I couldn't find any easily (not being into the actual research).

PS: Your example above is scary! :-)

HoneyCodeBadger commented 1 year ago

@bachi76 I don't have docs to share yet, but there should be something on the web. Let me try to help:

These variables are often used in Natural Language Processing (NLP) to control various parameters of text generation. Here's a brief explanation of each variable:

n_predict: The number of tokens (roughly, words) the model should generate.

repeat_last_n: How many of the most recently generated tokens are checked for repetition.

repeat_penalty: A factor controlling how strongly the model is penalized for repeating one of those recent tokens.

top_k: The number of highest-probability candidate tokens the model is allowed to sample from.

top_p: A cumulative probability threshold (nucleus sampling): the model samples only from the smallest set of top tokens whose combined probability reaches top_p.

temp: A factor that controls the "creativity" of the model. Higher temperatures result in more random and surprising output; lower temperatures make it more deterministic.

seed: The seed for the random number generator used in sampling; the same seed with the same settings reproduces a run (-1 picks a random seed).

threads: The number of CPU threads the model should use for generation.

model: The path to the model weights file to use for generation, e.g. models/7B/ggml-model-q4_0.bin.
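Putting it together, here are two contrasting runs (a sketch; the values are only illustrative starting points, not tuned):

# Focused and reproducible: low temp, small top_k, fixed seed so the run can be repeated
dalai/alpaca/main --seed 42 --threads 4 --n_predict 200 --model models/7B/ggml-model-q4_0.bin --top_k 40 --top_p 0.9 --temp 0.2 --repeat_last_n 64 --repeat_penalty 1.3 -p "are all panthers black?"

# Varied and creative: higher temp and top_k, random seed, so each run differs
dalai/alpaca/main --seed -1 --threads 4 --n_predict 200 --model models/7B/ggml-model-q4_0.bin --top_k 120 --top_p 0.9 --temp 0.8 --repeat_last_n 64 --repeat_penalty 1.3 -p "are all panthers black?"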

alperyilmaz commented 1 year ago

@HoneyCodeBadger I was trying to show that no matter what I change the temp value to, the results do not match, since the Stanford demo didn't specify a temp value.

bachi76 commented 1 year ago

Thanks a lot for the quick help @HoneyCodeBadger !