I have some questions - GitHub Issues

Hi! I think you can now send me messages on Twitter (https://twitter.com/CStanKonrad); however, I may be unavailable during the weekend.

I have recently checked, and the code below works with the following versions:

accelerate==0.23.0
transformers==4.33.2
torch==2.0.1+cu117
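
Before running the main snippet, you can confirm that your environment matches these pins. The sketch below uses only the standard library (`importlib.metadata`). The helper name `find_mismatches` and the `TESTED_PINS` dict are hypothetical illustrations. Other versions may work too; this only reports where you differ from the combination reported here.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions reported in this thread as a working combination.
TESTED_PINS = {
    "accelerate": "0.23.0",
    "transformers": "4.33.2",
    "torch": "2.0.1+cu117",
}


def find_mismatches(pins):
    """Return {package: installed_version_or_None} for every pin that differs."""
    mismatches = {}
    for name, wanted in pins.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            installed = None  # the package is not installed at all
        if installed != wanted:
            mismatches[name] = installed
    return mismatches


if __name__ == "__main__":
    # Print one line per package whose installed version differs from the pin.
    for name, installed in find_mismatches(TESTED_PINS).items():
        print(f"{name}: tested {TESTED_PINS[name]}, found {installed}")
```

An empty result means every pinned package is installed at exactly the tested version.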

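```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

# Load the instruction-tuned LongLLaMA-Code checkpoint; trust_remote_code=True
# lets transformers use the custom LongLLaMA modeling code from the Hub.
tokenizer = AutoTokenizer.from_pretrained("syzymon/long_llama_code_7b_instruct")
model = AutoModelForCausalLM.from_pretrained("syzymon/long_llama_code_7b_instruct", torch_dtype=torch.bfloat16, trust_remote_code=True)

# Move the model to the GPU.
device = torch.device("cuda")
model.to(device)

prompt = "My name is Julien and I like to"
input_ids = tokenizer(prompt, return_tensors="pt").input_ids
# Sample up to 256 new tokens (plain sampling, no beam search).
generation_output = model.generate(
    input_ids=input_ids.to(device),
    max_new_tokens=256,
    num_beams=1,
    do_sample=True,
    temperature=1.0,
)
# Decode the whole sequence (prompt + continuation, special tokens included).
print(tokenizer.decode(generation_output[0]))
```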

Result (sampled with do_sample=True, so your output will differ):

My name is Julien and I like to play with computers and learn new technologies. 😊 I am interested in programming and have been coding for a few years now. I am currently studying Data Science and Machine Learning at EPFL University in Switzerland. I am also a member of the STEM club where we organize workshops and activities related to STEM education and outreach.

As part of my studies, I have worked on various projects in the field of Data Science and Machine Learning. One of my projects is a recommender system that uses machine learning algorithms to predict whether an unknown user will like a certain movie or not. Another project is a language translator that uses Natural Language Processing techniques to translate text from one language to another.

In my free time, I like to explore new technologies and create personal projects. Some of my latest projects include a web interface for viewing 3D terrain data and an iOS app designed for photo tagging and organizing. I also enjoy taking part in online events and competitions where I can showcase my skills and knowledge and interact with others who share my interests.</s>

Note that the quantized version of the model may require special handling (see the Colab).

The non-instruction-tuned version of the model (LongLLaMA-Code 7B) may have trouble answering questions (for example, it may respond with a bunch of newlines).
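
If you are comparing the two checkpoints, a small post-processing step can flag the newline-only failure mode automatically. This is a minimal sketch, assuming you work on the string returned by `tokenizer.decode(generation_output[0])`, which echoes the prompt and may contain the `<s>`/`</s>` markers. The helpers `extract_answer` and `looks_blank` are hypothetical, not part of long_llama or transformers. Alternatively, `tokenizer.decode(..., skip_special_tokens=True)` drops the special tokens for you.

```python
BOS, EOS = "<s>", "</s>"  # LLaMA-style special-token strings, if present


def extract_answer(decoded: str, prompt: str) -> str:
    """Strip an optional leading BOS, the echoed prompt, and a trailing EOS."""
    text = decoded
    if text.startswith(BOS):
        text = text[len(BOS):].lstrip()
    if text.startswith(prompt):
        text = text[len(prompt):]
    if text.endswith(EOS):
        text = text[: -len(EOS)]
    return text


def looks_blank(answer: str) -> bool:
    """True if the answer is empty or consists only of whitespace/newlines."""
    return answer.strip() == ""
```

For example, `looks_blank(extract_answer(decoded, prompt))` returning `True` suggests the base model answered with nothing but newlines.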

If you are loading the model as a plain LLaMA model, make sure to use an up-to-date version of the transformers library, as older versions ignore the rope_theta parameter.
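
A quick guard against an outdated transformers install is to compare its version with a floor before loading. This sketch assumes 4.33.2 (the version tested above) as a conservative minimum. The first release that honours rope_theta may be older, so adjust `MIN_TRANSFORMERS` if you know better. The helpers are hypothetical. For full PEP 440 handling (pre-releases, epochs), `packaging.version.Version` is more robust than this hand-rolled parser.

```python
import re
from importlib.metadata import PackageNotFoundError, version

# Assumption: the tested version, used as a conservative floor.
MIN_TRANSFORMERS = "4.33.2"


def release_tuple(v: str) -> tuple:
    """Turn '4.33.2' or '2.0.1+cu117' into (4, 33, 2) / (2, 0, 1).

    Local suffixes (+cu117) and pre-release tags (rc1, dev0) are ignored.
    """
    public = v.split("+", 1)[0]
    parts = []
    for piece in public.split("."):
        m = re.match(r"\d+", piece)
        if m is None:
            break
        parts.append(int(m.group()))
        if m.end() != len(piece):  # e.g. '0rc1': keep the number, stop here
            break
    return tuple(parts)


def at_least(installed: str, minimum: str) -> bool:
    """Compare release numbers, padding with zeros so '4.33' == '4.33.0'."""
    a, b = release_tuple(installed), release_tuple(minimum)
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


def transformers_is_recent_enough() -> bool:
    try:
        return at_least(version("transformers"), MIN_TRANSFORMERS)
    except PackageNotFoundError:
        return False  # transformers is not installed
```

Calling `transformers_is_recent_enough()` before `from_pretrained` lets you fail early instead of silently running with the wrong rope_theta.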

CStanKonrad/long_llama, issue #16