QiaolingChen00 opened this issue 1 year ago
- [x] Training finished (chen)
- [x] Dataset preparation (yang, wang, gan)
- [x] Background, business objective, context (shen)
- [x] Slides template provided (gan)
1. Generate your own API key at https://platform.openai.com/account/api-keys and copy it.
2. Open a terminal and run:

```shell
pip install llama-index
pip install langchain
git clone https://github.com/irina1nik/context_data.git
```

3. Create a new Python file named `test.py` and paste in the following code:
```python
from llama_index import SimpleDirectoryReader, GPTListIndex, readers, GPTSimpleVectorIndex, LLMPredictor, PromptHelper, ServiceContext
from langchain import OpenAI
import sys
import os
from IPython.display import Markdown, display

def construct_index(directory_path):
    # set maximum input size
    max_input_size = 4096
    # set number of output tokens
    num_outputs = 2000
    # set maximum chunk overlap
    max_chunk_overlap = 20
    # set chunk size limit
    chunk_size_limit = 600
    # define prompt helper
    prompt_helper = PromptHelper(max_input_size, num_outputs, max_chunk_overlap, chunk_size_limit=chunk_size_limit)
    # define LLM
    llm_predictor = LLMPredictor(llm=OpenAI(temperature=0.5, model_name="text-davinci-003", max_tokens=num_outputs))

    documents = SimpleDirectoryReader(directory_path).load_data()
    service_context = ServiceContext.from_defaults(llm_predictor=llm_predictor, prompt_helper=prompt_helper)
    index = GPTSimpleVectorIndex.from_documents(documents, service_context=service_context)
    index.save_to_disk('index.json')
    return index

def ask_ai():
    index = GPTSimpleVectorIndex.load_from_disk('index.json')
    while True:
        query = input("What do you want to ask? ")
        response = index.query(query)
        display(Markdown(f"Response: {response.response}"))

os.environ["OPENAI_API_KEY"] = input("Paste your OpenAI key here and hit enter:")
construct_index("context_data/data")
ask_ai()
```
4. Run `python test.py` from the command line.
5. When prompted, paste the API key you copied in step 1.
6. Then ask any question and you will get a response.
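As a sanity check on the numbers used in `construct_index`: the `num_outputs` completion budget is reserved out of the model's context window, and whatever remains is what the retrieved chunks plus the query must fit into. A back-of-the-envelope sketch (ignoring prompt boilerplate, which also consumes tokens):

```python
# Token budget implied by the PromptHelper settings in test.py
max_input_size = 4096    # context window of text-davinci-003
num_outputs = 2000       # tokens reserved for the completion
chunk_size_limit = 600   # maximum size of each retrieved chunk

# Tokens left over for retrieved context after reserving the completion
context_budget = max_input_size - num_outputs

# How many full-size chunks fit in that leftover budget
chunks_that_fit = context_budget // chunk_size_limit
```

With these settings, only about three full-size chunks fit per query, so raising `num_outputs` further would squeeze out retrieved context.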
The code above cannot generate completions in batches; use the code below instead.

```shell
export OPENAI_API_KEY=sk-...
python test.py
```

Note: the key pasted in the original post (redacted here) is GanYuqing's; remember to replace it with your own, otherwise she won't be able to use hers.
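To avoid pasting raw keys into scripts or shell history at all, here is a small stdlib-only guard (the helper name is illustrative, not from the original) that `test.py` could use to read the key from the environment first and only prompt when it is missing:

```python
import os

def get_openai_key():
    """Prefer the OPENAI_API_KEY environment variable; prompt as a fallback."""
    key = os.environ.get("OPENAI_API_KEY")
    if not key:
        key = input("Paste your OpenAI key here and hit enter: ")
        os.environ["OPENAI_API_KEY"] = key
    return key
```

This keeps a shared script free of hard-coded secrets while still working interactively.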
```python
import openai
import os
import json

openai.api_key = os.environ["OPENAI_API_KEY"]

prompts = [
    "请你为我生成一段描述秋天的诗句:",
    "给我一个关于人生哲理的句子:",
    "写一句简短的情话:",
]

results = []
for prompt in prompts:
    response = openai.Completion.create(
        engine="davinci",
        prompt=prompt,
        max_tokens=256,
        n=1,
        stop=None,
        temperature=0.7,
    )
    # Extract the generated text from the response
    generated_text = response.choices[0].text.strip()
    # Collect the prompt and the generated text as one record
    result = {
        "instruction": prompt,
        "input": "",
        "output": generated_text,
    }
    results.append(result)

# Convert results to JSON and print
json_data = json.dumps(results, ensure_ascii=False)
print(json_data.encode('utf-8').decode('utf-8'))

# Save results to results.json
with open("results.json", "w", encoding="utf-8") as f:
    json.dump(results, f, ensure_ascii=False)
```
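After the script finishes, `results.json` holds a list of records in the Alpaca-style {instruction, input, output} shape. A quick stdlib-only check (the helper name is made up for illustration) that the file parses and each record carries the expected keys:

```python
import json

def load_results(path="results.json"):
    """Load the generated records and sanity-check their shape."""
    with open(path, encoding="utf-8") as f:
        records = json.load(f)
    for record in records:
        # Each record should carry exactly the three Alpaca-style fields
        assert set(record) == {"instruction", "input", "output"}
    return records
```

Running this right after generation catches truncated or malformed output before the file is used for training.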
3. You can replace the questions in the snippet below with your own:

```python
prompts = [
    "请你为我生成一段描述秋天的诗句:",
    "给我一个关于人生哲理的句子:",
    "写一句简短的情话:",
]
```
4. Here are some example prompts; you can try generating data this way:
```python
messages=[
    {"role": "system", "content": "You are a professor in the field of ["+self.key_word+"] who is good at questions asking and answering, also good at summarizing papers using concise statements"},
    {"role": "assistant", "content": "This is the title, author, link, abstract and introduction of an English document. I need your help to read and answer the following questions: "+clip_text},
    {"role": "user", "content": """
1. Mark the title of the paper (with Chinese translation)
2. List all the authors' names (use English)
3. Mark the first author's affiliation (output {} translation only)
4. Mark the keywords of this article (use English)
5. 用一个词描述主题是什么?并解释这个词的基本概念。(with Chinese translation)
6. 你是一个老师,并且很擅长 presentation,请你根据 5 提出的主题,生成一个 outline (with Chinese translation)
7. 你是一个助教,你需要给老师在 6 给出的 outline 讲解这篇文章。(with Chinese translation)
Follow the format of the output that follows:
1. Title: xxx\n\n
2. Authors: xxx\n\n
3. Affiliation: xxx\n\n
4. Keywords: xxx\n\n
5. xxx \n\n
6. 老师: \n\n
    - (1)xxx;\n
        - detail: xxx;\n
    - (2)xxx;\n
        - detail: xxx;\n
    - (3)xxx;\n
        - detail: xxx;\n
xxx \n\n
7. 助教: \n\n
    - (1)xxx:
        - xxx;\n
    - (2)xxx:
        - xxx;\n
    - (3)xxx;\n
        - xxx \n\n
"""},
]
```
You are a potential book purchaser who is browsing a bookstore website, and you want to ask ChatGPT some questions (give GPT certain instructions) to learn more about a book. The questions/instructions should meet the following requirements:
1. Written in English in 1 or 2 sentences; imperatives or questions are both allowed.
2. Try not to repeat the verbs across instructions, and maximize the diversity of the instructions.
3. There should also be variety in the tone of the instructions.
4. The types of instructions should be diverse, such as: brainstorming, open QA, closed QA, rewriting, extraction, generation, classification, chat, and summarization.
5. The GPT language model should be able to complete these instructions. For example, the instructions should not involve audio, video, pictures, or links, because the GPT model cannot perform those operations.
6. The questions should have good depth and relate to the book's concrete contents. You could refer to specific books such as Plato's The Republic.

Here are some good examples:
1. Describe the main ideas explored in the book.
2. Analyze the main characters in the book and the roles they play.
3. What is the overall structure of the book?
4. Has the book had an influence on later philosophical or literary works?
5. Are there any specific passages or dialogues within the book that are particularly notable?
6. Are there any related works by the author, or other similar authors, that you might suggest to someone who enjoyed this book?
7. What historical or philosophical context does the book fit into?

Please directly list 50 questions/instructions without any other explanations.
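The model's reply to the prompt above arrives as one block of numbered text, which needs to be split into one instruction per entry before feeding it to the answering script below. A minimal stdlib parser (the `1.`/`2.` numbering format is an assumption about the model's output, and the helper name is made up):

```python
import re

def split_numbered_list(text):
    """Split '1. ... 2. ...' style output into a list of instructions."""
    items = re.split(r"\n?\s*\d+\.\s*", text)
    # Drop empty fragments (e.g. before the leading '1.')
    return [item.strip() for item in items if item.strip()]
```

Each resulting string can then be written one-per-line to a file such as `Final_Instructions.txt`.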
```python
import openai
import os
import json
import pandas as pd

# Create an API key on the OpenAI website and export it as OPENAI_API_KEY
openai.api_key = os.environ["OPENAI_API_KEY"]

# Prepare the prompts for the ChatGPT model (each instruction gets the same intro)
instruct = pd.read_csv("/Users/wq/Desktop/plp_Dataset/Final_Instructions.txt", sep="?", header=None)
instruct = list(instruct[0])

intro = "You are the Chatbot of a book seller, a potential book purchaser want to ask you some questions to know better about the book: Republic written by Plato. Please answer these questions with good logic and languages(within 150 words), just like you are chatting with the purchaser:"

prompts = []
for i in instruct:
    prompts.append(intro + " " + i)

# Iterate over the prompts, sending each request to the OpenAI API and collecting the response
results = []
for prompt in prompts:
    response = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": prompt}],
        max_tokens=256,
        n=1,
        stop=None,
        temperature=0.7,
    )
    # Extract the generated text from the response
    generated_text = response["choices"][0]["message"]["content"]
    # Collect the prompt and the generated text as one record
    result = {
        "instruction": prompt,
        "input": "",
        "output": generated_text,
    }
    results.append(result)

# Convert results to JSON and print
json_data = json.dumps(results, ensure_ascii=False)
print(json_data.encode('utf-8').decode('utf-8'))

# Save results to results.json
with open("results.json", "w", encoding="utf-8") as f:
    json.dump(results, f, ensure_ascii=False)
```
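Batch loops like the one above can hit OpenAI rate limits partway through. A library-agnostic retry helper with exponential backoff (the function name and backoff parameters are illustrative, not from the source) that any of the request calls could be wrapped in:

```python
import time

def with_retries(call, max_attempts=3, base_delay=1.0):
    """Invoke call() and retry with exponential backoff on failure."""
    for attempt in range(max_attempts):
        try:
            return call()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # out of attempts; surface the error
            # Sleep base_delay, 2*base_delay, 4*base_delay, ... between attempts
            time.sleep(base_delay * (2 ** attempt))
```

For example, the request in the loop could become `response = with_retries(lambda: openai.ChatCompletion.create(...))`, so a transient failure does not lose the whole batch.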
Adapted from https://github.com/tatsu-lab/stanford_alpaca