Closed opvexe closed 1 month ago
I also encountered the same problem. Did you solve it?
I encountered the same problem.
I also encountered the same problem. Does anyone know how to solve it?
Seems related to the news that Google's Bard is using ShareGPT's data for training:
https://www.theverge.com/2023/3/29/23662621/google-bard-chatgpt-sharegpt-training-denies
Though I'm not sure how true the news is, the current ShareGPT does not have a place to "share" unless you directly paste the link to other social media like Twitter.
Can you please say more about what you're trying to achieve here with this API? You're correct that this endpoint is disabled
I am a student and want to call the API to get some data for my research.
@domeccleston It's a new trend now. Download 2/3 synthetic dataset + train on Llama = DIY CHADGPT
Yes, this data can greatly help us ordinary people build our own ChatGPT.
@domeccleston Can you open it?
It's a new trend now. Download 2/3 synthetic dataset + train on Llama = DIY CHADGPT
Can you link me a guide that walks me through the exact steps to do this?
I need more context here. Please link me something or email domeccleston@gmail.com.
Can't promise anything, but if you help me understand why this data is valuable to you, I can evaluate.
This data can help us train ChatGPT-like models on our own devices, which will facilitate the democratization of AI. Can you open it?
This data can help us turn "Close AI" into a real Open AI.
@domeccleston Let me give you some of the resources for my thesis: https://vicuna.lmsys.org/ , https://github.com/nomic-ai/gpt4all , https://crfm.stanford.edu/2023/03/13/alpaca.html . We are all trying to replicate and evaluate these research efforts.
@domeccleston We really need this data; otherwise AI will only become less open.
@domeccleston +1 This data is really valuable. If you could host a data dump, that would be really helpful.
@Lisennlp, @ari9dam, @shumintao, @Ejafa, @genggui001, @chinoll
I understand there may be reasons for ShareGPT to close the endpoint. But for those looking for similar datasets, GPT4All has an extended training set under an Apache license. From what I can tell, it contains some prompts from GPT-3 (maybe GPT-3-Turbo). You may just have to preprocess it using an approach similar to the one described in GPT4All's technical report before training your model (a fresh LLaMA or Vicuna model, for example).
GPT4ALL: https://github.com/nomic-ai/gpt4all Technical report: https://s3.amazonaws.com/static.nomic.ai/gpt4all/2023_GPT4All_Technical_Report.pdf GPT4ALL extended training dataset: https://huggingface.co/datasets/nomic-ai/gpt4all_prompt_generations_with_p3
FYI, there is a difference between the GPT4All/Alpaca datasets and ShareGPT: ShareGPT is a multi-turn dialogue dataset generated by diverse users, while the others are single-turn interactions between a human and GPT.
Only ShareGPT's data is multi-turn and reflects real interaction with humans. If you really can't enable the API, sharing a data dump would also be great.
Sharing the data dump is actually better.
I agree, a data dump would be the best alternative of all this. It's ultimately up to the creators of ShareGPT I suppose, as there may be security concerns (not all who shared conversations may have done so publicly).
If not, the next step would be for others to create a cloned version of ShareGPT and give it some time to grow to a similar size. It does suck that some get access to this type of public data, only for it to no longer be public after release. It will only make others more secretive about their data practices.
+1 I also think that this data would greatly help improve real open source models.
I agree with that. I believe this is the only publicly shared collection of real ChatGPT prompt data.
Hi, also a researcher here. If the data of this repo could be shared, for example in the form of a data dump, it would greatly contribute to our understanding of how contemporary people interact with generative AIs: what their primary goals are, and what kinds of problems they run into when interacting with AIs. The known problems of hallucination and stigmatized answers pose a threat to the development of AI models, yet this type of knowledge is currently proprietary to OpenAI.
@ari9dam May I ask a question? Even though the ShareGPT data is multi-turn, when we convert it into fine-tuning data we still need to change it to the instruction/input/output format, where the input is a sentence and the output is a sentence. Is there a good way to train on dialogues?
Here's an example of what I mean:

user: I want you to act as a resume editor. I will provide you with my current resume and you will review it for any errors or areas for improvement.....
gpt4: Sure, I'd be happy to help you with your resume! Please send me the resume so I can begin reviewing it.
user: xxx University of California, Berkeley xxxCo-Founder, Markit.ai Co. xxx
gpt4: Here are my suggestions for your resume: xxxGeneral:
user: xxxx
gpt4: yyyy
In order to make it trainable by Alpaca etc., we need to convert it to single-turn instead of multi-turn:
Instruction: I want you to act as a resume editor. I will provide you with my current resume and you
input: xxx University of California, Berkeley xxxCo-Founder, Markit.ai Co. xxx
output: Here are my suggestions for your resume: xxxGeneral:
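The conversion described above can be sketched in a few lines of Python. This is a minimal sketch, not an official tool: the field names (`"from"`, `"value"`, `"human"`, `"gpt"`) follow the commonly circulated ShareGPT dump format and may need adjusting for your copy, and the function `sharegpt_to_alpaca` is a hypothetical helper, not part of any released pipeline.

```python
def sharegpt_to_alpaca(conversation):
    """Flatten a multi-turn ShareGPT conversation into single-turn
    Alpaca-style records (instruction / input / output).

    The first human turn becomes the instruction; each later human
    turn becomes the input paired with the assistant reply after it.
    """
    records = []
    instruction = None
    pending_input = None
    for turn in conversation:
        if turn["from"] == "human":
            if instruction is None:
                instruction = turn["value"]
                pending_input = ""  # first reply has no separate input
            else:
                pending_input = turn["value"]
        elif turn["from"] == "gpt" and instruction is not None:
            records.append({
                "instruction": instruction,
                "input": pending_input or "",
                "output": turn["value"],
            })
            pending_input = None
    return records

# Toy conversation mirroring the resume-editor example above
convo = [
    {"from": "human", "value": "I want you to act as a resume editor."},
    {"from": "gpt",   "value": "Sure, please send me the resume."},
    {"from": "human", "value": "xxx University of California, Berkeley xxx"},
    {"from": "gpt",   "value": "Here are my suggestions for your resume: xxx"},
]
for rec in sharegpt_to_alpaca(convo):
    print(rec)
```

Note that this pairing throws away the intermediate dialogue context, which is exactly the loss being asked about; an alternative is to concatenate all earlier turns into the input field so each record keeps its history, at the cost of longer sequences.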
@genggui001 I found this one as well https://huggingface.co/datasets/anon8231489123/ShareGPT_Vicuna_unfiltered
This is quite amazing. And that isn't even counting the efforts by others to scrape Twitter for ShareGPT links.
If anyone wants to take a stab at scraping this site, they have around 80k GPT conversations:
It's somewhat more unstructured than ShareGPT, but it has a lot of stuff in it too.
I was thinking that if the conversation data isn't opened up, I might fork the repo, build a new ShareGPT, and share the data.
This is an open-source project. People share their conversations here, intending for them to be viewed by the public. The dataset is not personal property!
@shumintao Please go for it. We could easily gain support from many research institutes.
+1 ! The data would be amazing for different tinkering tasks.
GET /api/conversations (for fetching conversations). Can you open it?