Can you point to the ShareGPT filtered/cleaned data used?

THUDM / AgentTuning

AgentTuning: Enabling Generalized Agent Abilities for LLMs

https://thudm.github.io/AgentTuning/


Can you point to the ShareGPT filtered/cleaned data used? #50

Closed by harshraj172 9 months ago

harshraj172 commented 9 months ago

Hey, thank you for your great work. I am trying to replicate the training of AgentLM and have been looking for the ShareGPT data that was used. The paper mentions ShareGPT as the data used, but I cannot find the filtered/cleaned version anywhere. Could you please tell me how to get the final version of the ShareGPT data used for training AgentLM?

Btlmd commented 9 months ago

We used an internal version of ShareGPT to filter and classify the general data we use, and I'm afraid that we cannot open-source it immediately.

If you need cleaned and classified datasets, you may consider using the ShareGPT datasets provided by OpenChat, such as openchat_sharegpt_v3 and openchat_sharegpt_v4, as an alternative.
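
For anyone going the alternative route, here is a minimal sketch of a basic sanity filter over ShareGPT-style records. It is not the filtering used for AgentLM, which has not been released. It assumes the commonly seen ShareGPT JSON layout (a list of records, each with a `conversations` list of `{"from": ..., "value": ...}` turns); the function names and the filtering rules are hypothetical and only illustrate the idea.

```python
import json


def load_sharegpt(path):
    """Load a ShareGPT-style JSON file (assumed to be a list of records)."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def is_clean(record, min_turns=2):
    """Hypothetical rule set: the conversation must have at least
    `min_turns` turns, start with a human turn, alternate speakers,
    and contain no empty messages."""
    turns = record.get("conversations", [])
    if len(turns) < min_turns:
        return False
    if turns[0].get("from") != "human":
        return False
    # Consecutive turns must come from different speakers.
    for prev, cur in zip(turns, turns[1:]):
        if prev.get("from") == cur.get("from"):
            return False
    # Reject turns whose text is missing or whitespace only.
    return all((t.get("value") or "").strip() for t in turns)


def filter_sharegpt(records, min_turns=2):
    """Return only the records that pass `is_clean`."""
    return [r for r in records if is_clean(r, min_turns)]
```

To use it, load whichever file you downloaded with `load_sharegpt(path)` and pass the result to `filter_sharegpt`. Check the actual field names in your copy first, since they may differ from the layout assumed here.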