PygmalionAI data-toolbox issues

PygmalionAI / data-toolbox

Our data munging code.

GNU Affero General Public License v3.0

34 stars 9 forks

issues

feat: add normalization script

#41 AlpinDale closed 9 months ago
1
Fix the random sys prompt generator

#40 TearGosling closed 11 months ago
0
Add SuperCOT dataset

#39 TearGosling closed 11 months ago
0
Add ChatML format

#38 TearGosling closed 1 year ago
0
Add Airoboros 2 dataset and task

#37 TearGosling closed 1 year ago
1
Claude Evol-Instruct dataset and task

#36 TearGosling closed 1 year ago
1
Diversify system prompts

#35 TearGosling closed 1 year ago
2
Add Roleplayer Guild data

#34 TearGosling closed 1 year ago
0
Refactor training example formats and add Alpaca format

#33 TearGosling closed 1 year ago
0
Add back and forth exchanges in RP forums

#32 TearGosling closed 1 year ago
0
[FixBug] Update rp_forums.py to avoid yielding the first msg as a thread

#31 silverriver closed 1 year ago
2
Always yield the first message of each thread as an independent thread

#30 silverriver closed 1 year ago
1
Add old Pygmalion format as a compilation option

#29 TearGosling closed 1 year ago
0
More 'Guess the Instruction' datasets

#28 TearGosling closed 1 year ago
0
Fix Claude roleplay data

#27 TearGosling closed 1 year ago
3
Add Dolly dataset and "guess the instruction" task

#26 TearGosling closed 1 year ago
1
Add Claude instruct format

#25 TearGosling closed 1 year ago
1
LIMARP dataset, first commit

#24 TearGosling closed 1 year ago
1
User-submitted Claude logs

#23 TearGosling closed 1 year ago
1
OpenOrca dataset

#22 TearGosling closed 1 year ago
1
Implement Airoboros dataset

#21 TearGosling closed 1 year ago
0
multilingual

#20 g3434343 closed 1 year ago
3
Curtail curtailing harmful outputs

#19 TearGosling closed 1 year ago
1
OpenAssistant instruction data

#18 TearGosling closed 1 year ago
2
Two additional datasets/tasks for the toolbox

#17 TearGosling closed 1 year ago
2
Clean up ShareGPT dataset

#16 TearGosling closed 1 year ago
1
Implement GPT4All dataset

#15 TearGosling closed 1 year ago
1
Implement roleplay data

#14 TearGosling closed 1 year ago
0
Publish/Share the share_gpt.json file

#13 manyoso closed 1 year ago
1
Investigate Chain of Hindsight for fine-tuning

#12 0x000011b closed 1 year ago
2
Generate synthetic negative data

#11 0x000011b closed 1 year ago
1
Adding GitHub Actions Pipeline

#10 Silver-f0x opened 1 year ago
0
Add enjim dataset(s)

#9 lloorree opened 1 year ago
0
First draft of visual novel PDM

#8 TearGosling closed 1 year ago
1
in progress work on #4 adding enjim datasets

#7 lloorree closed 1 year ago
0
Integrate chain of thought dataset code

#6 TearGosling closed 1 year ago
1
Investigate and possibly include some ParlAI data

#5 0x000011b closed 1 year ago
3
Implement data handling for RP forum dumps

#4 0x000011b closed 1 year ago
4
Implement VN + VNDB data handling

#3 0x000011b closed 1 year ago
3
Improve training data

#2 0x000011b closed 1 year ago
8
Very first prototype of SODA dataset support

#1 TearGosling closed 1 year ago
0