PygmalionAI/data-toolbox
Our data munging code.
GNU Affero General Public License v3.0
34 stars
9 forks
issues
feat: add normalization script
#41
AlpinDale
closed
9 months ago
1
Fix the random sys prompt generator
#40
TearGosling
closed
11 months ago
0
Add SuperCOT dataset
#39
TearGosling
closed
11 months ago
0
Add ChatML format
#38
TearGosling
closed
1 year ago
0
Add Airoboros 2 dataset and task
#37
TearGosling
closed
1 year ago
1
Claude Evol-Instruct dataset and task
#36
TearGosling
closed
1 year ago
1
Diversify system prompts
#35
TearGosling
closed
1 year ago
2
Add Roleplayer Guild data
#34
TearGosling
closed
1 year ago
0
Refactor training example formats and add Alpaca format
#33
TearGosling
closed
1 year ago
0
Add back and forth exchanges in RP forums
#32
TearGosling
closed
1 year ago
0
[FixBug] Update rp_forums.py to avoid yielding the first msg as a thread
#31
silverriver
closed
1 year ago
2
Always yield the first message of each thread as an independent thread
#30
silverriver
closed
1 year ago
1
Add old Pygmalion format as a compilation option
#29
TearGosling
closed
1 year ago
0
More 'Guess the Instruction' datasets
#28
TearGosling
closed
1 year ago
0
Fix Claude roleplay data
#27
TearGosling
closed
1 year ago
3
Add Dolly dataset and "guess the instruction" task
#26
TearGosling
closed
1 year ago
1
Add Claude instruct format
#25
TearGosling
closed
1 year ago
1
LIMARP dataset, first commit
#24
TearGosling
closed
1 year ago
1
User-submitted Claude logs
#23
TearGosling
closed
1 year ago
1
OpenOrca dataset
#22
TearGosling
closed
1 year ago
1
Implement Airoboros dataset
#21
TearGosling
closed
1 year ago
0
multilingual
#20
g3434343
closed
1 year ago
3
Curtail curtailing harmful outputs
#19
TearGosling
closed
1 year ago
1
OpenAssistant instruction data
#18
TearGosling
closed
1 year ago
2
Two additional datasets/tasks for the toolbox
#17
TearGosling
closed
1 year ago
2
Clean up ShareGPT dataset
#16
TearGosling
closed
1 year ago
1
Implement GPT4All dataset
#15
TearGosling
closed
1 year ago
1
Implement roleplay data
#14
TearGosling
closed
1 year ago
0
Publish/Share the share_gpt.json file
#13
manyoso
closed
1 year ago
1
Investigate Chain of Hindsight for fine-tuning
#12
0x000011b
closed
1 year ago
2
Generate synthetic negative data
#11
0x000011b
closed
1 year ago
1
Adding Github Actions Pipeline
#10
Silver-f0x
opened
1 year ago
0
Add enjim dataset(s)
#9
lloorree
opened
1 year ago
0
First draft of visual novel PDM
#8
TearGosling
closed
1 year ago
1
in progress work on #4 adding enjim datasets
#7
lloorree
closed
1 year ago
0
Integrate chain of thought dataset code
#6
TearGosling
closed
1 year ago
1
Investigate and possibly include some ParlAI data
#5
0x000011b
closed
1 year ago
3
Implement data handling for RP forum dumps
#4
0x000011b
closed
1 year ago
4
Implement VN + VNDB data handling
#3
0x000011b
closed
1 year ago
3
Improve training data
#2
0x000011b
closed
1 year ago
8
Very first prototype of SODA dataset support
#1
TearGosling
closed
1 year ago
0