bigcode-project/bigcode-analysis
Repository for analysis and experiments in the BigCode project.
Apache License 2.0
115 stars, 20 forks
issues
#46 Update README.md (christiancopeland, closed, 8 months ago, 0 comments)
#45 Update README.md (christiancopeland, closed, 10 months ago, 0 comments)
#44 Analysis notebook (loubnabnl, opened, 1 year ago, 0 comments)
#43 download dataset from kaggle (xu3kev, opened, 1 year ago, 1 comment)
#42 Pull Requests (loubnabnl, opened, 1 year ago, 0 comments)
#41 kaggle dataset (loubnabnl, opened, 1 year ago, 0 comments)
#40 Stackoverflow processing (loubnabnl, opened, 1 year ago, 0 comments)
#39 [WIP] textbooks filtering (loubnabnl, opened, 1 year ago, 0 comments)
#38 [WIP] code reviews dataset (loubnabnl, opened, 1 year ago, 0 comments)
#37 Add pdf of Miro board of MozFest (harm-devries, closed, 1 year ago, 0 comments)
#36 Chinchilla analysis (harm-devries, closed, 1 year ago, 0 comments)
#35 add scaling laws notebook (lvwerra, closed, 1 year ago, 4 comments)
#34 Data inspection (harm-devries, closed, 1 year ago, 0 comments)
#33 add github issues analysis notebook (loubnabnl, closed, 1 year ago, 0 comments)
#32 Add unimax exploration notebook (harm-devries, closed, 1 year ago, 1 comment)
#31 Issues language identifier (Muhtasham, closed, 1 year ago, 0 comments)
#30 Minhash Improvement (ChenghaoMou, closed, 1 year ago, 1 comment)
#29 add kenlm experiment (lvwerra, closed, 1 year ago, 0 comments)
#28 update readmes of filtering methods (loubnabnl, closed, 1 year ago, 0 comments)
#27 add code preprocessing and comment to code notebook (loubnabnl, closed, 2 years ago, 0 comments)
#26 Email regex modified (paulovn, closed, 2 years ago, 0 comments)
#25 add PII detection pipeline and analysis notebooks (loubnabnl, closed, 2 years ago, 0 comments)
#24 Use detect-secrets to scan secrets (WIP) (liyongsea, closed, 2 years ago, 2 comments)
#22 MQA experiments on AWS SageMaker Lab (ocramz, closed, 2 years ago, 4 comments)
#21 requirements uses the right branch of transformers (ocramz, closed, 2 years ago, 0 comments)
#20 cannot import AttentionType from gpt2 (ocramz, closed, 2 years ago, 0 comments)
#19 [Decontamination] Add readme and instructions to run substring decontamination (RaymondLi0, closed, 1 year ago, 1 comment)
#18 update readme and requirements (ChenghaoMou, closed, 2 years ago, 0 comments)
#17 Reorganize data analysis folder and update readmess (loubnabnl, closed, 2 years ago, 0 comments)
#16 add subtsring decontamination (RaymondLi0, closed, 2 years ago, 0 comments)
#15 github scraping speed limit (bigximik, opened, 2 years ago, 0 comments)
#14 Add decontamination code (ChenghaoMou, closed, 2 years ago, 3 comments)
#13 Decontamination (ChenghaoMou, closed, 2 years ago, 9 comments)
#12 Broken link (Sleepyhead01, closed, 2 years ago, 1 comment)
#11 Adding alternative minhash script (ChenghaoMou, closed, 2 years ago, 15 comments)
#10 [Near Deduplication] Tokenization (ChenghaoMou, opened, 2 years ago, 2 comments)
#9 [Near Deduplication] Post processing (ChenghaoMou, opened, 2 years ago, 0 comments)
#8 [Exact Substring Deduplication] Analysis (ChenghaoMou, opened, 2 years ago, 1 comment)
#7 [Near Deduplication] Benchmark (ChenghaoMou, opened, 2 years ago, 2 comments)
#6 Create CONTRIBUTING.md (lvwerra, closed, 2 years ago, 0 comments)
#5 Add filtering to the near deduplicated safe dataset (loubnabnl, closed, 2 years ago, 1 comment)
#4 Multi query experiments (bigximik, closed, 2 years ago, 0 comments)
#3 Reorganize bigcode-data-analysis repository (loubnabnl, closed, 2 years ago, 1 comment)
#23 Evaluate CodeGen on safe and all-license dataset (harm-devries, closed, 2 years ago, 3 comments)
#2 Rename model names on HF hub (harm-devries, closed, 2 years ago, 1 comment)
#1 Upload github dataset with license column (harm-devries, closed, 2 years ago, 0 comments)