bigcode-project/bigcode-analysis
Repository for analysis and experiments in the BigCode project.
Apache License 2.0
107 stars, 20 forks
issues
#46 Update README.md (christiancopeland, closed 3 months ago, 0 comments)
#45 Update README.md (christiancopeland, closed 5 months ago, 0 comments)
#44 Analysis notebook (loubnabnl, opened 8 months ago, 0 comments)
#43 download dataset from kaggle (xu3kev, opened 8 months ago, 1 comment)
#42 Pull Requests (loubnabnl, opened 9 months ago, 0 comments)
#41 kaggle dataset (loubnabnl, opened 9 months ago, 0 comments)
#40 Stackoverflow processing (loubnabnl, opened 9 months ago, 0 comments)
#39 [WIP] textbooks filtering (loubnabnl, opened 10 months ago, 0 comments)
#38 [WIP] code reviews dataset (loubnabnl, opened 11 months ago, 0 comments)
#37 Add pdf of Miro board of MozFest (harm-devries, closed 1 year ago, 0 comments)
#36 Chinchilla analysis (harm-devries, closed 1 year ago, 0 comments)
#35 add scaling laws notebook (lvwerra, closed 1 year ago, 4 comments)
#34 Data inspection (harm-devries, closed 1 year ago, 0 comments)
#33 add github issues analysis notebook (loubnabnl, closed 1 year ago, 0 comments)
#32 Add unimax exploration notebook (harm-devries, closed 1 year ago, 1 comment)
#31 Issues language identifier (Muhtasham, closed 1 year ago, 0 comments)
#30 Minhash Improvement (ChenghaoMou, closed 1 year ago, 1 comment)
#29 add kenlm experiment (lvwerra, closed 1 year ago, 0 comments)
#28 update readmes of filtering methods (loubnabnl, closed 1 year ago, 0 comments)
#27 add code preprocessing and comment to code notebook (loubnabnl, closed 1 year ago, 0 comments)
#26 Email regex modified (paulovn, closed 1 year ago, 0 comments)
#25 add PII detection pipeline and analysis notebooks (loubnabnl, closed 1 year ago, 0 comments)
#24 Use detect-secrets to scan secrets (WIP) (liyongsea, closed 1 year ago, 2 comments)
#22 MQA experiments on AWS SageMaker Lab (ocramz, closed 1 year ago, 4 comments)
#21 requirements uses the right branch of transformers (ocramz, closed 1 year ago, 0 comments)
#20 cannot import AttentionType from gpt2 (ocramz, closed 1 year ago, 0 comments)
#19 [Decontamination] Add readme and instructions to run substring decontamination (RaymondLi0, closed 1 year ago, 1 comment)
#18 update readme and requirements (ChenghaoMou, closed 1 year ago, 0 comments)
#17 Reorganize data analysis folder and update readmes (loubnabnl, closed 1 year ago, 0 comments)
#16 add substring decontamination (RaymondLi0, closed 1 year ago, 0 comments)
#15 github scraping speed limit (bigximik, opened 1 year ago, 0 comments)
#14 Add decontamination code (ChenghaoMou, closed 1 year ago, 3 comments)
#13 Decontamination (ChenghaoMou, closed 1 year ago, 9 comments)
#12 Broken link (Sleepyhead01, closed 1 year ago, 1 comment)
#11 Adding alternative minhash script (ChenghaoMou, closed 1 year ago, 15 comments)
#10 [Near Deduplication] Tokenization (ChenghaoMou, opened 1 year ago, 2 comments)
#9 [Near Deduplication] Post processing (ChenghaoMou, opened 1 year ago, 0 comments)
#8 [Exact Substring Deduplication] Analysis (ChenghaoMou, opened 1 year ago, 1 comment)
#7 [Near Deduplication] Benchmark (ChenghaoMou, opened 1 year ago, 2 comments)
#6 Create CONTRIBUTING.md (lvwerra, closed 1 year ago, 0 comments)
#5 Add filtering to the near deduplicated safe dataset (loubnabnl, closed 1 year ago, 1 comment)
#4 Multi query experiments (bigximik, closed 1 year ago, 0 comments)
#3 Reorganize bigcode-data-analysis repository (loubnabnl, closed 1 year ago, 1 comment)
#23 Evaluate CodeGen on safe and all-license dataset (harm-devries, closed 1 year ago, 3 comments)
#2 Rename model names on HF hub (harm-devries, closed 1 year ago, 1 comment)
#1 Upload github dataset with license column (harm-devries, closed 1 year ago, 0 comments)