LAION-AI/Big-Interleaved-Dataset
License: Apache License 2.0
58 stars, 8 forks
Issues (newest first)
#21 Is this project still on-going? (YANG-H, opened 1 year ago, 0 comments)
#20 Hi, is this dataset still working in progress? (vtddggg, closed 1 year ago, 0 comments)
#19 add how to run in readme (rom1504, opened 1 year ago, 0 comments)
#18 Packaging: setup.py (harry-stark, closed 1 year ago, 1 comment)
#17 Blocking issue: (harry-stark, opened 1 year ago, 0 comments)
#16 Optimize script (siddheshmhatre, closed 1 year ago, 0 comments)
#15 Feature: Unified api for HTTP vs S3 connection (harry-stark, opened 1 year ago, 2 comments)
#14 CC-Table integration (harry-stark, opened 1 year ago, 0 comments)
#13 Small scale example to run in colab (harry-stark, opened 1 year ago, 0 comments)
#12 End to End example (harry-stark, opened 1 year ago, 0 comments)
#11 Feature tracker: HTML parsers (harry-stark, closed 1 year ago, 0 comments)
#9 Logging support for each warc (harry-stark, opened 2 years ago, 0 comments)
#8 Multi Node support (harry-stark, closed 1 year ago, 2 comments)
#7 Add different CC mime types support (harry-stark, opened 2 years ago, 1 comment)
#6 Integration in fast pyspark pipeline (rom1504, opened 2 years ago, 1 comment)
#5 Quality of extractor (rom1504, opened 2 years ago, 0 comments)
#4 BILD Phase 3 (harry-stark, opened 2 years ago, 0 comments)
#3 BILD Phase 2 (harry-stark, opened 2 years ago, 0 comments)
#2 BILD: Phase 1 (harry-stark, opened 2 years ago, 0 comments)
#1 BILD Tracking (harry-stark, opened 2 years ago, 0 comments)