NVIDIA
/
NeMo-Curator
Scalable toolkit for data curation
Apache License 2.0
329
stars
32
forks
issues
[BUG] FastTextLangId doesn't truly return a list
#33
zahramahani
closed
2 months ago
4
Add dataset blending tool
#32
ryantwolf
closed
1 month ago
2
[BUG] Serialization Issue with SLURM
#31
ryantwolf
closed
1 month ago
0
Add jupyter notebook tutorial for single node multilingual dataset
#30
nicoleeeluo
closed
1 month ago
6
Adding jupyter notebook tutorial for single node multilingual dataset
#29
nicoleeeluo
closed
2 months ago
0
[FEA] Add support for huggingface datasets
#28
ayushdg
opened
2 months ago
1
Make GPU dependencies optional
#27
ayushdg
closed
2 months ago
3
[BUG] ImportError: lxml.html.clean module is now a separate project lxml_html_clean.
#26
chenrui17
closed
2 months ago
1
[FEA] Adding Dockerfile to build a nemo-curator container
#25
miguelusque
opened
2 months ago
0
Add dependency to fix justext
#24
ryantwolf
closed
2 months ago
2
Add lazy loading of imports
#23
ryantwolf
closed
1 month ago
1
Add issue templates
#22
ayushdg
closed
2 months ago
0
Test Public Issue
#21
elliottnv
closed
2 months ago
0
Fix Noisy CUDA Shutdown
#20
ryantwolf
closed
3 months ago
0
Unable to run exact_deduplication script
#19
Manel-Hik
closed
1 month ago
3
Add batched decorator
#18
ryantwolf
closed
2 months ago
0
Bump Python Version
#17
ryantwolf
closed
2 months ago
0
Bump Python and RAPIDS versions
#16
ryantwolf
closed
2 months ago
1
Add citation
#15
ryantwolf
closed
3 months ago
0
Add pre-commit style checks
#14
ayushdg
closed
3 months ago
0
Add workflow for running cpu pytests
#13
ayushdg
closed
3 months ago
3
Remove argparse from get_client function signature
#12
ryantwolf
closed
1 month ago
1
Add a higher level fuzzy deduplication module
#11
ayushdg
closed
1 month ago
2
Add citation
#10
ryantwolf
closed
3 months ago
0
Fix noisy Dask shutdown
#9
ryantwolf
opened
3 months ago
3
Fix noisy CUDA shutdown
#8
ryantwolf
closed
3 months ago
0
Add batched decorator
#7
ryantwolf
closed
3 months ago
0
Update README
#6
ryantwolf
closed
3 months ago
0
[Tutorials] Add a readme file for the TinyStories tutorial
#5
Maghoumi
closed
3 months ago
0
Make NeMo-Curator installable in non GPU environments
#4
ayushdg
closed
2 months ago
1
Add style check
#3
ayushdg
closed
3 months ago
7
Create style.yml
#2
ayushdg
closed
3 months ago
0
GitHub Actions: Add style check
#1
ayushdg
closed
3 months ago
1