bigscience-workshop/data_tooling
Tools for managing datasets for governance and training.
Apache License 2.0 · 77 stars · 48 forks
Issues, sorted by newest
#369 Update flagged_words.py (sashavor; closed 2 years ago; 2 comments)
#368 Fix self-assign GH Action for multiple assignees (albertvillanova; closed 2 years ago; 0 comments)
#367 [WIP] Lucile Work to create pseudo crawl corpus (SaulLu; closed 2 years ago; 0 comments)
#365 Create dataset odiencorp2_0 (albertvillanova; closed 2 years ago; 4 comments)
#364 Create dataset opus_100 (albertvillanova; closed 2 years ago; 2 comments)
#363 Create dataset vanguard_daily_media (albertvillanova; opened 2 years ago; 0 comments)
#362 Create dataset mind_body_green (albertvillanova; opened 2 years ago; 0 comments)
#361 Create dataset human_instructions_in_indonesian_extracted_from_wikihow (albertvillanova; opened 2 years ago; 0 comments)
#360 Create dataset malindomorph__morphological_dictionary_and_analyser_for_malay_indonesian (albertvillanova; opened 2 years ago; 0 comments)
#359 Create dataset MT_Vi_Mono_VLSP2020 (albertvillanova; closed 2 years ago; 4 comments)
#358 Create dataset wikihow_vietnamese_human_instructions (albertvillanova; opened 2 years ago; 2 comments)
#357 Create dataset du_reader (albertvillanova; closed 2 years ago; 4 comments)
#356 Create dataset information_week_digital_magazine (albertvillanova; opened 2 years ago; 0 comments)
#355 Create dataset nurition_fact (albertvillanova; opened 2 years ago; 0 comments)
#354 Create dataset ekantipur_com (albertvillanova; opened 2 years ago; 0 comments)
#353 Create dataset science_magazing_aaas_academic_journal_ (albertvillanova; closed 2 years ago; 1 comment)
#352 Create dataset tsac (albertvillanova; opened 2 years ago; 0 comments)
#351 Create dataset indonesian_news_articles_2017 (albertvillanova; closed 2 years ago; 4 comments)
#350 Create dataset xnli (albertvillanova; opened 2 years ago; 0 comments)
#349 Create dataset washington_post_wapo (albertvillanova; opened 2 years ago; 0 comments)
#348 Create dataset webmd_health_and_wellbeign (albertvillanova; closed 2 years ago; 1 comment)
#347 Create dataset offenseval_dravidian (albertvillanova; opened 2 years ago; 0 comments)
#346 Create dataset the_hill_newspaper_and_digital_media (albertvillanova; opened 2 years ago; 0 comments)
#345 Create dataset lihkg (albertvillanova; opened 2 years ago; 0 comments)
#344 Create dataset apple_insider_blog (albertvillanova; opened 2 years ago; 0 comments)
#343 Create dataset webmd_health_and_wellbeing (albertvillanova; opened 2 years ago; 0 comments)
#342 Create dataset the_new_york_times (albertvillanova; opened 2 years ago; 0 comments)
#340 Create dataset everyday_health_group_digital_media (albertvillanova; opened 2 years ago; 0 comments)
#338 Create dataset boy_genius_report_bgr (albertvillanova; opened 2 years ago; 0 comments)
#337 Create dataset stack_exchange_website (albertvillanova; opened 2 years ago; 0 comments)
#336 Create dataset detik (albertvillanova; opened 2 years ago; 0 comments)
#335 Create dataset science_magazing_aaas_academic_journal (albertvillanova; opened 2 years ago; 0 comments)
#334 Create dataset freelancer_market_place_website (albertvillanova; opened 2 years ago; 0 comments)
#333 Create dataset bleacher_report_sport_blog (albertvillanova; opened 2 years ago; 0 comments)
#332 Create dataset openiti (albertvillanova; closed 2 years ago; 2 comments)
#331 Create dataset the_athletic_sport_coverage (albertvillanova; opened 2 years ago; 0 comments)
#330 Create dataset reuters_news_organisation (albertvillanova; opened 2 years ago; 0 comments)
#329 Create dataset geekwire_technology_news (albertvillanova; opened 2 years ago; 0 comments)
#327 Create dataset the_verge_vox_media (albertvillanova; opened 2 years ago; 0 comments)
#326 Create dataset engadget_technology_blog (albertvillanova; opened 2 years ago; 0 comments)
#325 Create dataset forbes_whale_media (albertvillanova; opened 2 years ago; 0 comments)
#324 Create dataset ratopati_com (albertvillanova; opened 2 years ago; 0 comments)
#323 Create dataset french_treebank (albertvillanova; opened 2 years ago; 0 comments)
#322 Create dataset bt_sport_uk_and_ireland (albertvillanova; opened 2 years ago; 0 comments)
#321 [pre-commit.ci] pre-commit autoupdate (pre-commit-ci[bot]; closed 2 years ago; 0 comments)
#320 pii-manager v. 0.5.0 (paulovn; closed 2 years ago; 0 comments)
#319 add streamlit app url to visualizer (huu4ontocord; closed 2 years ago; 1 comment)
#318 [pre-commit.ci] pre-commit autoupdate (pre-commit-ci[bot]; closed 2 years ago; 0 comments)
#317 Pseudo-Crawl curated list of sites: Data Sourcing Candidate seeds spreadsheet (sebastian-nagel; closed 2 years ago; 0 comments)
#316 Remove debug executor from cc_net to fix the pre-commit failure (edugp; closed 2 years ago; 0 comments)