commoncrawl Search Results

898 results
for commoncrawl

trivio/common_crawl_index #29

project deprecated?

Is this project deprecated? I see there are no commits since 2013, and there appears to be a new index scheme available since 2015: http://commoncrawl.org/2015/04/announcing-the-common-crawl-index/ …

jric updated 4 years ago
1
CAIDA/commoncrawl-host-ip-mapper #3

Build errors v0.2.14

When trying to build the project with `cargo build --release`, I'm getting this error. ![Screenshot 2024-08-04 122401](https://github.com/user-attachments/assets/234d532b-17b9-4c9b-9292-a19fd5975de4) …

Techrese updated 3 months ago
2
PetrochukM/PyTorch-NLP #61

Support loading fasttext model from custom file

What if I want to use my own pretrained fasttext model (or even the commoncrawl model instead of the standard wiki one)? E.g. look at what they publish now: https://fasttext.cc/docs/en/crawl-vectors.html. Current …

keanpantraw updated 3 years ago
5
dbmdz/berts #16

German BERT Dataset sampling

Hi, did you sample each dataset (Wikipedia, Common Crawl, Subtitles etc.) equally during German-BERT training? OpenAI uses unequal sampling, which may lead to a better result, as stated in the G…

Phil1108 updated 4 years ago
2
projectdiscovery/subfinder #1418

[Issue] Problem with two APIs

**Describe the bug** * The GitHub API is very slow (see https://github.com/projectdiscovery/subfinder/discussions/1393) * Hunter's API gives me an error with the -v option: ``` [WRN] Could not run sou…

Bundy01 updated 3 weeks ago
7
facebookresearch/cc_net #35

403 forbidden while downloading

Hi there, I encountered a 403 error while trying to download ccnet data using this pipeline. I'm wondering whether this is because of the network settings on my side or whether something else is wrong. Thanks in ad…

Raven-Ren updated 2 years ago
2
meta-llama/llama #296

Paper questions: Common Crawl processing questions

There are a few details missing from the paper that are required to really understand what data was actually used for training LLAMA. The paper notes: > We preprocess five CommonCrawl dumps, ran…

joshalbrecht updated 1 year ago
1
huggingface/datatrove #302

Frequent S3 Slowdown Error

When processing CommonCrawl, I frequently get SlowDown Errors: `{'Error': {'Code': 'SlowDown', 'Message': 'Please reduce your request rate.'}`. Is this common? Are there any recommended strategies for…

theyorubayesian updated 1 week ago
2
wenhuchen/Semi-Supervised-Image-Captioning #1

TypeError

When I run "python train.py --saveto commoncraw_pretrained --dataset commoncrawl --cutoff 15", I get the following error: Traceback (most recent call last): File "train.py", line 341, in …

getengqing updated 7 years ago
9
bigscience-workshop/data_tooling #253

Create dataset unsupervised_cross_lingual_representation_lea…

- uid: unsupervised_cross_lingual_representation_learning_at_scale - type: processed - description: - name: Unsupervised Cross-lingual Representation Learning at Scale - description: This pap…

albertvillanova updated 2 years ago
4
