commoncrawl Search Results

857 results
for commoncrawl


esbatmop/MNBVC #40

Has using S3 for storage and downloads been considered for data distribution?

Would you consider offering the same download method as commoncrawl?

chinoll updated 8 months ago
1
dlab-trainings/social-data-carpentry-2015 #15

Courtlistener and commoncrawl as potential big open source r…

Re a comment by @davclark on email ("If folks have already-developed datasets that are amenable to a range of text processing, please let me know!"): - See https://www.courtlistener.com/, especially h…

rdhyee updated 9 years ago
11
facebookresearch/LASER #279

Can't download: 403 error on some CC segments.

2024-02-14 21:01 INFO 2048692:root - Downloaded https://dl.fbaipublicfiles.com/laser/CCMatrix/v1.0.0/2020-10_0278.tsv.gz [200] took 8s (5766.4kB/s) 2024-02-14 21:01 INFO 2048692:root - Starting downl…

enn-nafnlaus updated 5 months ago
1
facebookresearch/cc_net #53

503 Server Error: Service Unavailable for url

When I use `python -m cc_net ` to download and extract work, I am told that the connection cannot open `requests.exceptions.HTTPError: 503 Server Error: Service Unavailable for url: https://data.comm…

yangyang0202 updated 1 year ago
1
microsoft/biosbias #4

Access denied in downloading script

Traceback (most recent call last): File "download_bios.py", line 255, in assert r.status_code == 200 AssertionError The error code seems to be 403

flodorner updated 2 months ago
2
commoncrawl/cc-index-server #4

[PyWB2] Error page and status

If an error occurs the index server responds with HTTP status code 200 OK, it should return a 503 or 5xx error. Seen with: - call of non-existing API endpoint (collection) return 200 + empty result …

sebastian-nagel updated 3 years ago
1
mozilla/common-voice-global-sprint #13

Make it very clear that CommonCrawl/OpenSubtitles contain co…

At the moment in the link to [contributing](https://voice-sprint.mozilla.community/contributing/) it suggests CommonCrawl and OpenSubtitles as good places to find text, while saying that Wikimedia sit…

ftyers updated 6 years ago
1
togethercomputer/RedPajama-Data #35

Expected finish time for processing one single index of comm…

One more question, please. Using the provided command, how long does it take to finish each step (e.g., quality filtering, deduplication, quality classifier) when processing a single index of common…

kimcando updated 1 year ago
3
filplus-bookkeeping/DAYOU #12

[DataCap Application] Commoncraw

### Version 1 ### DataCap Applicant FileTech ### Project ID FileTech-02 ### Data Owner Name Commoncrawl ### Data Owner Country/Region United States ### Data Owner Industry Life Science / He…

nike-mp updated 2 days ago
24
commoncrawl/commoncrawl #5

VerifyError

I'm trying Common Crawl w/ Hadoop 0.20.205 and I'm getting the following: Exception in thread "main" java.lang.VerifyError: (class: org/commoncrawl/hadoop/io/JetS3tARCSource, method: configureImpl si…

gsingers updated 12 years ago
1
