commoncrawl Search Results

870 results for commoncrawl

fhamborg/news-please #34

AttributeError: 'module' object has no attribute 'request'

Hello! After a fresh install I ran the example code from the readme file and it gave me the following error: ``` >>> from newsplease import NewsPlease >>> article = NewsPlease.from_url('https:…

gambolputty updated 6 years ago · 5 comments

tensorflow/tensor2tensor #10

How to run the Walkthrough example with other data than WMT?

Is it possible to run the Walkthrough example from the website with other data than WMT? I've tried changing the data paths in `wmt.py:` ``` _ENDE_TRAIN_DATASETS = [ [ "http://dat…

mehmedes updated 7 years ago · 16 comments

VIDA-NYU/ache #64

Support standard WARC file format

WARC is a standardized file format used for storing web crawl data. It's widely used for storing large scale web data collections such as CommonCrawl and ClueWeb12. WARC ISO 28500 draft is availabl…

aecio updated 7 years ago · 3 comments

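As a rough illustration of the WARC layout mentioned in the ACHE issue above (this is not ACHE's implementation), the Python sketch below reads records from an uncompressed WARC stream. It assumes the ISO 28500 record structure: a `WARC/1.0` version line, `Name: value` header fields, a blank line, a content block whose size is given by `Content-Length`, and a trailing CRLF CRLF. The function name `read_warc_records` is hypothetical.

```python
import io


def read_warc_records(stream):
    """Yield (headers, content) for each record in an uncompressed WARC byte stream.

    Hypothetical helper; assumes well-formed records with a Content-Length header.
    """
    while True:
        version = stream.readline()
        if not version:
            return  # clean end of stream
        if version.strip() == b"":
            continue  # tolerate stray blank lines between records
        if not version.startswith(b"WARC/"):
            raise ValueError(f"expected WARC version line, got {version!r}")

        # Read "Name: value" header lines until the blank line that ends the block.
        headers = {}
        for line in iter(stream.readline, b""):
            line = line.rstrip(b"\r\n")
            if not line:
                break
            name, _, value = line.partition(b":")
            headers[name.strip().decode("utf-8")] = value.strip().decode("utf-8")

        # The content block is exactly Content-Length bytes long.
        length = int(headers["Content-Length"])
        content = stream.read(length)
        stream.read(4)  # consume the CRLF CRLF that terminates each record
        yield headers, content


if __name__ == "__main__":
    # Tiny synthetic record for demonstration only.
    sample = b"WARC/1.0\r\nWARC-Type: response\r\nContent-Length: 5\r\n\r\nhello\r\n\r\n"
    for hdrs, body in read_warc_records(io.BytesIO(sample)):
        print(hdrs["WARC-Type"], body)
```

Common Crawl distributes its WARC files gzip-compressed per record (`.warc.gz`); passing `gzip.open(path, "rb")` as the stream should work, because Python's `gzip` module reads concatenated gzip members in sequence.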
fhamborg/news-please #36

HTTP Error 505: HTTP Version not supported

Hi, I try to getting started by running commoncrawl.py but encountered this error. I checked everything I know but still no luck. Do you happen to know what this issue is about? Attached is the erro…

Zombo1296 updated 7 years ago · 1 comment

fhamborg/news-please #26

Error running commoncrawl.py

After running commoncrawl.py for like 15min it throws following error: ``` DEBUG:urllib3.connectionpool:Starting new HTTP connection (1): ads.civitasmedia.com DEBUG:urllib3.connectionpool:http:/…

IclickButtons updated 7 years ago · 3 comments

ScaleUnlimited/flink-crawler #37

Create an HttpFetcher that uses the common-crawl data

Typical URL search in CC index looks like: http://index.commoncrawl.org/CC-MAIN-2017-17-index?url=scaleunlimited.com%2F*&output=json You can add `&filter=status:200` as a filter, for example. …

kkrugler updated 7 years ago · 5 comments

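The index query quoted in the flink-crawler issue can be assembled programmatically. The Python sketch below rebuilds that URL, including an optional repeated `filter` parameter such as `status:200`, and parses an `output=json` response under the assumption that it contains one JSON object per line. `INDEX_BASE` reuses the CC-MAIN-2017-17 collection from the example; the helper names are hypothetical and this is not flink-crawler's HttpFetcher.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Collection taken from the example query; other crawls use other CC-MAIN-... names.
INDEX_BASE = "http://index.commoncrawl.org/CC-MAIN-2017-17-index"


def build_index_query(url_pattern, filters=(), output="json"):
    """Build a query like ?url=scaleunlimited.com%2F*&output=json[&filter=...]."""
    params = [("url", url_pattern), ("output", output)]
    # Each filter becomes its own &filter=... parameter.
    params += [("filter", f) for f in filters]
    # Keep '*' and ':' literal so the URL matches the hand-written example.
    return INDEX_BASE + "?" + urlencode(params, safe="*:")


def parse_index_lines(text):
    """Parse a response assumed to hold one JSON object per line."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]


def fetch_index(url_pattern, filters=()):
    """Query the index over the network (not exercised in the example below)."""
    with urlopen(build_index_query(url_pattern, filters), timeout=30) as resp:
        return parse_index_lines(resp.read().decode("utf-8"))


if __name__ == "__main__":
    print(build_index_query("scaleunlimited.com/*", ["status:200"]))
```

Building the query with `urlencode` rather than string concatenation keeps the `/` in the URL pattern percent-encoded (`%2F`) as in the original example, while `safe="*:"` leaves the wildcard and the `status:200` filter readable.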
tensorflow/tensor2tensor #23

Data download corrupted when running demo

When running the demo (also in README: English-to-German translation model using the Transformer model from Attention Is All You Need on WMT data.), downloading the data, gives a corrupted version. E…

Jordy-VL updated 7 years ago · 4 comments

JosephP91/curlcpp #111

Missing option CURLOPT_PIPEWAIT

Could you please add this option to library? When i define it myself in my code as `CURLCPP_DEFINE_OPTION(CURLOPT_PIPEWAIT, long);` i get error ``` error: expected constructor, destructor, or t…

ghost updated 7 years ago · 4 comments

tensorflow/tensor2tensor #102

Python 2 t2t-datagen fails on Unicode errors

$ python /usr/local/bin/t2t-datagen --data_dir=$DATA_DIR --tmp_dir=$TMP_DIR --num_shards=100 --problem=$PROBLEM INFO:tensorflow:Generating training data for wmt_ende_tokens_32k. INFO:tensorf…

dakami updated 7 years ago · 2 comments

piskvorky/gensim #1453

Data/Model storage

We want to store **trained models** and popular **dataset** (in raw/preprocessed format). Also, we want to develop a simple API for accessing this data. This project makes our users a bit happier. …

menshikh-iv updated 6 years ago · 19 comments
