-
Hello!
After a fresh install I ran the example code from the readme file and it gave me the following error:
```
>>> from newsplease import NewsPlease
>>> article = NewsPlease.from_url('https:…
-
Is it possible to run the Walkthrough example from the website with data other than WMT?
I've tried changing the data paths in `wmt.py`:
```
_ENDE_TRAIN_DATASETS = [
[
"http://dat…
-
WARC is a standardized file format for storing web crawl data. It's widely used for large-scale web data collections such as CommonCrawl and ClueWeb12.
WARC ISO 28500 draft is availabl…
aecio updated
7 years ago
-
Hi,
I tried to get started by running commoncrawl.py but encountered this error. I checked everything I could think of, but still no luck. Do you happen to know what this issue is about? Attached is the erro…
-
After running commoncrawl.py for about 15 minutes, it throws the following error:
```
DEBUG:urllib3.connectionpool:Starting new HTTP connection (1): ads.civitasmedia.com
DEBUG:urllib3.connectionpool:http:/…
-
A typical URL search in the CC index looks like:
http://index.commoncrawl.org/CC-MAIN-2017-17-index?url=scaleunlimited.com%2F*&output=json
You can add `&filter=status:200` as a filter, for example.
…
-
When running the demo (also in the README: English-to-German translation model using the Transformer model from Attention Is All You Need on WMT data), downloading the data gives a corrupted version.
E…
-
Could you please add this option to the library?
When I define it myself in my code as
`CURLCPP_DEFINE_OPTION(CURLOPT_PIPEWAIT, long);`
I get the error
```
error: expected constructor, destructor, or t…
ghost updated
7 years ago
-
$ python /usr/local/bin/t2t-datagen --data_dir=$DATA_DIR --tmp_dir=$TMP_DIR --num_shards=100 --problem=$PROBLEM
INFO:tensorflow:Generating training data for wmt_ende_tokens_32k.
INFO:tensorf…
-
We want to store **trained models** and popular **datasets** (in raw/preprocessed format). Also, we want to develop a simple API for accessing this data.
This project makes our users a bit happier.
…