-
This ticket tracks curation workflow progression.
Note: Work can take place simultaneously in the three sections, with overlapping periods, allowing curation workflows for different…
-
Is it possible to get the original content of all feeds by default, with the possibility of an opt-out instead? I just switched from coldsweat, and changing each of my 300 feeds manually is going to be a pain.
-
**Bug Description**
If we input any link (e.g. www.google.com), the summariser treats it as an article link and summarises it.
**To Reproduce**
Steps to reproduce the behavior:
1. Go t…
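
One way to guard against the behaviour described above is to reject bare domains before summarising. The following is a minimal sketch under stated assumptions, not the summariser's actual code: the function name `looks_like_article_url` is hypothetical, and the rule (a URL needs at least one non-empty path segment) is a deliberately simple heuristic.

```python
from urllib.parse import urlparse


def looks_like_article_url(url: str) -> bool:
    """Heuristic (assumption): a bare domain or homepage is not an article."""
    # urlparse only fills in netloc when a scheme is present, so add a
    # default scheme for inputs such as "www.google.com".
    if "://" not in url:
        url = "http://" + url
    parsed = urlparse(url)
    if not parsed.netloc:
        return False
    # Keep only non-empty path segments; a homepage has none.
    segments = [s for s in parsed.path.split("/") if s]
    return len(segments) > 0
```

With this check, `www.google.com` would be rejected while a URL with a path would pass. Section or category pages also have a path, so this filter only catches the simplest case and would likely need a content-based check on top.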
-
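Hello, and thank you for sharing this convenient package.
I ran it in a Colab environment following the `readme` instructions.
I installed it with `pip` without problems and confirmed that the package loads correctly.
After crawling with the code below, opening each file with `pandas` raises `EmptyDataError` for every one of them.
When I download the files directly and open them in Excel, …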
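
`pandas` raises `EmptyDataError` when it finds no data to parse, for example on a zero-byte file. A quick way to narrow this down is to look at the raw file before handing it to a parser. The sketch below uses only the standard library; the helper name `inspect_output_file` and the keys of the returned dictionary are hypothetical and not part of the package discussed here.

```python
import os


def inspect_output_file(path: str, peek: int = 64) -> dict:
    """Report the size and leading bytes of a file (illustrative helper)."""
    size = os.path.getsize(path)
    # Read in binary mode so the result does not depend on any encoding.
    with open(path, "rb") as fh:
        head = fh.read(peek)
    return {
        "size_bytes": size,
        "is_empty": size == 0,
        # A UTF-8 byte order mark at the start is common in files meant for Excel.
        "has_utf8_bom": head.startswith(b"\xef\xbb\xbf"),
        "head": head,
    }
```

If `is_empty` is true for the files on Colab but not for the downloaded copies, the files were probably read before the crawler finished writing them; that is only one possible explanation.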
-
I was analyzing the output file and realized that the text for each "news" item is only the title, the headline, and the first paragraph. Is that correct?
I'm using the crawler for "pt" language.
ghost updated
5 years ago
-
On the internal source list, we have other potentially interesting sources that would be worth showing in our NLP pipeline, e.g. UK-gov sites, news sources etc…
Like the Twitter API, it would …
-
Identifier validation failed for the dataset [Insekten Sachsen](https://registry.gbif.org/dataset/77ecd330-b09e-11e2-a01d-00145eb45e9a):
- Crawler attempt: 318
- Publishing organization: [Senckenberg]…
-
I have this code:
```
require ('vendor/autoload.php');
use Goutte\Client;
use GuzzleHttp\Client as GuzzleClient;
$goutteClient = new Client();
$guzzleClient = new GuzzleClient(array(
…
```
-
I'm seeing some strange issues with `alternet.org`. If I go directly to `https://www.alternet.org/category/news-politics/page/2/` I get a `403 forbidden` error (both in the crawler and interactively) …
-
**Issue by [durakkerem](https://github.com/durakkerem)**
_Tue May 8 20:34:27 2018_
_Originally opened as https://github.com/codelucas/newspaper/issues/563_
----
So I know that I can build a new…