-
Hi there,
Per my Reddit comment at http://uk.reddit.com/r/datasets/comments/26xqgs/downloading_all_of_hacker_news_posts_and_comments/ , there are 641k IDs that don't appear anywhere.
It looks like e…
-
Currently, in the event of failures, especially in the 'reliability' test suite, test titles can become garbled, and the number of executed tests can change seemingly arbitrarily.
It's clear to me th…
-
Hi Pascal,
I am working on a website that includes different domains, such as...
```
// Below are the domains in the start url section
www.rthk.hk
app3.rthk.hk
app4.rthk.hk
programme.rthk.hk
…
-
## Description
We are using the Algolia Crawler UI to parse our mixed static HTML and SPA website (which uses a hash router). All URLs are provided in the `sitemaps` Crawler config.
```js
new Crawler({
st…
-
http://www.mingpao.com/
-
Hi!
I was very glad to find your extension during the process of updating a 4.2 to 8.7! Thank you so much for making it public!
Would it be possible to add simple configurations to the documentatio…
-
Hi there, @unclecode !
I noticed that the library has been updated to 0.3.73, 'Parallel Power: Supercharged multi-URL crawling performance'. What are the specific updates in 'multi-URL crawling'? …
-
Hello, I have a problem with the sandcrawler Phantom Spider.
I tried to use this code:
```
var sandcrawler = require('sandcrawler')
var spider = sandcrawler.phantomSpider()
.url('https://…
-
# Todo
- [x] General
- [ ] Create a CLI
- [ ] Make configuration easier for the people who will use it
- [ ] Graphical interface (web)
- [x] Decide the limit on news items that will be…
-
Identifier validation failed for the dataset [Carnegie Museum of Natural History - Mollusks](https://registry.gbif.org/dataset/07ae2aa8-5031-4312-b26e-84a5c753daac):
- Crawler attempt: 73
- Publishing…