-
The crawler picked up useless site content such as the navigation menu, header, and footer. How can I exclude these from the crawler or search API?
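One possible approach, sketched here under assumptions rather than as a feature of any particular crawler or search API: strip the boilerplate elements from the raw HTML before indexing it. The sketch assumes the pages use semantic `<nav>`, `<header>`, and `<footer>` tags; the names `SKIP_TAGS`, `MainTextExtractor`, and `extract_main_text` are hypothetical and exist only for illustration.

```python
from html.parser import HTMLParser

# Tags treated as boilerplate. This set is an assumption; adjust it per site.
SKIP_TAGS = {"nav", "header", "footer"}


class MainTextExtractor(HTMLParser):
    """Collects text that is not nested inside any tag listed in SKIP_TAGS."""

    def __init__(self):
        super().__init__()
        self.skip_depth = 0  # greater than 0 while inside a boilerplate element
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS and self.skip_depth > 0:
            self.skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and self.skip_depth == 0:
            self.chunks.append(text)


def extract_main_text(html: str) -> str:
    """Return the page text with nav, header, and footer content removed."""
    parser = MainTextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)
```

If the site marks its menus with plain `<div>` elements and class names instead of semantic tags, this will not catch them, and a selector-based filter would be needed instead.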
-
Hello,
I have a Flowise workflow that scrapes our entire website (150+ pages) and then saves it to Pinecone. We are currently using the Cheerio Web Scraper node (it could be Puppeteer or Playwright; it does…
-
I was using Zimit to archive the SCP-CN Wikidot site when the program was interrupted by a Puppeteer error.
Attached here is the log output before the program exits.
```
{"timesta…
-
### ZIM(s) location
https://library.kiwix.org/#lang=&q=gcf
### Recipe(s) URL
https://farm.openzim.org/recipes?name=edu.gcfglobal.org
### Readers tested
- [ ] Kiwix-serve on iOS (iPad / iPhone)
- …
-
Currently it seems screenshots are taken before custom behaviors run.
It would be very useful to be able to take a screenshot after custom behaviors have run, for example to capture a screenshot after removing the "acc…
-
**Is your idea related to a problem? Please describe.**
Since confidentiality is a business classification type, it should be in the datasets_base section. With new changes in v2.6, the custom confi…
-
I made a small tweak so we can use a custom folder for the local cache DB.
`
class RandomUserAgentMiddleware(object):
    def __init__(self, crawler):
        super(RandomUserAgentMiddleware, self).__…
-
Hello
I found that the list of proxies is loaded in from_crawler (middleware.py), so the loading happens in the object's constructor.
I read this on a good scraping site: " ...write some code that would a…
-
### Current Behaviour
No Drysnap Crawler is spawned.
It never awards more than one shellfish.
### Expected Blizzlike Behaviour
Opening a Shellfish Trap should sometimes spawn an aggressive Drysnap …
-
If you have a page with several different elements on it,
such as
- some images with lightbox
- a video
- elements which can be unfolded
- etc.
The usage of these elements could be tracked via Custom Eve…