crawler-engine Search Results

KosmosisDire/obsidian-webpage-export #467

XML Sitemap for Search Engine crawler and SEO

Feature Request: Adding XML Sitemap for notes. I noticed that Google is quite reluctant to crawl my KB compared to blog posts; a sitemap may help with that. XML sitemaps also help with lowering the number of …

TohidN updated 4 months ago

JayBizzle/Crawler-Detect #540

AI bots

I wonder whether this lib could/would add support to detect AI bots, so crawlers which are used to feed AI engines. I found a repo which lists most of them: https://github.com/ai-robots-txt/ai.robots…

staabm updated 2 weeks ago

aaron-schroeder/scrapy-athlinks #2

scrapy-athlinks is broken

Hello, the API of related tools has changed quite a bit over 2 years and the project doesn't work out of the box anymore (related to the fact that the versions of related tools were not pinned on the requir…

josevnz updated 3 weeks ago

nehanims/notes #52

AWS Glue open source alternatives

[AWS Glue](https://aws.amazon.com/glue/features/) seems really useful, especially its fuzzy FindMatches feature ([although LLM-based cosine similarity embeddings should provide similar features](http…

nehanims updated 1 month ago

zhangbc97/tushare-integration #9

clickhouse_connect.driver.exceptions.ProgrammingError: Unrec…

Following the docs to pull data into ClickHouse, I first ran `python main.py run job stock/basic`, and when I then started running `python main.py run job stock/quotes`, it began throwing errors: ```sh 2024-11-07 13:30:59 [scrapy.crawler] INFO: Received SIGINT, shutting d…

uiosun updated 3 hours ago

Open-Code-Crafters/FitFlex #265

SEO Optimization of the FitFlex

![image](https://github.com/user-attachments/assets/fd3ba406-aa79-44e0-aee3-03230e68d5c2) The SEO Score of the website is literally poor. I want to enhance the SEO score.

Shariq2003 updated 3 weeks ago

unclecode/crawl4ai #122

cannot import name 'WebCrawler' from 'crawl4ai'

Hi, when I try to run crawl4ai with Microsoft Edge on Windows, I get the error below (the same code works on Ubuntu with Chrome): Traceback (most recent call last): File "d:\work\indexing\scrapper.…

gulnihalk updated 1 month ago

scrapy/scrapy #4292

Exceptions in middleware don't return exit code 1 in `scrapy…

### Description If a middleware raises an exception, running `scrapy crawl` or `scrapy check` raises the exception to the shell but returns with exit code 0, instead of the expected 1. ### Steps…

dpfeif updated 3 months ago

scrapy/scrapy #1523

engine_started vs spider_opened

I think we should better explain the difference between the `engine_started` and `spider_opened` signals. There is now one engine per spider, so these signals look very similar. To make things wor…

kmike updated 2 months ago

Ddosser/wooyunspider #3

Crawler fails with an error at runtime

Traceback (most recent call last): File "/usr/local/lib/python2.7/site-packages/twisted/internet/defer.py", line 1128, in _inlineCallbacks result = g.send(result) File "/usr/local/lib/python2.…

lylsq updated 7 years ago

1000+ results for crawler-engine
