-
1. The crawl is often incomplete: stories further down the webpage are likely to be ignored.
Consider segmenting (chunking) the text snapshot before passing it to GPT.
- decide which chunk size w…
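
As a starting point for the chunking idea above, here is a minimal sketch that splits a text snapshot into overlapping chunks. The function name `chunk_text` and the defaults `max_chars=2000` and `overlap=200` are assumptions for illustration, not values from this project. It counts characters rather than tokens, so the right size for a given GPT model still has to be decided.

```python
def chunk_text(text: str, max_chars: int = 2000, overlap: int = 200) -> list[str]:
    """Split text into chunks of at most max_chars characters.

    Consecutive chunks share `overlap` characters so that a story cut at a
    chunk boundary still appears whole in at least one chunk, provided the
    story is shorter than the overlap. Both defaults are placeholder values.
    """
    if max_chars <= 0:
        raise ValueError("max_chars must be positive")
    if not 0 <= overlap < max_chars:
        raise ValueError("overlap must satisfy 0 <= overlap < max_chars")

    chunks = []
    step = max_chars - overlap  # how far the window advances each time
    for start in range(0, len(text), step):
        chunks.append(text[start:start + max_chars])
        if start + max_chars >= len(text):
            break  # this chunk already reaches the end of the text
    return chunks
```

Each chunk can then be sent to the model separately and the extracted stories merged afterwards, with duplicates from the overlapping regions removed.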
-
There have been multiple failures in the last few days[^1].
The crawler is failing while processing data:
```
File: ieee-rawbib/updates.20230831/IEEEUpdates_IEEEstd/week27b/4130.zip
undefined method `tex…
-
### Observed Results:
### Expected behavior:
-
### Is there an existing issue for the same bug?
- [X] I have checked the existing issues.
### Branch name
main
### Commit ID
Commit ID
### Other environment information
_No response_
### Actu…
-
The following error occurs when downloading from exhentai:
```
Start downloading Keijo!!!!!!!! (1038794/4f48efcd59)
total 1 episode.
Downloading ep image
Traceback (most recent call last):
:1: SyntaxWarning: invalid escape sequence '…
-
I downloaded the latest version of ComicCrawler today,
but downloading comics from 8comic still fails.
Comic URL:
https://www.8comic.com/html/13736.html
Error:
Traceback (most recent call last):
File "C:\Users\LIMIT\AppData\Local\Programs\Python\Python311\Lib\s…
-
Got a site failing with this error:
```
Adhoc task failed: tool_crawler\task\adhoc_crawl_task,rawurlencode(): Argument #1 ($string) must be of type string, array given
Backtrace:
line 522 of /li…
-
**Describe**
A clear and concise description of what the bug is.
After migrating to 1.11.28, downloading a certificate as a PDF returns a 500 error.
Generating the HTML certificate works.
…
-
I'm planning to add a smart crawler that takes a set of user-defined objectives and continues crawling to satisfy them. Objectives can be a query requiring a sufficient amount of information to answer…
-
Implement some way to stop the crawler from the user function in an obvious and controlled way. It should properly shut down all resources and immediately stop the crawler from sending any further requests. It should be mirro…