-
### Problem Description
Crawler has the ability to store full pages as HTML, but often only subsets of the HTML are useful. For example, many sites have key content in xpath(*//main), and current tooling a…
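
To illustrate the kind of subset meant here, below is a minimal sketch that keeps only the first `<main>` element of a page. It uses Python's standard-library `html.parser` rather than a real XPath engine, so it is only an approximation of an `//main` selector. The names `SubtreeExtractor` and `extract_subtree` are hypothetical and are not part of any crawler's API.

```python
from html.parser import HTMLParser


class SubtreeExtractor(HTMLParser):
    """Collects the markup of the first element with the given tag name."""

    def __init__(self, tag):
        # Keep character references as written so the output stays valid HTML.
        super().__init__(convert_charrefs=False)
        self.tag = tag
        self.depth = 0      # nesting depth of `tag`; 0 means we are outside it
        self.done = False   # set once the first matching element is closed
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if self.done:
            return
        if tag == self.tag:
            self.depth += 1
        if self.depth:
            self.parts.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags such as <br/> are copied as-is.
        if self.depth and not self.done:
            self.parts.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if self.done or not self.depth:
            return
        self.parts.append(f"</{tag}>")
        if tag == self.tag:
            self.depth -= 1
            if self.depth == 0:
                self.done = True

    def handle_data(self, data):
        if self.depth and not self.done:
            self.parts.append(data)

    def handle_entityref(self, name):
        if self.depth and not self.done:
            self.parts.append(f"&{name};")

    def handle_charref(self, name):
        if self.depth and not self.done:
            self.parts.append(f"&#{name};")


def extract_subtree(html, tag="main"):
    """Return the markup of the first <tag> element, or None if there is none."""
    parser = SubtreeExtractor(tag)
    parser.feed(html)
    parser.close()
    return "".join(parser.parts) or None
```

Comments inside the element are dropped, and malformed markup is not repaired. A library with real XPath support would likely be a better fit for production use.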
-
### Is there an existing issue for this?
- [X] I have searched the existing issues
### Feature description
Some RSS feeds only include a small snippet of the article, or sometimes nothing at all. I…
-
**Describe the bug**
A clear and concise description of what the bug is.
**To Reproduce**
Steps to reproduce the behavior:
1. Go to '...'
2. Click on '....'
3. Scroll down to '....'
4. See er…
-
Excuse me. Here is a piece of my code:
```Python
extraction_strategy = LLMExtractionStrategy(
provider='ollama_chat/qwen2.5-coder',
url_base="http://localhost:11434",
…
-
When profiling the `trafilatura.bare_extraction` method on some pages that took us a while to parse, I found significant performance issues in the `extract_content` method.
**Root cause**: Too many…
-
INFO: connection open
Received params
Using openAiApiKey from client-side settings dialog
Using openAiBaseURL from client-side settings dialog
Generating vue_tailwind code in image mode using …
-
Extract text from html tags in the raw data.
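
As a rough sketch of what this could look like, assuming the raw data is an HTML string, the snippet below strips tags with the standard-library `html.parser` and returns whitespace-normalized text. `TextExtractor` and `html_to_text` are hypothetical names chosen for illustration, and the tag sets are assumptions that would need tuning for real data.

```python
from html.parser import HTMLParser

# Tags whose content is not visible text.
SKIP_TAGS = {"script", "style"}
# Block-level tags around which a space is inserted so words do not run together.
BLOCK_TAGS = {"p", "div", "br", "li", "tr", "td", "th",
              "h1", "h2", "h3", "h4", "h5", "h6"}


class TextExtractor(HTMLParser):
    def __init__(self):
        super().__init__()
        self.pieces = []
        self.skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self.skip_depth += 1
        elif tag in BLOCK_TAGS:
            self.pieces.append(" ")

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS and self.skip_depth:
            self.skip_depth -= 1
        elif tag in BLOCK_TAGS:
            self.pieces.append(" ")

    def handle_data(self, data):
        if not self.skip_depth:
            self.pieces.append(data)


def html_to_text(raw_html):
    """Return the visible text of an HTML string with collapsed whitespace."""
    parser = TextExtractor()
    parser.feed(raw_html)
    parser.close()
    return " ".join("".join(parser.pieces).split())
```

Inline tags such as `<b>` do not introduce spaces, so a word split across inline markup stays in one piece.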
-
I was wondering whether there is functionality to avoid wiping all the HTML during the extraction process. For example, for the 10-Ks it would be nice to know which parts are tables, lists, headings, etc…
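
To make the request more concrete, here is a minimal sketch of the kind of output I have in mind: each text block is kept together with a label for the structure it came from. It uses only the standard-library `html.parser`. The names `KINDS`, `StructuredText`, and `extract_blocks` are hypothetical and do not exist in the library, and the sample markup in the usage line is made up.

```python
from html.parser import HTMLParser

# Mapping from tag name to the structural label attached to its text.
KINDS = {
    "h1": "heading", "h2": "heading", "h3": "heading",
    "h4": "heading", "h5": "heading", "h6": "heading",
    "p": "paragraph",
    "li": "list_item",
    "th": "table_cell", "td": "table_cell",
}


class StructuredText(HTMLParser):
    def __init__(self):
        super().__init__()
        self.blocks = []   # list of (kind, text) tuples in document order
        self._kind = None  # label of the block currently being read
        self._buf = []

    def _flush(self):
        # Collapse whitespace and store the block if it has any text.
        text = " ".join("".join(self._buf).split())
        if text and self._kind:
            self.blocks.append((self._kind, text))
        self._buf = []

    def handle_starttag(self, tag, attrs):
        if tag in KINDS:
            self._flush()
            self._kind = KINDS[tag]

    def handle_endtag(self, tag):
        if tag in KINDS:
            self._flush()
            self._kind = None

    def handle_data(self, data):
        if self._kind:
            self._buf.append(data)


def extract_blocks(html):
    """Return (kind, text) tuples for headings, paragraphs, list items and table cells."""
    parser = StructuredText()
    parser.feed(html)
    parser.close()
    return parser.blocks
```

For example, `extract_blocks("<h2>Risk Factors</h2><ul><li>First</li></ul>")` yields a heading block followed by a list item block. Nested structures (a `<p>` inside an `<li>`) are labeled by the innermost tag only, which is a simplification.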
-
**Description**
Errors are clearly presented to the user, where they can easily see them in order to fix them.
**Preconditions**
Stateful Web crawlers -> View Crawler -> Manage Domains page, Extract…
-
For reasons not important to this issue, I have my HTML templates inside ES6 .js files which export the templates as strings
``` js
// template.js
export default `
Heading
Text in paragraph
`;
```
…