-
```python
from crawl4ai import WebCrawler
from crawl4ai.chunking_strategy import SlidingWindowChunking
from crawl4ai.extraction_strategy import LLMExtractionStrategy
crawler = WebCrawler()
# …
```
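
The snippet above is cut off before `SlidingWindowChunking` is used. The sketch below shows the general sliding-window idea in plain Python. It is not crawl4ai's implementation. `sliding_window_chunks` and its `window_size`/`step` parameters are hypothetical, and "size" here means whitespace-separated words:

```python
def sliding_window_chunks(text: str, window_size: int = 100, step: int = 50) -> list[str]:
    """Split text into overlapping windows of `window_size` words, advancing `step` words each time."""
    if window_size <= 0 or step <= 0:
        raise ValueError("window_size and step must be positive")
    words = text.split()
    if not words:
        return []
    chunks = []
    start = 0
    while True:
        # Each window overlaps the previous one by (window_size - step) words.
        chunks.append(" ".join(words[start:start + window_size]))
        if start + window_size >= len(words):
            break  # this window already reaches the end of the text
        start += step
    return chunks
```

With `window_size=4, step=2`, ten words produce four windows, and each window shares two words with its neighbor.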
-
**Describe the bug**
Sometimes, when chunking is used, the `text_as_html` of Table elements is missing some of the content that appears in the `text` property.
Reasoning:
- Text for a table can only come fro…
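
You can detect the mismatch described above by comparing the words in an element's `text` with the visible text inside its `text_as_html`. The following is a minimal stdlib sketch. It assumes both values are already available as plain strings. The helper names `html_to_words` and `missing_from_html` are hypothetical and are not part of the library's API:

```python
from html.parser import HTMLParser


class _TextCollector(HTMLParser):
    """Collects the character data that appears between HTML tags."""

    def __init__(self) -> None:
        super().__init__()
        self.parts: list[str] = []

    def handle_data(self, data: str) -> None:
        self.parts.append(data)


def html_to_words(html: str) -> list[str]:
    parser = _TextCollector()
    parser.feed(html)
    parser.close()
    # Join cell contents with spaces so adjacent cells do not fuse into one word.
    return " ".join(parser.parts).split()


def missing_from_html(text: str, text_as_html: str) -> list[str]:
    """Return the words of `text` that never appear in the visible text of `text_as_html`."""
    html_words = set(html_to_words(text_as_html))
    return [word for word in text.split() if word not in html_words]
```

A non-empty result for a table element indicates that the HTML rendering dropped content.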
-
### What is the problem?
An image in the memory management section has a transparent background and black text, so it can't be seen in dark mode: https://carpentries.github.io/ins…
-
This is important but difficult to explain; ask me if I am not clear.
Two 'granularities' are relevant when chunking the ENB reports:
1. The first and coarser level is that of 'sections' corresponding…
tommv updated
8 years ago
-
## Describe the bug
It looks like the `target_chunk_size` setting and the actual chunk size can differ. I have the following dataset:
```yaml
- from: github:github.com/apache/datafusion/files/main
# …
```
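
To quantify a gap between `target_chunk_size` and real chunk sizes, you can gather simple statistics over the produced chunks. The sketch below makes one assumption: size is counted in whitespace-separated words. The tool may count tokens with a model tokenizer instead, and that difference alone can make measured and configured sizes disagree. `chunk_size_report` is a hypothetical helper:

```python
from statistics import mean


def chunk_size_report(chunks: list[str], target_chunk_size: int) -> dict:
    """Compare actual chunk sizes (in whitespace-separated words) against a target size."""
    sizes = [len(chunk.split()) for chunk in chunks]
    return {
        "count": len(sizes),
        "min": min(sizes, default=0),
        "max": max(sizes, default=0),
        "mean": mean(sizes) if sizes else 0.0,
        # Number of chunks that exceed the configured target.
        "over_target": sum(1 for size in sizes if size > target_chunk_size),
    }
```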
-
**Is your feature request related to a problem? Please describe.**
Currently, the `DocumentSplitter` in Haystack is relatively basic, and recently we have seen that semantic splitting has greatly gaine…
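
Semantic splitting usually means embedding consecutive sentences and starting a new chunk wherever the similarity between neighbors drops. The sketch below illustrates that idea. It is not Haystack's API. A bag-of-words count vector stands in for a real embedding model. The `threshold` value and all function names are hypothetical:

```python
import math
import re
from collections import Counter


def _embed(sentence: str) -> Counter:
    # Toy stand-in for an embedding model: a bag-of-words count vector.
    return Counter(re.findall(r"\w+", sentence.lower()))


def _cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def semantic_split(text: str, threshold: float = 0.2) -> list[str]:
    """Start a new chunk wherever adjacent sentences are less similar than `threshold`."""
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    if not sentences:
        return []
    chunks = [[sentences[0]]]
    for prev, cur in zip(sentences, sentences[1:]):
        if _cosine(_embed(prev), _embed(cur)) < threshold:
            chunks.append([cur])  # topic shift: open a new chunk
        else:
            chunks[-1].append(cur)  # similar enough: keep in the current chunk
    return [" ".join(chunk) for chunk in chunks]
```

A real implementation would swap `_embed` for a sentence-embedding model. It might also compare each sentence against a window of neighbors rather than a single sentence.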
-
I noticed that sometimes, when `combineChunks` is set to `true`, some chunks contain just a few characters/tokens.
Would it make sense to add a `minTokenSize` option, so that anything under that size could be add…
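
The suggestion above is cut off, so where undersized chunks should go is an assumption here. This sketch appends them to the preceding chunk. `merge_small_chunks` and `min_token_size` (mirroring the proposed `minTokenSize`) are hypothetical, and tokens are approximated by whitespace-separated words:

```python
def merge_small_chunks(chunks: list[str], min_token_size: int) -> list[str]:
    """Append any chunk shorter than `min_token_size` words to the chunk before it."""
    merged: list[str] = []
    for chunk in chunks:
        if merged and len(chunk.split()) < min_token_size:
            # Too small to stand alone: glue it onto the previous chunk.
            merged[-1] = merged[-1] + " " + chunk
        else:
            merged.append(chunk)
    return merged
```

An undersized *first* chunk has no predecessor, so it stays on its own. Merging it forward into the next chunk would be the natural extension.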
-
### Describe your problem
Is it possible to apply the different chunking methods to already-parsed files in one of my local directories? Just as I can use document parsing on its own, I would l…
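
Outside any particular tool, applying a chunker to already-parsed files amounts to walking the directory and running the chunker on each file. The sketch below assumes the parsed output is saved as UTF-8 `.txt` files. It uses a simple blank-line paragraph splitter. `chunk_directory` and `split_paragraphs` are hypothetical and do not call the product's own chunking methods:

```python
import re
from pathlib import Path
from typing import Callable


def split_paragraphs(text: str) -> list[str]:
    """Split on blank lines and drop empty pieces."""
    return [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]


def chunk_directory(
    directory: str,
    pattern: str = "*.txt",
    chunker: Callable[[str], list[str]] = split_paragraphs,
) -> dict[str, list[str]]:
    """Apply `chunker` to every file matching `pattern` anywhere under `directory`."""
    results: dict[str, list[str]] = {}
    for path in sorted(Path(directory).rglob(pattern)):
        if path.is_file():
            results[str(path)] = chunker(path.read_text(encoding="utf-8"))
    return results
```

Any function that takes a string and returns a list of strings can be passed as `chunker`, so different chunking methods can be swapped in.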
-
Now that a vision model can be specified in the settings and Archyve can ingest a JPG document as a single chunk, I think I need some guidance on what to do with it next.
@oxaroky02 said I s…
-
### What do you need?
It would be great if fabric could automatically handle splitting/chunking for text that is too large for a given model.
From what I understand, this would need:
- Informatio…
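
A minimal sketch of "split only when the text is too large" might look like the code below. It does not describe how fabric works. `split_for_model` and `max_tokens` are hypothetical, and token counts are approximated by whitespace-separated words rather than a model tokenizer. Text that fits is returned unchanged. Otherwise, paragraphs are packed greedily into chunks, and any paragraph longer than the budget is cut into word-level pieces first:

```python
def split_for_model(text: str, max_tokens: int) -> list[str]:
    """Return [text] if it fits in max_tokens; otherwise pack paragraphs into chunks that fit."""
    if max_tokens <= 0:
        raise ValueError("max_tokens must be positive")

    def count(s: str) -> int:
        # Rough token estimate (assumption): whitespace-separated words.
        return len(s.split())

    if count(text) <= max_tokens:
        return [text]

    chunks: list[str] = []
    current: list[str] = []
    current_len = 0
    for para in (p for p in text.split("\n\n") if p.strip()):
        words = para.split()
        # Break an oversized paragraph into pieces of at most max_tokens words.
        pieces = [" ".join(words[i:i + max_tokens]) for i in range(0, len(words), max_tokens)]
        for piece in pieces:
            n = count(piece)
            if current and current_len + n > max_tokens:
                chunks.append("\n\n".join(current))  # flush the full chunk
                current, current_len = [], 0
            current.append(piece)
            current_len += n
    if current:
        chunks.append("\n\n".join(current))
    return chunks
```

Packing whole paragraphs keeps related sentences together. The word-level fallback guarantees that no chunk exceeds the budget.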