-
Could you please document your process to extract the Lua binary chunks? Once you identify the header, how do you know how big they are?
Thank you!
-
Hi folks! Love Verba. Does the project support, or plan to support, pluggable retrievers? We are building an open-source, reliable extraction and embedding engine: https://getindexify.ai We are pan on s…
-
**Why it’s Important:**
The goal is to define a comprehensive list of topics that can be used across all resources in our infrastructure for consistent topic extraction. By creating a centralized, …
-
Text extraction from the PDFs is not always 100% accurate because the gazette documents always have two columns of text, and when the columns are too close to each other, sentences or words can be mixed up with t…
-
- Dividing a document into sections
- Example: beginning 20% + content 60% + ending 20%
- Count by sentence/word
- Assigning feature values
- Accumulated pattern score
- above threshold --> …
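
A minimal sketch of the first step only (dividing a document into sections, counted by sentence), assuming the 20% / 60% / 20% split from the example above. The names `split_sentences` and `divide_into_sections` are hypothetical, the sentence splitter is deliberately naive, and the feature values, pattern score, and threshold are not covered because they are not specified here.

```python
import re


def split_sentences(text):
    """Naive sentence splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def divide_into_sections(units, head=0.2, tail=0.2):
    """Split a list of units (sentences or words) into beginning/content/ending.

    `head` and `tail` are fractions of the total count; the default values
    mirror the 20% + 60% + 20% example and are an assumption, not a fixed rule.
    """
    n = len(units)
    head_end = round(n * head)
    tail_start = n - round(n * tail)
    return {
        "beginning": units[:head_end],
        "content": units[head_end:tail_start],
        "ending": units[tail_start:],
    }


if __name__ == "__main__":
    text = " ".join(f"Sentence number {i}." for i in range(10))
    sections = divide_into_sections(split_sentences(text))
    for name, part in sections.items():
        print(name, len(part))
```

To count by word instead of by sentence, pass `text.split()` as `units`.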
-
### Reference Issues
_No response_
### Summary
docling supports automatic parsing of PDFs with tables. I've found it very beneficial.
https://github.com/DS4SD/docling/issues
### Basic Example
…
-
Can page extraction from large documents be sped up? Currently it takes about 10 seconds to extract a page from a document several thousand pages long, for me at least.
Here's what I do:
```p…
-
**What Article or Section of the Constitution should this apply to?**
Appendix I, Cardano Blockchain Guardrails
**Describe the reason for your proposed extraction**
We, as a community, should ins…
-
## Claim Extraction in Solar Energy News Articles
### Team
- 202318013 - Vrishmi Parikh
- 202318030 - Mahmood Topiwala
- 202318039 - Anurag Shukla
- 202318056 - Tanaz Pathan
### Categor…
-
I couldn't remember how to do multi-channel extraction and realized that none of our guides touch on it.