document-content-extraction Search Results

1000+ results for document-content-extraction


stanfordnlp/dspy #1075

Error with Custom Local Model

Hello, I am facing an issue with dspy using a Custom LM. The LM is Mistral Instruct v0 2 7B deployed using a local inference server on LM Studio. According to LM Studio this is how the model is cal…

xkpacx updated 1 month ago
5
ARK-Builders/ARK-Navigator #178

Text search

It should be possible to filter documents by presence of query word in their text content. For this, it's necessary to implement: 1. Text-layer extraction from PDF and other text-based documents. …

kirillt updated 2 years ago
2
langchain-ai/langchain #22256

MarkdownHeaderTextSplitter flattens Paragraphs separators in…

### Checked other resources - [X] I added a very descriptive title to this issue. - [X] I searched the LangChain documentation with the integrated search. - [X] I used the GitHub search to find a sim…

relston updated 1 month ago
1
gtech-mulearn/Top100-OpenAi-Challenge #9

AskPDF - Ask anything from your PDF

## Project Goal ChatPDF is an AI-powered tool designed to revolutionize document analysis by combining contextual understanding, multilanguage support, data visualization, and smart recommendation fe…

Ajishabraham1993 updated 8 months ago
1
emcf/thepipe #24

Full-page screenshot when extracting page URL

I'm running thepipe locally to extract some page URLs for processing with GPT4o, and it seems that the image generated for each page only captures the content above the fold (See example below). Is th…

michael-supreme updated 3 weeks ago
4
OpenTermsArchive/sandbox-declarations #304

`Indiatimes` ‧ `Community Guidelines` ‧ not tracked anymore

### No version of the `Community Guidelines` of service `Indiatimes` is recorded anymore since 2 May 2024 at 9:08:00 UTC The source documents have been recorded in snapshots, but no version can be …

OTA-Bot updated 2 months ago
4
langchain-ai/langchain #24225

[Google Generative AI] Structured Output doesn't work with a…

### Checked other resources - [X] I added a very descriptive title to this issue. - [X] I searched the LangChain documentation with the integrated search. - [X] I used the GitHub search to find a…

ToyHugs updated 11 hours ago
1
kermitt2/grobid #313

Word document instead of PDF

Hi, At present, I have all documents as DOCX (Microsoft Word files) which I convert to PDF in order to run the GROBID XML conversion. Is there any possibility of using DOCX as input? In case of …

sarankup updated 4 years ago
5
llm-tools/embedJs #54

add HTML page Loader

not the same as add WebLoader, but LangChain has this hook to pass in a HTML page content, and some other settings. Under certain conditions this could be very useful to customize the HTML content or…

converseKarl updated 1 month ago
4
adbar/trafilatura #173

Extract inline structured data from page <body>

It seems that trafilatura attempts to parse [schema.org](https://schema.org)-compliant structured data [such as microdata](https://github.com/adbar/trafilatura/blob/6a7b58174add10f686e230d5213203ff85b…

Seirdy updated 2 years ago
4
