Open: istranic opened this issue 2 weeks ago
Hey there, @istranic! I'm a bot here to help you with bug fixes, answer questions, and guide you on becoming a contributor. Let's tackle this issue together while we wait for a human maintainer.
The tool has configuration options for processing scientific and technical documents that may help with scientific spec sheets. These include:
- Parsing configuration:
  - `use_doc_details`: set to `true` to use document details during parsing.
  - `chunk_size`: set to `7000` to define the size of each document chunk.
  - `overlap`: set to `250` to specify the overlap between consecutive chunks.
- Answer configuration:
  - `evidence_k`: set to `8` to determine how many pieces of evidence are considered.
  - `answer_max_sources`: set to `3` to limit the maximum number of sources in an answer.
  - `max_concurrent_requests`: set to `8` to control the number of concurrent requests.
- Prompts configuration:
  - `use_json`: set to `true` to enable JSON-formatted prompts.
- LLM configuration: rate limits are set for `gpt-4o`, `gpt-4-turbo`, and `gpt-3.5-turbo`, ranging from 40,000 to 2,000,000 per minute.
- Embedding configuration:
  - `rate_limit`: set to 1,000,000 per minute.

These settings control chunking, evidence selection, concurrency, and rate limits, which should help when processing scientific spec sheets. However, the repository contains no specific information about limitations or issues with highly technical or domain-specific language [1][2].
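To make the grouping above concrete, here is a minimal Python sketch that mirrors those values in plain dataclasses. The class names (`ParsingConfig`, `AnswerConfig`, `PromptsConfig`, `Config`) and the overlap check are hypothetical illustrations, not the tool's actual settings API. The LLM and embedding rate limits are left out because the reply does not say which model gets which limit.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class ParsingConfig:
    use_doc_details: bool = True  # use document details during parsing
    chunk_size: int = 7000        # size of each document chunk
    overlap: int = 250            # overlap between consecutive chunks

    def __post_init__(self) -> None:
        # Hypothetical sanity check: an overlap as large as the chunk
        # would mean a sliding window never advances.
        if self.chunk_size <= 0 or not 0 <= self.overlap < self.chunk_size:
            raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")


@dataclass
class AnswerConfig:
    evidence_k: int = 8               # evidence pieces to consider
    answer_max_sources: int = 3       # maximum sources cited in one answer
    max_concurrent_requests: int = 8  # cap on concurrent requests


@dataclass
class PromptsConfig:
    use_json: bool = True  # JSON-formatted prompts


@dataclass
class Config:
    # Hypothetical top-level container; the real tool's settings object may differ.
    parsing: ParsingConfig = field(default_factory=ParsingConfig)
    answer: AnswerConfig = field(default_factory=AnswerConfig)
    prompts: PromptsConfig = field(default_factory=PromptsConfig)


config = Config()
print(json.dumps(asdict(config), indent=2))
```

Keeping the groups separate mirrors the reply's parsing/answer/prompts split, so a spec-sheet experiment could vary only the parsing values (for example, a smaller `chunk_size`) while leaving the answer settings alone.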
Has anyone tried this tool with scientific spec sheets, such as those provided for manufacturing tools, probes, and measurement devices?
Is there anything about how the models are prompted or how the data is processed that would make it work poorly on them?
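On the data-processing side of the question, one mechanical factor to check for spec sheets is how the `chunk_size` and `overlap` settings from the reply split a document, since dense parameter tables can be cut mid-row. The Python sketch below assumes a plain sliding window measured in characters. That is an assumption: the tool's real splitter may count size differently or respect page boundaries. The `chunk_text` function and the synthetic spec-sheet rows are hypothetical.

```python
def chunk_text(text: str, chunk_size: int = 7000, overlap: int = 250) -> list[str]:
    """Split text into windows of chunk_size characters.

    Consecutive windows share `overlap` characters, so each window
    starts chunk_size - overlap characters after the previous one.
    """
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
    return chunks


# Hypothetical spec-sheet table: 1,000 rows of exactly 24 characters each.
text = "".join(f"row {i:04d} | value {i * 3:06d}\n" for i in range(1000))
chunks = chunk_text(text)

print(len(chunks))            # 4
print(repr(chunks[0][-10:]))  # '91 | value'  (row 0291 is cut before its value)
print("row 0291 | value 000873" in chunks[1])  # True
```

Here row 0291 is cut off at the end of the first chunk but appears whole in the second, because it starts inside the 250-character overlap. A row or table cell longer than the overlap can end up with no chunk containing it whole, which is worth checking for long parameter tables.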