Performance with Scientific Spec Sheets

Hey there, @istranic! I'm a bot here to help you with bug fixes, answer questions, and guide you on becoming a contributor. Let's tackle this issue together while we wait for a human maintainer.

The tool (paper-qa) has configuration options for scientific and technical documents, which should help with scientific spec sheets. These include:

Parsing Configuration:
- use_doc_details: Set to true to utilize document details during parsing.
- chunk_size: Set to 7000 to define the size of each chunk of the document.
- overlap: Set to 250 to specify the overlap between chunks.

Answer Configuration:
- evidence_k: Set to 8 to determine the number of evidence pieces to consider.
- answer_max_sources: Set to 3 to limit the maximum number of sources for an answer.
- max_concurrent_requests: Set to 8 to control the number of concurrent requests.

Prompts Configuration:
- use_json: Set to true to enable JSON format for prompts.

LLM Configuration:
- Various rate limits are specified for different models, such as gpt-4o, gpt-4-turbo, and gpt-3.5-turbo, with limits ranging from 40,000 to 2,000,000 per minute.

Embedding Configuration:
- rate_limit: Set to 1,000,000 per minute.
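
To make the interaction between chunk_size and overlap concrete, here is a minimal sketch of a sliding-window splitter. It is not paper-qa's own chunking code: the function name `chunk_text` is hypothetical, and it assumes both values are counted in characters, which the settings above do not state.

```python
def chunk_text(text: str, chunk_size: int = 7000, overlap: int = 250) -> list[str]:
    """Split text into windows of at most chunk_size characters.

    Each window starts (chunk_size - overlap) characters after the previous
    one, so consecutive windows share `overlap` characters.
    Assumption: sizes are measured in characters.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")

    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once the current window already reaches the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks


if __name__ == "__main__":
    # Stand-in for the extracted text of a spec sheet.
    sample = "".join(chr(97 + i % 26) for i in range(20000))
    pieces = chunk_text(sample)
    print(len(pieces), [len(p) for p in pieces])
```

With these values each new chunk begins 6,750 characters after the previous one, so a table or parameter block that straddles a chunk boundary has a chance of appearing whole in at least one chunk.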

These settings manage resource use and throughput, which should help when processing scientific spec sheets. However, the repository contains no specific information about limitations or known issues with highly technical or domain-specific language ^[1]^[2].
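
As a rough illustration, the parsing, answer, and prompt values listed above can be collected into one structure and checked before use. This is only a sketch: the grouping keys (`parsing`, `answer`, `prompts`) and the helper `check_settings` are assumptions made for this example, not a confirmed paper-qa schema, and the per-model rate limits are left out because their exact values are not given here.

```python
import json

# Values copied from the list above; the nesting keys are an assumption.
spec_sheet_settings = {
    "parsing": {"use_doc_details": True, "chunk_size": 7000, "overlap": 250},
    "answer": {
        "evidence_k": 8,
        "answer_max_sources": 3,
        "max_concurrent_requests": 8,
    },
    "prompts": {"use_json": True},
}


def check_settings(settings: dict) -> list[str]:
    """Return a list of human-readable problems (empty if none were found).

    These are generic sanity checks chosen for this sketch, not rules
    enforced by the tool itself.
    """
    problems = []
    parsing = settings["parsing"]
    answer = settings["answer"]
    if parsing["overlap"] >= parsing["chunk_size"]:
        problems.append("overlap should be smaller than chunk_size")
    if answer["answer_max_sources"] > answer["evidence_k"]:
        problems.append("answer_max_sources should not exceed evidence_k")
    if answer["max_concurrent_requests"] < 1:
        problems.append("max_concurrent_requests should be at least 1")
    return problems


if __name__ == "__main__":
    print(check_settings(spec_sheet_settings))
    print(json.dumps(spec_sheet_settings, indent=2))
```

Serializing to JSON keeps the values easy to diff against whatever configuration file the project actually reads.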

Future-House / paper-qa

Performance with Scientific Spec Sheets #656