Open zarlicho opened 1 month ago
extract_from_chunk expects a single chunk object, not a list. Also, be sure to pass a dictionary in as the schema, not a string. To avoid this type of issue you can actually specify the chunking method inside the extraction function:
from thepipe.chunker import chunk_by_page
results = extract_from_file(
"example.pdf",
schema={"section_title": "string", "content": "string"},
chunking_method=chunk_by_page
)
I want to extract a chunk from json with the extract_from_chunk() function but I get an error like this
({'chunk_index': 4, 'source': 'pdf', 'error': "'list' object has no attribute 'to_message'"}, 0)
print(thepipe.extract.extract_from_chunk(chunk=chunk,chunk_index=4,schema="bill_name",ai_model='openai/gpt-40',source='pdf',multiple_extractions=True,extraction_prompt=prompting,host_images=True))