Closed hitsense closed 1 year ago
Hello @hitsense :wave:. Thanks for taking the library for a spin!
`kor` doesn't have a variable called `line_terminator` anywhere in its codebase, so it's either a bug in `langchain` or a typo in the code that you're using. Try bumping `langchain`; if that doesn't help, look for any place in the code you're using that sets `line_terminator`, as that would be the source of the error. Good luck!
Hi @eyurtsev, I get this error for your article as well. I am not using anything else that uses line_terminator. Can you try running your document extraction article and check if you get the same error? I am using the latest version of langchain, and if you are using a specific version of langchain, then let me know.
I can confirm that this runs fine for me using the newest langchain.
Could you include the stack trace for the exceptions?
You could try running the code with `return_exceptions=False` so the exception is raised instead of returned.

    document_extraction_results = await extract_from_documents(
        chain, split_docs, max_concurrency=5, use_uid=False, return_exceptions=False
    )
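
For context, the stack trace further down shows `extract_from_documents` forwarding `return_exceptions` to `asyncio.gather`. Here is a minimal stdlib-only sketch of what that flag changes; the coroutines `ok`, `boom`, and `collect` are hypothetical stand-ins and not part of `kor`:

```python
import asyncio


async def ok() -> str:
    # Stand-in for a document that is processed without problems.
    return "ok"


async def boom() -> str:
    # Stand-in for a document whose processing fails.
    raise TypeError("unexpected keyword argument 'line_terminator'")


async def collect(return_exceptions: bool):
    # Same flag that the stack trace shows being passed to asyncio.gather.
    return await asyncio.gather(ok(), boom(), return_exceptions=return_exceptions)


# With return_exceptions=True the exception object is placed in the result list.
returned = asyncio.run(collect(True))

# With return_exceptions=False the first exception propagates to the caller,
# which is what produces a full traceback.
try:
    asyncio.run(collect(False))
    raised = None
except TypeError as exc:
    raised = exc
```

With the flag set to `True`, failures can go unnoticed unless each result is checked with `isinstance(result, Exception)`.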
Something is wrong with the encoder and pandas. Here is the traceback:
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
[<ipython-input-75-05f5682b3d08>](https://localhost:8080/#) in <cell line: 1>()
1 with get_openai_callback() as cb:
----> 2 document_extraction_results = await extract_from_documents(
3 chain, txtt, max_concurrency=5, use_uid=False, return_exceptions=False
4 )
5
17 frames
[/usr/local/lib/python3.9/dist-packages/kor/extraction/api.py](https://localhost:8080/#) in extract_from_documents(chain, documents, max_concurrency, use_uid, extraction_uid_function, return_exceptions)
170 )
171
--> 172 results = await asyncio.gather(*tasks, return_exceptions=return_exceptions)
173 return results
[/usr/local/lib/python3.9/dist-packages/kor/extraction/api.py](https://localhost:8080/#) in _extract_from_document_with_semaphore(semaphore, chain, document, uid, source_uid)
26 async with semaphore:
27 extraction_result: Extraction = cast(
---> 28 Extraction, await chain.apredict_and_parse(text=document.page_content)
29 )
30 return {
[/usr/local/lib/python3.9/dist-packages/langchain/chains/llm.py](https://localhost:8080/#) in apredict_and_parse(self, **kwargs)
179 ) -> Union[str, List[str], Dict[str, str]]:
180 """Call apredict and then parse the results."""
--> 181 result = await self.apredict(**kwargs)
182 if self.prompt.output_parser is not None:
183 return self.prompt.output_parser.parse(result)
[/usr/local/lib/python3.9/dist-packages/langchain/chains/llm.py](https://localhost:8080/#) in apredict(self, **kwargs)
165 completion = llm.predict(adjective="funny")
166 """
--> 167 return (await self.acall(kwargs))[self.output_key]
168
169 def predict_and_parse(self, **kwargs: Any) -> Union[str, List[str], Dict[str, str]]:
[/usr/local/lib/python3.9/dist-packages/langchain/chains/base.py](https://localhost:8080/#) in acall(self, inputs, return_only_outputs)
152 else:
153 self.callback_manager.on_chain_error(e, verbose=self.verbose)
--> 154 raise e
155 if self.callback_manager.is_async:
156 await self.callback_manager.on_chain_end(outputs, verbose=self.verbose)
[/usr/local/lib/python3.9/dist-packages/langchain/chains/base.py](https://localhost:8080/#) in acall(self, inputs, return_only_outputs)
146 )
147 try:
--> 148 outputs = await self._acall(inputs)
149 except (KeyboardInterrupt, Exception) as e:
150 if self.callback_manager.is_async:
[/usr/local/lib/python3.9/dist-packages/langchain/chains/llm.py](https://localhost:8080/#) in _acall(self, inputs)
133
134 async def _acall(self, inputs: Dict[str, Any]) -> Dict[str, str]:
--> 135 return (await self.aapply([inputs]))[0]
136
137 def predict(self, **kwargs: Any) -> str:
[/usr/local/lib/python3.9/dist-packages/langchain/chains/llm.py](https://localhost:8080/#) in aapply(self, input_list)
121 async def aapply(self, input_list: List[Dict[str, Any]]) -> List[Dict[str, str]]:
122 """Utilize the LLM generate method for speed gains."""
--> 123 response = await self.agenerate(input_list)
124 return self.create_outputs(response)
125
[/usr/local/lib/python3.9/dist-packages/langchain/chains/llm.py](https://localhost:8080/#) in agenerate(self, input_list)
64 async def agenerate(self, input_list: List[Dict[str, Any]]) -> LLMResult:
65 """Generate LLM result from inputs."""
---> 66 prompts, stop = await self.aprep_prompts(input_list)
67 return await self.llm.agenerate_prompt(prompts, stop)
68
[/usr/local/lib/python3.9/dist-packages/langchain/chains/llm.py](https://localhost:8080/#) in aprep_prompts(self, input_list)
98 for inputs in input_list:
99 selected_inputs = {k: inputs[k] for k in self.prompt.input_variables}
--> 100 prompt = self.prompt.format_prompt(**selected_inputs)
101 _colored_text = get_colored_text(prompt.to_string(), "green")
102 _text = "Prompt after formatting:\n" + _colored_text
[/usr/local/lib/python3.9/dist-packages/kor/prompts.py](https://localhost:8080/#) in format_prompt(self, text)
80 text = format_text(text, input_formatter=self.input_formatter)
81 return ExtractionPromptValue(
---> 82 string=self.to_string(text), messages=self.to_messages(text)
83 )
84
[/usr/local/lib/python3.9/dist-packages/kor/prompts.py](https://localhost:8080/#) in to_string(self, text)
95 """Format the template to a string."""
96 instruction_segment = self.format_instruction_segment(self.node)
---> 97 encoded_examples = self.generate_encoded_examples(self.node)
98 formatted_examples: List[str] = []
99
[/usr/local/lib/python3.9/dist-packages/kor/prompts.py](https://localhost:8080/#) in generate_encoded_examples(self, node)
131 """Generate encoded examples."""
132 examples = generate_examples(node)
--> 133 return encode_examples(
134 examples, self.encoder, input_formatter=self.input_formatter
135 )
[/usr/local/lib/python3.9/dist-packages/kor/encoders/encode.py](https://localhost:8080/#) in encode_examples(examples, encoder, input_formatter)
57 """Encode the output using the given encoder."""
58
---> 59 return [
60 (
61 format_text(input_example, input_formatter=input_formatter),
[/usr/local/lib/python3.9/dist-packages/kor/encoders/encode.py](https://localhost:8080/#) in <listcomp>(.0)
60 (
61 format_text(input_example, input_formatter=input_formatter),
---> 62 encoder.encode(output_example),
63 )
64 for input_example, output_example in examples
[/usr/local/lib/python3.9/dist-packages/kor/encoders/csv_data.py](https://localhost:8080/#) in encode(self, data)
75 # Should always output records for pd.Dataframe
76 data_to_output = [data_to_output]
---> 77 table_content = pd.DataFrame(data_to_output, columns=field_names).to_csv(
78 index=False, sep=DELIMITER
79 )
[/usr/local/lib/python3.9/dist-packages/pandas/core/generic.py](https://localhost:8080/#) in to_csv(self, path_or_buf, sep, na_rep, float_format, columns, header, index, index_label, mode, encoding, compression, quoting, quotechar, line_terminator, chunksize, date_format, doublequote, escapechar, decimal, errors, storage_options)
3549 header: bool_t | list[str] = True,
3550 index: bool_t = True,
-> 3551 index_label: IndexLabel | None = None,
3552 mode: str = "w",
3553 encoding: str | None = None,
[/usr/local/lib/python3.9/dist-packages/pandas/io/formats/format.py](https://localhost:8080/#) in to_csv(self, path_or_buf, encoding, sep, columns, index_label, mode, compression, quoting, quotechar, line_terminator, chunksize, date_format, doublequote, escapechar, errors, storage_options)
1159 """
1160 Render dataframe as comma-separated file.
-> 1161 """
1162 from pandas.io.formats.csvs import CSVFormatter
1163
TypeError: __init__() got an unexpected keyword argument 'line_terminator'
Definitely pandas-related. I successfully ran the code with pandas 1.5.3 and with pandas 2.0.0. Which version are you using?
This also looks like something that's internal to pandas, since `kor` isn't specifying a `line_terminator` named argument and is instead using `sep`.
One thing I noticed is that pandas 2.0.0 uses `lineterminator` rather than `line_terminator`: https://pandas.pydata.org/docs/reference/api/pandas.read_csv.html
But I'm still not sure why that error would even surface, since `kor` isn't specifying a line terminator.
In the stack trace that you provided it does:

    ---> 77 table_content = pd.DataFrame(data_to_output, columns=field_names).to_csv(
         78     index=False, sep=DELIMITER
         79 )
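
One way to see which spelling of the keyword a given function accepts is to inspect its signature. The sketch below uses only the standard library; `terminator_keyword`, `old_style_to_csv`, and `new_style_to_csv` are hypothetical names made up for illustration, and the two stand-in functions only mimic the two spellings mentioned in this thread:

```python
import inspect
from typing import Callable, Optional


def terminator_keyword(func: Callable) -> Optional[str]:
    """Return the line-terminator keyword that `func` accepts, if any."""
    params = inspect.signature(func).parameters
    for name in ("lineterminator", "line_terminator"):
        if name in params:
            return name
    return None


# Hypothetical stand-ins that mimic the two spellings discussed above.
def old_style_to_csv(sep=",", line_terminator=None):
    return None


def new_style_to_csv(sep=",", lineterminator=None):
    return None


print(terminator_keyword(old_style_to_csv))
print(terminator_keyword(new_style_to_csv))
```

Assuming pandas is installed, the same helper could be pointed at `pd.DataFrame.to_csv` to see which spelling that installation exposes.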
This was resolved after updating pandas from 1.4.4 to 1.5.3. Thanks @eyurtsev
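
Since the fix came down to the installed pandas version, it may help to include package versions when reporting a problem like this. A minimal sketch using the standard library's `importlib.metadata`; the helper name `installed_versions` is made up for illustration, and the package list is an assumption based on the libraries named in this thread:

```python
from importlib import metadata
from typing import Dict, Iterable, Optional


def installed_versions(packages: Iterable[str]) -> Dict[str, Optional[str]]:
    """Map each package name to its installed version, or None if missing."""
    versions: Dict[str, Optional[str]] = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


report = installed_versions(["pandas", "langchain", "kor"])
for name, version in report.items():
    print(f"{name}: {version or 'not installed'}")
```

In a notebook, the runtime usually has to be restarted after upgrading a package before the new version is the one actually imported.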
I am unable to run the last step in the document extraction article. The function `extract_from_documents` returns the error below. Looks like something changed at the `langchain` end.