Closed by mxgordon 2 months ago.
When I tried to use my own dataset (from the Cornell Movie-Dialogs Corpus), the script threw:
Traceback (most recent call last):
  File "prepare_data.py", line 563, in <module>
    prepare()
  File "prepare_data.py", line 79, in prepare
    number_of_records = min(amount, sum(1 for _ in open_function(source_file_name, 'rt', encoding='utf-8', **additioan_params)))
  File "prepare_data.py", line 79, in <genexpr>
    number_of_records = min(amount, sum(1 for _ in open_function(source_file_name, 'rt', encoding='utf-8', **additioan_params)))
  File "/usr/lib/python3.6/codecs.py", line 321, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xad in position 6816: invalid start byte
So I changed the file reading to ignore decoding errors like this one. This makes the script easier to use with datasets that contain bytes that are not valid UTF-8.
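A minimal sketch of the idea, assuming the file is opened with Python's built-in `open()`. `count_records` is a hypothetical helper written for illustration, not a function from `prepare_data.py`, and the actual change in this PR may differ. Passing `errors="ignore"` makes the UTF-8 decoder silently drop undecodable bytes (such as the `0xad` in the traceback) instead of raising `UnicodeDecodeError`.

```python
import os
import tempfile


def count_records(path, amount, errors="ignore"):
    """Count lines in a UTF-8 text file, capped at ``amount``.

    Hypothetical helper. With errors="ignore", bytes that are not valid
    UTF-8 are dropped during decoding instead of raising UnicodeDecodeError.
    """
    with open(path, "rt", encoding="utf-8", errors=errors) as f:
        return min(amount, sum(1 for _ in f))


if __name__ == "__main__":
    # Write a small file containing 0xAD, which is not a valid UTF-8 start byte.
    with tempfile.NamedTemporaryFile("wb", delete=False, suffix=".txt") as tmp:
        tmp.write(b"hello\nsoft\xadhyphen\nbye\n")
        path = tmp.name
    try:
        print(count_records(path, amount=10))  # prints 3
        try:
            count_records(path, amount=10, errors="strict")
        except UnicodeDecodeError as exc:
            # The default "strict" handler reproduces the original failure.
            print("strict decoding failed:", exc.reason)
    finally:
        os.remove(path)
```

One design note: `errors="replace"` is an alternative that substitutes U+FFFD for each bad byte instead of dropping it, which leaves a visible trace of the corrupted data in the output.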