RasaHQ / rasa

💬 Open source machine learning framework to automate text- and voice-based conversations: NLU, dialogue management, connect to Slack, Facebook, and more - Create chatbots and voice assistants
https://rasa.com/docs/rasa/
Apache License 2.0

The IO default encoding error #2539

Closed XiaofeiQian closed 5 years ago

XiaofeiQian commented 5 years ago
**Rasa Core version**: 0.11.6

**Python version**: 3.6.5

**Operating system** (windows, osx, ...): Windows 10 64-bit 1803

**Issue**: I updated rasa_core following issue #985, and now I get an error when training core:

```
train_core('domain.yml', 'models/core', 'data/core/')
```

```
---------------------------------------------------------------------------
UnicodeDecodeError                        Traceback (most recent call last)
 in ()
----> 1 train_core('domain.yml', 'models/core', 'data/core/')

 in train_core(domain_file, model_path, training_folder)
      7                  FallbackPolicy(nlu_threshold=0.6, core_threshold=0.3)])
      8
----> 9     training_data = agent.load_data(training_folder)
     10
     11     agent.train(training_data, epochs=100)

D:\Programs\Anaconda3\envs\nlp\lib\site-packages\rasa_core\agent.py in load_data(self, resource_name, remove_duplicates, unique_last_num_states, augmentation_factor, tracker_limit, use_story_concatenation, debug_plots)
    470                                   augmentation_factor,
    471                                   tracker_limit, use_story_concatenation,
--> 472                                   debug_plots)
    473
    474     def train(self,

D:\Programs\Anaconda3\envs\nlp\lib\site-packages\rasa_core\training\__init__.py in load_data(resource_name, domain, remove_duplicates, unique_last_num_states, augmentation_factor, tracker_limit, use_story_concatenation, debug_plots)
     46
     47     if resource_name:
---> 48         graph = extract_story_graph(resource_name, domain)
     49
     50     g = TrainingDataGenerator(graph, domain,

D:\Programs\Anaconda3\envs\nlp\lib\site-packages\rasa_core\training\__init__.py in extract_story_graph(resource_name, domain, interpreter)
     27         interpreter = RegexInterpreter()
     28     story_steps = StoryFileReader.read_from_folder(resource_name,
---> 29                                                    domain, interpreter)
     30     return StoryGraph(story_steps)
     31

D:\Programs\Anaconda3\envs\nlp\lib\site-packages\rasa_core\training\dsl.py in read_from_folder(resource_name, domain, interpreter, template_variables)
    139         for f in nlu_utils.list_files(resource_name):
    140             steps = StoryFileReader.read_from_file(f, domain, interpreter,
--> 141                                                    template_variables)
    142             story_steps.extend(steps)
    143         return story_steps

D:\Programs\Anaconda3\envs\nlp\lib\site-packages\rasa_core\training\dsl.py in read_from_file(filename, domain, interpreter, template_variables)
    150         try:
    151             with io.open(filename, "r") as f:
--> 152                 lines = f.readlines()
    153             reader = StoryFileReader(domain, interpreter, template_variables)
    154             return reader.process_lines(lines)

UnicodeDecodeError: 'gbk' codec can't decode byte 0x80 in position 5: illegal multibyte sequence
```

I know why I get this error, and I can fix it by adding `encoding="utf-8"` to the `io.open()` call in `D:\Programs\Anaconda3\envs\nlp\lib\site-packages\rasa_core\training\dsl.py`, line 151. But I think other people may run into the same issue, so can we do more about this?

I ran the code below and both calls return "utf-8", yet `io.open()` still does not use "utf-8":

```
import sys
sys.getdefaultencoding()
sys.getfilesystemencoding()
```

This is because **Python's default file encoding is platform dependent** (I'm not completely sure, but I found this through Google). I use a Chinese edition of Windows 10, so `io.open()` defaults to "gbk" rather than "utf-8". Can we explicitly set the encoding to "utf-8" for all file I/O, instead of depending on the platform?

Thanks.

**Content of domain file** (if used & relevant):

```yaml
```
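For anyone checking their own machine: on Python 3.6, neither `sys.getdefaultencoding()` nor `sys.getfilesystemencoding()` decides what `io.open()` uses when no `encoding` is passed. As far as I understand it, that default comes from the locale, as reported by `locale.getpreferredencoding(False)`. The sketch below prints all three values and shows the explicit-encoding workaround. `encoding_report` and `read_lines_utf8` are hypothetical helper names for illustration, not part of rasa_core.

```python
import io
import locale
import sys


def encoding_report():
    """Collect the encodings Python reports on this machine."""
    return {
        "default": sys.getdefaultencoding(),
        "filesystem": sys.getfilesystemencoding(),
        # The locale encoding is what open()/io.open() falls back to
        # when no `encoding` argument is given (e.g. "cp936"/gbk on
        # a Chinese edition of Windows).
        "locale_preferred": locale.getpreferredencoding(False),
    }


def read_lines_utf8(filename):
    """Hypothetical helper: read text lines as UTF-8 on every platform."""
    with io.open(filename, "r", encoding="utf-8") as f:
        return f.readlines()


if __name__ == "__main__":
    for name, value in encoding_report().items():
        print(name, "=", value)
```

If `locale_preferred` is not UTF-8, any UTF-8 file opened without an explicit `encoding` can fail the way the traceback above shows.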
tmbo commented 5 years ago

did some work there, but we need to add a couple tests to ensure this for the future
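A regression test for this could look roughly like the sketch below. It is only an illustration under assumptions: `read_lines_utf8` stands in for whatever reader the library actually uses, and the story text is made up. The sample contains a character whose UTF-8 bytes are not valid GBK, so the test first confirms that decoding as GBK fails and then checks that a reader with an explicit encoding returns the original lines.

```python
import io
import os
import tempfile


def read_lines_utf8(filename):
    """Hypothetical stand-in for the story reader: always decode as UTF-8."""
    with io.open(filename, "r", encoding="utf-8") as f:
        return f.readlines()


def test_story_file_is_read_as_utf8():
    text = u"## 一\n* greet\n"  # made-up story content with a non-ASCII name
    raw = text.encode("utf-8")

    # Sanity check: these bytes cannot be decoded as GBK, which is what
    # a locale-dependent open() would attempt on a Chinese Windows setup.
    try:
        raw.decode("gbk")
    except UnicodeDecodeError:
        pass
    else:
        raise AssertionError("expected GBK decoding of the sample to fail")

    fd, path = tempfile.mkstemp(suffix=".md")
    try:
        # Write raw bytes so the file content does not depend on the locale.
        with os.fdopen(fd, "wb") as f:
            f.write(raw)
        assert read_lines_utf8(path) == [u"## 一\n", u"* greet\n"]
    finally:
        os.remove(path)
```

Because the file is written as bytes and read with an explicit encoding, the test behaves the same regardless of the locale of the machine running it.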
