Closed. AMR-K closed this issue 4 years ago.
@tabergma sorry for the tag if it's spammy, but could you help me with this issue?
The way `\n` characters are escaped distorts my training data files.
Thanks :sweat_smile:
@akelad Could you please check this issue? Is there a reason why `\n` tokens are escaped this way?
It's been added to one of our teams' inboxes. Can I ask why you're using JSON in the first place? I believe that format might be deprecated soon.
Well, I have just checked the Rasa blog post for version 2.0 and noticed that YAML will be the format for data files. JSON is the format my team has been using for a while now, and it's convenient since it can be easily read and manipulated by different programming languages. I don't find JSON very human-readable and I prefer the Markdown format, so I needed to convert JSON files to Markdown, manipulate them, and then convert them back to JSON.
Yeah, that makes sense. Once 2.0 is released, would YAML be a good replacement for JSON for you? JSON will still be around for a while, but we will be encouraging users to switch to the new format.
Also, since you already found the area of the code that causes this issue, would you be up for submitting a PR to fix it?
I have only used YAML for pipeline configurations, so I am not sure how it's used for NLU data (I will give it a try soon). I have created a PR that un-escapes the `\n` tokens when reading a Markdown file.
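The fix described above amounts to pairing the writer's escaping with a matching un-escape step on the reader side. Below is a minimal sketch under that assumption; `escape` and `unescape` are hypothetical helpers, not the code in the PR or in Rasa's `markdown.py`. As a design choice beyond what the thread describes, the sketch also escapes the backslash itself, so that text which already contains a literal `\n` (for example a Windows path) survives the round trip unchanged.

```python
import re

# Hypothetical escape table (not Rasa's): control characters become
# two-character backslash sequences, and the backslash itself is escaped
# as well so that escape() and unescape() are exact inverses.
ESCAPES = {"\\": "\\\\", "\b": "\\b", "\f": "\\f", "\n": "\\n", "\r": "\\r", "\t": "\\t"}
UNESCAPES = {v: k for k, v in ESCAPES.items()}

# Any character that needs escaping (inside a class, \b means backspace).
ESCAPE_RE = re.compile(r"[\\\b\f\n\r\t]")
# A backslash followed by a recognised escape character. Matches are consumed
# left to right, so an escaped backslash followed by "n" is not mistaken for "\n".
UNESCAPE_RE = re.compile(r"\\[\\bfnrt]")


def escape(text: str) -> str:
    """Turn text into a single physical line suitable for a Markdown list item."""
    return ESCAPE_RE.sub(lambda m: ESCAPES[m.group(0)], text)


def unescape(line: str) -> str:
    """Reverse escape(), restoring the original control characters."""
    return UNESCAPE_RE.sub(lambda m: UNESCAPES[m.group(0)], line)


text = "first line\nsecond line"
line = escape(text)
assert "\n" not in line         # safe to write as one Markdown line
assert unescape(line) == text   # the round trip is lossless
```

Without the backslash entry, `unescape` could not tell an escaped newline from a literal `\n` that was already in the original text, and the round trip would be lossy again, just in the opposite direction.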
Nice, thanks!
O/ Akela,
I am checking the live docs at https://rasa.com/docs/rasa/nlu/training-data-format/#data-formats, but it looks like the YAML format isn't part of them yet. Will the docs be updated soon? I find it easier and more convenient to check the online docs rather than building them from source.
Thanks :smile:
It's still a work in progress, sorry! You can take a peek here: https://github.com/RasaHQ/rasa/pull/6297/files
@AMR-KELEG we're still working on the docs, but we'll have an update soon. Once we merge the PR, it will be available at https://rasa.com/docs/rasa/next
No worries :smile: Thanks for the pointer. I will check the rst file for now.
Rasa version: rasa==1.10.3
Rasa SDK version (if used & relevant): rasa-sdk==1.10.2
Rasa X version (if used & relevant):
Python version: Python 3.7.8
Operating system: Ubuntu 20
Issue: My team has training datasets with newline tokens (`\n`) as part of the `text` field in JSON files. We generally convert the data files to the Markdown format to inspect and manipulate them easily, and then convert them back to JSON. However, converting a JSON file to Markdown and then back to JSON escapes the newline tokens, which isn't desirable.
Error (including full traceback):
Command or request that led to error:
Code responsible for the issue: https://github.com/RasaHQ/rasa/blob/88ad06f3234ef68ecea9076e19747f3a07a097f4/rasa/nlu/training_data/formats/markdown.py#L51
https://github.com/RasaHQ/rasa/blob/88ad06f3234ef68ecea9076e19747f3a07a097f4/rasa/nlu/training_data/formats/markdown.py#L70
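The behaviour can be reproduced without Rasa. This is a minimal sketch, not Rasa's code: `escape_for_markdown`, `json_to_md_line`, and `md_line_to_json` are hypothetical helpers. It assumes the writer replaces control characters with backslash sequences (which is what the linked lines appear to do) and that the reader copies the text verbatim without un-escaping it.

```python
import json
import re

# Hypothetical stand-in for the writer-side escaping: control characters are
# replaced with two-character backslash sequences so that each training
# example stays on a single Markdown line.
ESCAPES = {"\b": "\\b", "\f": "\\f", "\n": "\\n", "\r": "\\r", "\t": "\\t"}
ESCAPE_RE = re.compile(r"[\b\f\n\r\t]")


def escape_for_markdown(text: str) -> str:
    return ESCAPE_RE.sub(lambda m: ESCAPES[m.group(0)], text)


def json_to_md_line(example: dict) -> str:
    # One Markdown list item per training example.
    return "- " + escape_for_markdown(example["text"])


def md_line_to_json(line: str) -> dict:
    # Assumed reader behaviour: the text is taken verbatim, with no un-escaping.
    return {"text": line[len("- "):]}


original = {"text": "first line\nsecond line"}
restored = md_line_to_json(json_to_md_line(original))

print(json.dumps(original))  # {"text": "first line\nsecond line"}
print(json.dumps(restored))  # {"text": "first line\\nsecond line"}
```

The real newline in the original has become a literal backslash followed by `n`, which `json.dumps` then serialises as `\\n`: the asymmetry between writer and reader, not JSON itself, is what changes the data.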