KenKundert / nestedtext

Human readable and writable data interchange format
https://nestedtext.org
MIT License

Handling of carriage return and other control characters #17

Closed george-hopkins closed 3 years ago

george-hopkins commented 3 years ago

Is there a preferred way to handle carriage returns and other control characters (e.g. form feed \f)? Especially on Windows, CR LF line endings are quite common. Other characters (such as the aforementioned \f) are probably not intended but might sneak in if data was converted to NestedText from another source.

KenKundert commented 3 years ago

Concerning the differing line ending conventions, I would expect a NestedText reader to accept all standard line endings. Python takes care of this for me, so I did not give it much thought. But it seems like a good idea to make it explicit in the specification and add some test cases.

Concerning embedded control characters like form feeds, vertical tabs, bells, etc.: those are valid characters in UTF-8, so they would go into a NestedText file without escaping and would pass through a dump/load round trip unchanged. Having said that, I just tried it on my Python implementation and it works for bells but not for form feeds. Apparently the splitlines string method in Python splits a line on a form feed character. Seems like I need to add some test cases.
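The splitlines behavior mentioned above can be reproduced in a few lines. This is only a small demonstration, not code from the NestedText implementation; the sample string and variable names are made up for illustration.

```python
# Sample text (made up for this demonstration) containing a form feed (\f)
# inside the first line and a bell (\a) inside the second line.
text = "key: a\fb\nbell: x\ay\n"

# str.splitlines() treats form feed as a line boundary, so the first
# line is broken in two. The bell is not a boundary and passes through.
by_splitlines = text.splitlines()

# Splitting on LF alone leaves both control characters inside their lines.
# The trailing newline produces one empty final element.
by_split = text.split("\n")

print(by_splitlines)
print(by_split)
```

Here splitlines yields three lines, with the form feed consumed as a separator, while splitting on "\n" keeps the form feed inside the first line.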

KenKundert commented 3 years ago

I spent some time looking into this and resolving some issues in my implementation. Here is the summary. CR LF, CR, and LF are now all treated interchangeably. The NestedText document is broken into individual lines using any of those three line endings. From an implementation perspective, I convert CR LF and CR to LF and then split on LF. All other control characters are passed through NestedText without escaping.
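The normalize-then-split approach described above might look roughly like the following. This is a sketch under assumptions, not the actual NestedText source: the function name split_lines is hypothetical, and how the real implementation handles a trailing newline is not stated in this thread.

```python
def split_lines(text):
    """Break text into lines on CR LF, CR, or LF (hypothetical helper).

    CR LF must be replaced before a lone CR; otherwise each CR LF pair
    would turn into two line breaks.
    """
    normalized = text.replace("\r\n", "\n").replace("\r", "\n")
    # Split only on LF so that other control characters (form feed,
    # vertical tab, bell, ...) stay inside their lines unchanged.
    return normalized.split("\n")


# Mixed line endings, plus a form feed that should pass through intact.
sample = "a: 1\r\nb: 2\rc: 3\nd: x\fy"
lines = split_lines(sample)
print(lines)
```

Unlike str.splitlines(), this keeps the form feed in the last line, since only the three line endings named above are treated as separators.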

I have also added the following test cases: string_8, string_9 and string_multiline_12.

Finally, I have changed the specification to explicitly define CR LF, CR, and LF as valid line separators.

george-hopkins commented 3 years ago

Wow, that was quick ;) Thank you for the clarification and the added test cases! I was already able to reproduce the expected results.