KenKundert / emborg

Interactive command line interface to Borg Backup
GNU General Public License v3.0
94 stars 8 forks source link

Default configuration files created by emborg init can't accept paths with spaces #29

Closed coridonhenshaw closed 4 years ago

coridonhenshaw commented 4 years ago

The default configuration files created by running emborg init are not able to interpret source directories or exclude masks that include spaces or single quotes. The Python split() method does not have provision for escaping spaces or other characters. As a result, including a path with a space in an Emborg configuration file will result in Emborg attempting to parse the path as two, incomplete, paths, with undesirable results.

If the configuration files are rewritten to use Python list syntax, rather than split(), Emborg processes paths with spaces correctly. For example, this configuration does work without any changes to Emborg proper:

src_dirs = ['/mnt/drive to back up', '/mnt/other drive to back up' ] excludes = [ '*/desktop.ini', '*/Thumbs.db', '*/stuff to exclude/*' ]

Note however, that if there are any syntax errors in the exclusion list (such as a missed trailing comma), Emborg will execute a backup but ignore the entire exclusion list. Presumably this is also true if using the standard configuration files. I feel syntax errors in the configuration files should be treated as fatal errors.

KenKundert commented 4 years ago

Your timing is quite something. I just today pushed a rather substantial change to Emborg which addresses this issue. If you want to try it, it is version 1.14.4 and its only on github at the moment. Specifically, it allows you to specify the src_dirs and excludes in multi-line strings without using split. Instead, Emborg splits the string for you and it uses a line based approach rather than simply breaking on white space. So your example could be given as follows:

src_dirs = """
    /mnt/drive to back up
    /mnt/other drive to back up
"""
excludes = """
    */desktop.ini
    */Thumbs.db
    */stuff to exclude/*
"""

Enborg splits the lines, and then removes all leading and trailing white space. Blank lines are ignored, as are comments (lines that start with #). Its not perfect as files that either start or end with white space cannot be specified. But it does give you most of the benefits of using a list of strings without the risk of forgetting a comma.

-Ken

coridonhenshaw commented 4 years ago

I will take a look at 1.14.4 when I have the opportunity. Might be a while, however.

Are there any advantages to rolling your own parser for this instead of using something pre-built, like one of Python's many XML parsers? I've always hated writing custom parsers because of the difficulty of dealing with edge cases (such as, in this case, filenames that begin or end with spaces), which are deeply undesirable in software that needs to be highly predictable. XML (for example) is awkard to write but does deal with this corner case easily.

From my perspective, using Python list syntax for the configuration files is a workable solution to the spaces problem if emborg throws an error if the syntax is invalid. I'm not familiar enough with the emborg's codebase to know why syntax errors here are silently ignored. I presume exec() throws on syntax errors but the exception is getting lost by a try/except block somewhere.

KenKundert commented 4 years ago

I did not role my own parser. I used Python's parser. It is my habit to use Python for config files because it is easy, clean, and the fact that you can run code in the config file means that you can often overcome limitations in the underlying program. In my experience this can be very useful early on, but becomes less useful as the needs are recognized and resolved in the program itself.

XML is an awful language for people, so I would never use that. Nor would I consider JSON, not as bad as XML but still awkward for people. YaML might have been okay, or perhaps ToML. But I never really considered either. Python is my language.

The 'syntax' issue you were having was a result of a little used feature in Python. If you place two string literals next to each other, Python combines them. This allows you to break a long string into a collection of shorter strings. However, it does enable a user trap, which you fell into. Here is an example:

excludes = [
    'aaa',
    'bbb',
    'ccc',
]

That is a list of three strings, 'aaa', 'bbb', and 'ccc'.

excludes = [
     'aaa'
     'bbb'
     'ccc'
]

That is a list of one string, 'aaabbbccc'. The difference is the lack of commas in the second case.