MattDMo / PythonImproved

The best Python language definition for Sublime Text - ever. Includes full support for Unicode, as well as both Python 2 and Python 3 syntax. Check out the Neon Color Scheme for highlighting.
https://packagecontrol.io/packages/Python%20Improved
MIT License

Add support for new-style string formatting #38

Closed sprt closed 6 years ago

MattDMo commented 9 years ago

Can you please elaborate? What sort of support would you like to see?

sprt commented 9 years ago

See https://docs.python.org/2/library/string.html#formatstrings

For example, {} in '{}'.format(foo) isn't properly highlighted but '%s' % foo is.
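To make the request concrete, here is a minimal sketch of the two styles side by side. The variable names are only illustrative; the point is that both produce the same text, so the `{...}` fields deserve the same highlighting as `%s`.

```python
foo = 'bar'

# Old-style (printf-like) formatting: already highlighted.
old_style = '%s' % foo

# New-style formatting: the replacement fields inside the braces
# are what this issue asks to have highlighted.
new_style = '{}'.format(foo)
indexed = '{0} and {0!r}'.format(foo)          # positional index plus conversion
padded = '{name:>6}'.format(name=foo)          # named field plus format spec

print(old_style, new_style, indexed, padded)
```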

MattDMo commented 9 years ago

OK, I'll see what I can do. Development is kind of at a low ebb at the moment, as I'm pretty busy with my real job, but I'll definitely put this on the list. If you have any contributions to make, feel free to submit a PR.

sprt commented 9 years ago

Okay, I came up with this regex:

(?<![^\{]\{)\{(?:(?:(?:[A-Za-z_]\w*|(?:[1-9]\d*|0|0[Oo]?[0-7]+|0[Xx][0-9A-Fa-f]+|0[Bb][01]+)))?(?:\.[A-Za-z_]\w*|\[(?:(?:[1-9]\d*|0|0[Oo]?[0-7]+|0[Xx][0-9A-Fa-f]+|0[Bb][01]+)|[^\]\}}\{{]+)\])*)?(?:\![rsa])?(?:\:(?:(?:.?[<>=\^])?[\x20+-]?\#?0?(?:[1-9]\d*|0|0[Oo]?[0-7]+|0[Xx][0-9A-Fa-f]+|0[Bb][01]+)?,?(?:\.(?:[1-9]\d*|0|0[Oo]?[0-7]+|0[Xx][0-9A-Fa-f]+|0[Bb][01]+))?[bcdEeFfGgnosXx%]?))?\}(?!\}[^\}])

This is the script I used to build it:

import re

def merge(*args):
    return r'(?:{})'.format(r'|'.join(args))

# integer
bininteger = r'0[Bb][01]+'
hexinteger = r'0[Xx][0-9A-Fa-f]+'  # hex digits, not just 0-7
octinteger = r'0[Oo]?[0-7]+'
decimalinteger = r'[1-9]\d*|0'
integer = merge(decimalinteger, octinteger, hexinteger, bininteger)

# format_spec
type_ = r'[bcdEeFfGgnosXx%]'
precision = integer
width = integer
sign = r'[\x20+-]'
align = r'[<>=\^]'
fill = r'.'
format_spec = r'(?:(?:{fill}?{align})?{sign}?\#?0?{width}?,?(?:\.{precision})?{type_}?)'.format(**locals())

conversion = r'[rsa]'
index_string = r'[^\]\}}\{{]+'
identifier = r'[A-Za-z_]\w*'
element_index = merge(integer, index_string)
attribute_name = identifier
arg_name = r'(?:{})?'.format(merge(identifier, integer))
field_name = r'(?:{arg_name}(?:\.{attribute_name}|\[{element_index}\])*)'.format(**locals())
replacement_field = r'(?<![^\{{]\{{)\{{{field_name}?(?:\!{conversion})?(?:\:{format_spec})?\}}(?!\}}[^\}}])'.format(**locals())

print(replacement_field)

You can see here that it works pretty well. I just copy-pasted the examples from the Python docs.
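For anyone who wants to reproduce the check locally, here is a rough sketch. It rebuilds the pattern in compact form using the same pieces as the script, assuming the full hex digit class `[0-9A-Fa-f]` for hex integers, and tries it on a few fields taken from the Python docs examples. The `samples` list and the shortened placeholder names are my own choices for illustration.

```python
import re

def merge(*args):
    return r'(?:{})'.format(r'|'.join(args))

# Assumption: hex integers use the full hex digit class.
integer = merge(r'[1-9]\d*|0', r'0[Oo]?[0-7]+', r'0[Xx][0-9A-Fa-f]+', r'0[Bb][01]+')
identifier = r'[A-Za-z_]\w*'
format_spec = (r'(?:(?:.?[<>=\^])?[\x20+-]?\#?0?{i}?,?(?:\.{i})?[bcdEeFfGgnosXx%]?)'
               .format(i=integer))
arg_name = r'(?:{})?'.format(merge(identifier, integer))
element_index = merge(integer, r'[^\]\}\{]+')
field_name = r'(?:{a}(?:\.{n}|\[{e}\])*)'.format(a=arg_name, n=identifier, e=element_index)
replacement_field = re.compile(
    r'(?<![^\{{]\{{)\{{{f}?(?:\!{c})?(?:\:{s})?\}}(?!\}}[^\}}])'
    .format(f=field_name, c=r'[rsa]', s=format_spec))

# Single fields that should be accepted as a whole.
samples = ['{}', '{0}', '{:<30}', '{:*^30}', '{0.real}', '{players[0].name}',
           '{!r}', '{:,}', '{:.2%}', '{0!r:>10.3f}']
accepted = [s for s in samples if replacement_field.fullmatch(s)]

# Several fields in one string, and escaped braces in running text.
fields = replacement_field.findall('{0}, {1}, {2}')
escaped = replacement_field.findall('a {{literal}} b')

print(accepted == samples, fields, escaped)
```

Using `fullmatch` for the single-field cases checks that the whole field is recognised, rather than just some inner fragment of it.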

These are the only strings it can't parse:

'{:%Y-%m-%d %H:%M:%S}'
'{0:{fill}{align}16}'
'{0:{width}{base}}'

To be honest, though, they don't look valid according to the documented grammar. There seem to be some discrepancies between the docs and the implementation, since I had to edit the index_string regex according to this.
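For reference, here is a small sketch showing that the implementation itself does accept all three strings: the format spec may contain nested replacement fields, and types such as datetime interpret the spec in their own way. The argument values below are made up for illustration.

```python
import datetime

# Type-specific format spec: datetime hands the spec to strftime.
stamp = '{:%Y-%m-%d %H:%M:%S}'.format(datetime.datetime(2010, 7, 4, 12, 15, 58))

# Nested replacement fields inside the format spec are filled in first.
centered = '{0:{fill}{align}16}'.format('text', fill='*', align='^')
binary = '{0:{width}{base}}'.format(5, width=8, base='b')

print(stamp)
print(centered)
print(binary)
```

So a highlighter that wants to cover these cases would probably need to treat the format spec more loosely than the grammar in the docs suggests.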