garcia / simfile

A modern simfile parsing & editing library for Python 3
MIT License
62 stars 7 forks source link

MSD parser doesn't handle backslash escapes #10

Closed vyhd closed 2 years ago

vyhd commented 3 years ago

First off, thanks for building this library! It's been remarkably easy to use for simfile introspection.

I ran into this unfortunate edge case while parsing through DDR A20 PLUS content:

Traceback (most recent call last):
  File "/Users/vyhd/.pyenv/versions/3.9.2/lib/python3.9/site-packages/simfile/__init__.py", line 89, in open
    return open_with_detected_encoding(
  File "/Users/vyhd/.pyenv/versions/3.9.2/lib/python3.9/site-packages/simfile/__init__.py", line 137, in open_with_detected_encoding
    return (load(file, strict=strict), encoding)
  File "/Users/vyhd/.pyenv/versions/3.9.2/lib/python3.9/site-packages/simfile/__init__.py", line 67, in load
    return SMSimfile(file=file, strict=strict)
  File "/Users/vyhd/.pyenv/versions/3.9.2/lib/python3.9/site-packages/simfile/base.py", line 139, in __init__
    self._parse(parse_msd(
  File "/Users/vyhd/.pyenv/versions/3.9.2/lib/python3.9/site-packages/simfile/sm.py", line 159, in _parse
    for (key, value) in parser:
  File "/Users/vyhd/.pyenv/versions/3.9.2/lib/python3.9/site-packages/msdparser/__init__.py", line 117, in parse_msd
    ps.write(char)
  File "/Users/vyhd/.pyenv/versions/3.9.2/lib/python3.9/site-packages/msdparser/__init__.py", line 62, in write
    raise MSDParserError(f"stray {repr(text)} encountered")
msdparser.MSDParserError: stray 'G' encountered

which was triggered by this line: #SOURCE:ゲーム「STEINS\;GATE」; in the SM file located at: https://zenius-i-vanisher.com/v5.2/viewsimfile.php?simfileid=45612

It looks like StepMania's MSD parser allows you to use \ to escape any control character (source), so this lib probably should be able to as well.

I... might be able to submit a PR for this sometime in the next few weeks? But I wanted to flag it now, in case someone gets to it first.

garcia commented 3 years ago

Interesting, I must have overlooked this feature when I wrote the initial parser. I should've referred back to SM source code for the rewrite, but instead just used my old implementation & test cases.

We'll need to update msdparser, and we'll also need to consider how to escape control characters during serialization from this package. Right now serialization is all done ad-hoc (given that it's simple enough to interpolate some #:; characters around some strings) but we might need to make it a little more structured to do this right. Maybe a simfile._private.msd module with helper functions?

garcia commented 2 years ago

Alright, I landed on a solution for this that I think is reasonable, and it's now available as msdparser version 2.0.0b1.

You can upgrade this package underneath simfile to get escape-handling behavior out of the box, but only for reading simfiles. Writing simfiles with escapes will require an update to the simfile package; the new msdparser release offers a solution for this, but it's not being called from simfile yet.

Documentation for this pre-release can be found here.

Upgrading to msdparser 2.0.0b1

Poetry: poetry add msdparser@2.0.0b1 Pipenv: pipenv install --pre msdparser==2.0.0b1

Smoke test

If the title parses as A;B and not A\, you're good to go:

>>> import simfile
>>> simfile.loads('#TITLE:A\;B;')
<SMSimfile: A;B>

Again, don't upgrade if you need to write simfiles. Wait for simfile version 2.1.0 for the complete fix.

garcia commented 2 years ago

Fixed in v2.1.0.