Stvad / CrowdAnki

Plugin for Anki SRS designed to facilitate cooperation on creation of notes and decks.
MIT License
534 stars 44 forks source link

Sanitising control characters in deck names (for Windows) #147

Closed aplaice closed 2 years ago

aplaice commented 2 years ago

Apparently, Windows doesn't allow control characters (codepoints < 32), in filenames. (Source: highly upvoted SO answer — I haven't used Windows for a while, so I wasn't sure myself.)

Our current sanitize_anki_deck_name function doesn't, however, filter these characters out. It's highly unlikely that a user would actually use a control character (other than a newline (already accounted for) or possibly a tab?) in a deck name, and AFAICT (briefly testing* — quickly source diving I didn't find anything) Anki automatically filters out control characters from deck names, so it's probably not a major issue.

* OTOH Anki 2.1.43 seems to filter out newlines for me, while they clearly used to be a problem (see #44), so the filtering out might just be locale-specific. (Or it might have been introduced in some recent version of Anki, so newlines would no longer be a problem.)

Context

I encountered this while trying to run our mamba tests on Windows, in GitHub actions (as a first step for having a test for #138). Our name_sanitizer test currently checks for what happens with text including codepoints from 1 to 800 (i.e. including all the control characters). Hence, if hypothesis happens to generate a string containing a control character, then the test will fail on Windows:

``` ......................***........Falsifying example: can_create( potential_name='\x1f', ) F 1 examples failed of 31 ran in 1.3936 seconds Failures: 1) AnkiDeckNameSanitizer it should be possible to create a file name from a random sanitized string Failure/Error: test\utils\filesystem\name_sanitizer_spec.py can_create() OSError: [WinError 123] The filename, directory name, or volume label syntax is incorrect: 'C:\\Users\\RUNNER~1\\AppData\\Local\\Temp\\tmpmgmx00bx\\\x1f' File "C:\hostedtoolcache\windows\Python\3.7.9\x64\lib\pathlib.py", line 1273, in mkdir self._accessor.mkdir(self, mode) File "test\utils\filesystem\name_sanitizer_spec.py", line 32, in can_create Path(dir_name).joinpath(sanitize_anki_deck_name(potential_name)).mkdir() File "C:\Users\runneradmin\.virtualenvs\CrowdAnki-N-R4mrsk\lib\site-packages\hypothesis\core.py", line 1102, in wrapped_test raise the_error_hypothesis_found File "test\utils\filesystem\name_sanitizer_spec.py", line 27, in can_create .filter(byte_length_size)) File "test\utils\filesystem\name_sanitizer_spec.py", line 35, in 00000049__it should be possible to create a file name from a random sanitized string can_create() Error: Process completed with exit code 1. ```

Solutions

  1. (Lazy) increase the minimum codepoint in testing to 32 and hope that all control characters are filtered out by Anki. (IMO a bad idea, given the newline issue.)
  2. (Possibly redundant) Change invalid_filename_chars in name_sanitizer.py to include all control characters. (Possibly no longer necessary, but IMO the better choice.)