Apparently, Windows doesn't allow control characters (codepoints < 32), in filenames. (Source: highly upvoted SO answer — I haven't used Windows for a while, so I wasn't sure myself.)
Our current sanitize_anki_deck_name function doesn't, however, filter these characters out. It's highly unlikely that a user would actually use a control character (other than a newline (already accounted for) or possibly a tab?) in a deck name, and AFAICT (briefly testing* — quickly source diving I didn't find anything) Anki automatically filters out control characters from deck names, so it's probably not a major issue.
* OTOH Anki 2.1.43 seems to filter out newlines for me, while they clearly used to be a problem (see #44), so the filtering out might just be locale-specific. (Or it might have been introduced in some recent version of Anki, so newlines would no longer be a problem.)
Context
I encountered this while trying to run our mamba tests on Windows, in GitHub actions (as a first step for having a test for #138). Our name_sanitizer test currently checks for what happens with text including codepoints from 1 to 800 (i.e. including all the control characters). Hence, if hypothesis happens to generate a string containing a control character, then the test will fail on Windows:
```
......................***........Falsifying example: can_create(
potential_name='\x1f',
)
F
1 examples failed of 31 ran in 1.3936 seconds
Failures:
1) AnkiDeckNameSanitizer it should be possible to create a file name from a random sanitized string
Failure/Error: test\utils\filesystem\name_sanitizer_spec.py can_create()
OSError: [WinError 123] The filename, directory name, or volume label syntax is incorrect: 'C:\\Users\\RUNNER~1\\AppData\\Local\\Temp\\tmpmgmx00bx\\\x1f'
File "C:\hostedtoolcache\windows\Python\3.7.9\x64\lib\pathlib.py", line 1273, in mkdir
self._accessor.mkdir(self, mode)
File "test\utils\filesystem\name_sanitizer_spec.py", line 32, in can_create
Path(dir_name).joinpath(sanitize_anki_deck_name(potential_name)).mkdir()
File "C:\Users\runneradmin\.virtualenvs\CrowdAnki-N-R4mrsk\lib\site-packages\hypothesis\core.py", line 1102, in wrapped_test
raise the_error_hypothesis_found
File "test\utils\filesystem\name_sanitizer_spec.py", line 27, in can_create
.filter(byte_length_size))
File "test\utils\filesystem\name_sanitizer_spec.py", line 35, in 00000049__it should be possible to create a file name from a random sanitized string
can_create()
Error: Process completed with exit code 1.
```
Solutions
(Lazy) increase the minimum codepoint in testing to 32 and hope that all control characters are filtered out by Anki. (IMO a bad idea, given the newline issue.)
(Possibly redundant) Change invalid_filename_chars in name_sanitizer.py to include all control characters. (Possibly no longer necessary, but IMO the better choice.)
Apparently, Windows doesn't allow control characters (codepoints < 32), in filenames. (Source: highly upvoted SO answer — I haven't used Windows for a while, so I wasn't sure myself.)
Our current
sanitize_anki_deck_name
function doesn't, however, filter these characters out. It's highly unlikely that a user would actually use a control character (other than a newline (already accounted for) or possibly a tab?) in a deck name, and AFAICT (briefly testing* — quickly source diving I didn't find anything) Anki automatically filters out control characters from deck names, so it's probably not a major issue.* OTOH Anki 2.1.43 seems to filter out newlines for me, while they clearly used to be a problem (see #44), so the filtering out might just be locale-specific. (Or it might have been introduced in some recent version of Anki, so newlines would no longer be a problem.)
Context
I encountered this while trying to run our mamba tests on Windows, in GitHub actions (as a first step for having a test for #138). Our
name_sanitizer
test currently checks for what happens with text including codepoints from 1 to 800 (i.e. including all the control characters). Hence, if hypothesis happens to generate a string containing a control character, then the test will fail on Windows:Solutions
invalid_filename_chars
inname_sanitizer.py
to include all control characters. (Possibly no longer necessary, but IMO the better choice.)