Closed Cube707 closed 2 years ago
Thanks for opening your first issue here! Engagement like this is essential for open source projects! :hugs:
If you haven't done so already, check out EBP's Code of Conduct. Also, please try to follow the issue template as it helps other community members to contribute more effectively.
If your issue is a feature request, others may react to it, to raise its prominence (see Feature Voting).
Welcome to the EBP community! :tada:
Thanks for the issue!
Yeah this seems to need better tests and maybe rethink the whole if opts["check"]:
block.
I belive this could benifit from rethingking the whole line-endings proccedure. We should probaly do something like this:
open(file, "r", newline="")
)StringIO
markdown_it
expects and returnsproof of concept:
from io import StringIO
with open("test.md", "r", newline="") as fp:
firstline = fp.readline()
if firstline[-2:]=="\r\n":
newlinechar = "\r\n"
elif firstline[-1:]=="\r":
newlinechar = "\r"
else:
newlinechar = "\n"
# show the unconverted file contents:
fp.seek(0)
print(fp.read().encode())
fp.seek(0)
mdbuffer = StringIO(fp.read(), newline=None) # apply universal newline conversion
original_str = mdbuffer.read() # we dont actually need to save the buffer, but maybe its usefull??
# proof that we have the correct newline char and how the string looks now
print(f"newline={repr(newlinechar)}")
print(original_str.encode())
# ---
# here we would do the actuall formating (so basicly the existing code)
# ---
# write out the file, applying the original newlines (or change per options):
with open(1, "w", newline=newlinechar) as stdout:
stdout.write(original_str)
results:
Unix-Newlines: ```console b'# Test file\n\nHello world\n' newline='\n' b'# Test file\n\nHello world\n' # Test file Hello world ``` | DOS-Newlines: ```console b'# Test file\r\n\r\nHello world\r\n' newline='\r\n' b'# Test file\n\nHello world\n' # Test file Hello world ``` | Mac-Newlines: (note how the output gets overriden because the `\r` is making the cursor return) ```console b'# Test file\r\rHello world\r' newline='\r' b'# Test file\n\nHello world\n' Hello world ``` |
this would however require a lot of rewriting and changes to the existing code, but could assist with the unification of the api and cli codebases
after thinking a bit more about this I realised that we require the ability to detect when files where changed by the newline-conversion procces (for example on mixed newlines or when a option overrides the used newline character). So this version allowes for that:
from io import StringIO
with open("test.md", "r", newline="") as fp:
firstline = fp.readline()
if firstline[-2:]=="\r\n":
newlinechar = "\r\n"
elif firstline[-1:]=="\r":
newlinechar = "\r"
else:
newlinechar = "\n"
fp.seek(0)
original_filecontent = fp.read()
# show the unconverted file contents:
print(original_filecontent.encode())
original_str = StringIO(original_filecontent, newline=None).read() # apply universal newline conversion
# proof that we have the correct newline char and how the string looks now
print(f"newline={repr(newlinechar)}")
print(original_str.encode())
# ---
# here we would do the actuall formating (so basicly the existing code)
formated_str = original_str
# ---
formated_filecontents = StringIO(formated_str, newline=newlinechar).read()
print(formated_filecontents.encode())
print( formated_filecontents == original_filecontent )
and there is also the IncrementalNewlineDecoder
class, which is what is used internally on universal newline mode, so it might be usefull to use and allow for better performance
It's not that clear to me what benefit the use of StringIO
etc provides for us here. I think https://github.com/executablebooks/mdformat/pull/348 is sufficient to fix this issue.
The benifit would be to rely on python existing universal newline magick instead of handeling it with search and replace opterations ourself (and mostlikle missing edgecases in the procces)
We could handle newline stuff only in the two places they are actually relevant, when reading and writing a file, and keep the rest of the codebase free of this problem and rely on the python convention that all strings use \n
.
Also this would make it easyer to handle stdin
, as the same logic could be applyed. Its just another buffer after all
But jea. #348 is a good fix for now. This is more of an idea for if the arcetecture gets a big generall overhaul
Describe the bug
context running
mdformat --check --end-of-line=crlf
on a correctly formated markdonw file result in error.expectation I expected
mdformat
to succede and nothing to happenbug But instead an error is rased that the file is not formated:
related code
I belive the problem thems from here:
https://github.com/executablebooks/mdformat/blob/24f9d25960d28efc75cbfa051881bac5fc46a842/src/mdformat/_cli.py#L85-L87
debugging reveald the content of
formatted_str
andoriginal_str
to have different lineendings:I assume that the formating process internally does a universal newline conversion, resulting in all newline types beeing converted to
\n
Reproduce the bug
create a well formated mardkown file
test.md
with dos-style line endings (\r\n
):run mdformat on it with the
--check
and--end-of-line=crlf
options:get error
List your environment
OS: windows 10 Python: v3.10.6 mdformat: 0.7.15