ghuser404 closed this issue 1 year ago.
What does "Mac format" mean?
Line endings. On classic Mac OS they are CR, on UNIX they are LF, and on Windows they are CRLF.
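To check which convention a generated header actually uses, it helps to count the raw byte sequences rather than trust an editor's display. The helper below is a hypothetical sketch (not part of glad) that counts CRLF pairs, lone CRs and lone LFs in a file's bytes. Note that a `CRCRLF` line shows up as one lone CR plus one CRLF, and that lone CR is what a compiler would report as "Mac format".

```python
def classify_line_endings(data: bytes) -> dict:
    """Count CRLF, lone CR and lone LF line endings in raw file bytes."""
    crlf = data.count(b"\r\n")
    # Subtract the CRLF pairs so their CR and LF are not counted twice.
    cr = data.count(b"\r") - crlf
    lf = data.count(b"\n") - crlf
    return {"CRLF": crlf, "CR": cr, "LF": lf}

print(classify_line_endings(b"a\nb\n"))          # {'CRLF': 0, 'CR': 0, 'LF': 2}  UNIX
print(classify_line_endings(b"a\r\nb\r\n"))      # {'CRLF': 2, 'CR': 0, 'LF': 0}  Windows
print(classify_line_endings(b"a\rb\r"))          # {'CRLF': 0, 'CR': 2, 'LF': 0}  classic Mac OS
print(classify_line_endings(b"a\r\r\nb\r\r\n"))  # {'CRLF': 2, 'CR': 2, 'LF': 0}  CRCRLF
```

To inspect a real file, pass it the result of `open(path, "rb").read()`.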
Both of these files are LF in the repo.
Maybe you see them as LF because cloning the repo automatically converts them from CR to LF on your computer, while they are actually stored as CR. Could that be the case?
OK, so I figured out that this happens because you are decoding the binary file data as UTF-8 in order to replace cpp-style comments. In glad1 this wasn't happening: the binary data went straight into the file as is. In fact, I tried that in glad2 and it gave me a UNIX-style file (with LFs). I found a link where someone else also had a problem with `decode('utf-8')`:
https://stackoverflow.com/questions/491921/unicode-utf-8-reading-and-writing-to-files-in-python
Maybe we can replace the comments by parsing the binary data, without calling `decode('utf-8')`?
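Python's `re` module does accept bytes patterns applied to bytes input, so comment replacement can in principle run without decoding. The thread does not show the regex used by `replace_cpp_style_comments`; the pattern and function name below are hypothetical and simplified (they ignore string literals and `/* */` blocks). The pattern deliberately stops at CR or LF so existing line endings pass through unchanged.

```python
import re

# Hypothetical pattern: turns "// text" into "/* text */".
# [^\r\n]* stops before CR and LF, so line endings are left untouched.
CPP_COMMENT = re.compile(rb"//([^\r\n]*)")

def replace_cpp_style_comments_bytes(data: bytes) -> bytes:
    """Rewrite C++-style comments as C comments directly on raw bytes."""
    return CPP_COMMENT.sub(rb"/*\1 */", data)

header = b"#define A 1 // first\r\n#define B 2\r\n"
print(replace_cpp_style_comments_bytes(header))
# b'#define A 1 /* first */\r\n#define B 2\r\n'
```

Writing the result with `open(path, "wb")` would then copy the bytes to disk exactly as they are.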
Here is the code that I'm talking about:
```python
import os
from contextlib import closing

# Excerpt: these are methods of a glad generator class.
def _add_additional_headers(self, feature_set, config):
    if config['HEADER_ONLY']:
        return

    for header in self.ADDITIONAL_HEADERS:
        if header.name not in feature_set.types:
            continue

        path = os.path.join(self.path, 'include/{}'.format(header.include))

        directory_path = os.path.split(path)[0]
        if not os.path.exists(directory_path):
            os.makedirs(directory_path)

        if not os.path.exists(path):
            content = self._read_header(header.url)
            # Text mode: every '\n' written is translated to os.linesep.
            with open(path, 'w') as dest:
                dest.write(content)

def _read_header(self, url):
    if url not in self._headers:
        with closing(self.opener.urlopen(url)) as src:
            # Raw bytes decoded to str; CR and LF characters are kept as-is.
            header = src.read().decode('utf-8')

        header = replace_cpp_style_comments(header)
        self._headers[url] = header

    return self._headers[url]
```
Thanks for digging into this. We have to decode the bytes as UTF-8 here to get a string; running a regex over bytes isn't supported, iirc. The file is read in binary mode, so this part should be correct and yield the correct data. I suspect `with open(path, 'w') as dest:` might be the culprit, and it should be opened with UTF-8 encoding: `with open(path, 'w', encoding='utf-8') as dest:`. Without it, Python falls back to `locale.getencoding()`.
Can you see if replacing `open(path, 'w')` with `open(path, 'w', encoding='utf-8')` fixes your issue?
I tried `open(path, 'w', encoding='utf-8')`, but unfortunately that doesn't help 😞
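A likely reason the explicit encoding makes no difference: `encoding` only controls how characters become bytes, while line-ending translation in text mode is controlled by the separate `newline` parameter. With the default `newline=None`, each `\n` written is translated to `os.linesep`, which is `\r\n` on Windows. The sketch below simulates that on any OS by passing `newline="\r\n"` to an `io.TextIOWrapper` over an in-memory buffer; `write_text` is a hypothetical helper name.

```python
import io

def write_text(text: str, encoding: str, newline) -> bytes:
    """Write text through a TextIOWrapper and return the bytes that reach the 'file'."""
    raw = io.BytesIO()
    wrapper = io.TextIOWrapper(raw, encoding=encoding, newline=newline)
    wrapper.write(text)
    wrapper.flush()
    data = raw.getvalue()
    wrapper.detach()  # release the buffer without closing it
    return data

crlf_text = "line1\r\nline2\r\n"  # text decoded from a CRLF checkout
# newline="\r\n" mimics open(path, "w") on Windows, whatever the encoding.
print(write_text(crlf_text, "utf-8", "\r\n"))  # b'line1\r\r\nline2\r\r\n'
# newline="" disables translation, so the text's own line endings are kept.
print(write_text(crlf_text, "utf-8", ""))      # b'line1\r\nline2\r\n'
```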
Right... I see what's happening. My glad repo is cloned with `CRLF` line endings, and when `decode('utf-8')` is called, it converts `LF` into `CRLF`. In the end I get `CRCRLF`.
I wonder if there is an as-is (UNIX-style) decode.
No, Python does not convert line endings when decoding. Git probably converts LF to CRLF on Windows when you check out the repo (the `core.autocrlf` git setting).
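The maintainer's point is easy to confirm: `bytes.decode` maps bytes to characters and never rewrites CR or LF, so a CRLF checkout decodes to a string that still contains `\r\n`. A minimal check, using made-up header content:

```python
raw = b"#ifndef X\r\n#define X\r\n"  # made-up bytes as read from a CRLF checkout
text = raw.decode("utf-8")

# CR and LF survive decoding unchanged, and re-encoding gives back the same bytes.
print(repr(text))                             # '#ifndef X\r\n#define X\r\n'
print(text.count("\r\n"), text.count("\n"))  # 2 2
print(text.encode("utf-8") == raw)           # True
```

Any extra CR therefore has to be introduced later, when the string is written to disk.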
Can you upload the problematic file itself (not a copy-paste)?
Correct, git clones the repo in `CRLF` format. So when I run glad with `REPRODUCIBLE`, it reads the file as bytes and converts it into a string, and wherever it sees `LF` it writes `CRLF`, so I end up with `CRCRLF`. I verified this by cloning the repo in `LF` format: after running glad, the khrplatform.h/vk_platform.h files are in `CRLF` format, not `CRCRLF` (because there was only `LF`, and it turned into `CRLF`).
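The pipeline described above (read bytes, decode, write in text mode) can be reproduced end to end. The sketch below is an approximation, not glad's code: `glad_like_copy` and the temporary file name are hypothetical, and `newline="\r\n"` stands in for the Windows default so the result is the same on any OS. An LF checkout comes out as CRLF, while a CRLF checkout comes out as CRCRLF.

```python
import os
import tempfile

def glad_like_copy(source_bytes: bytes) -> bytes:
    """Decode bytes, write them in text mode with Windows newlines, return what hit disk."""
    text = source_bytes.decode("utf-8")
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "vk_platform.h")
        # newline="\r\n" reproduces Windows' default text-mode translation on any OS.
        with open(path, "w", encoding="utf-8", newline="\r\n") as dest:
            dest.write(text)
        with open(path, "rb") as check:
            return check.read()

print(glad_like_copy(b"a\nb\n"))      # LF checkout   -> b'a\r\nb\r\n'
print(glad_like_copy(b"a\r\nb\r\n"))  # CRLF checkout -> b'a\r\r\nb\r\r\n'
```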
Here is the file: vk_platform.zip
I can't reproduce your issue. The files are written with system newlines, `CRLF` on Windows. When running `python -m glad --out-path build --api "vulkan" --extensions="" --reproducible c` on Windows 10, each line correctly ends with a single `CRLF`.
Do you clone the repo in `LF` or `CRLF` format on Windows?
Mh, this is weird: the first newlines are correct, then the later ones are broken:
[Screenshot] Red = broken, blue = correct
Mh, I think the regex is broken.
The very first one is just CR, then CRLF, and then everything is broken. But this is because of the cpp-style comment replacement in Python's MULTILINE mode.
You can clone the repo in CRLF format and run the glad command; then you'll be able to reproduce this issue.
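One way a comment regex can disturb line endings, assuming a pattern of the form `//(.*)$` (hypothetical; the thread does not show the real one): in `re.MULTILINE` mode `$` matches only before `\n`, and `.` also matches `\r`, so on a CRLF line the captured comment text swallows the CR. The rewritten comment then contains a lone CR, which is what a "Mac format" check would flag. Excluding `\r` from the capture avoids this.

```python
import re

# Hypothetical patterns, not glad's actual one: "// text" -> "/* text */".
naive = re.compile(r"//(.*)$", re.MULTILINE)  # '.' also matches '\r'
safe = re.compile(r"//([^\r\n]*)")            # stops before CR and LF

line = "int x; // note\r\n"
print(repr(naive.sub(r"/*\1 */", line)))  # 'int x; /* note\r */\n'
print(repr(safe.sub(r"/*\1 */", line)))   # 'int x; /* note */\r\n'
```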
> Mh I think the regex is broken
No, I tried removing the cpp-style comment replacement calls, and I still get the issue. The problem is that after `decode('utf-8')` the string is in `CRCRLF` format. So a fix could be to remove all `CR` characters before calling `decode('utf-8')`.
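A sketch of the suggestion to strip CR before decoding, under the assumption that headers only ever need LF internally. `normalize_newlines` is a hypothetical helper, not necessarily what the eventual commit does. Rather than deleting every CR, it converts lone CRs to LF, so a genuine classic-Mac file would not have its lines joined. After normalization, a text-mode write on Windows yields exactly one CRLF per line.

```python
def normalize_newlines(data: bytes) -> bytes:
    """Convert CRLF and lone CR to LF before decoding (hypothetical helper)."""
    # Order matters: collapse CRLF first so its CR is not turned into an extra LF.
    return data.replace(b"\r\n", b"\n").replace(b"\r", b"\n")

content = normalize_newlines(b"a\r\nb\r\n").decode("utf-8")
print(repr(content))  # 'a\nb\n'
```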
@ghuser404 can you try https://github.com/Dav1dde/glad/commit/1377964bbb8f30e489e18a016b4b4ef46b5646e4? That seems to have fixed it for me.
@Dav1dde This doesn't fix it. In fact, it actually broke the cpp-style comment replacement. See the attached file. 😔 vk_platform.zip
Oops, it should actually be fixed now.
That fixes it for me too! Very good job, thanks! 🙌
In glad2, MSVC on Windows complains about khrplatform.h/vk_platform.h being in Mac format and fails the build with `/permissive-`.