REPRODUCIBLE produces files in Mac format

ghuser404 commented 1 year ago

In glad2 MSVC on Windows complains about khrplatform.h/vk_platform.h being in Mac format and fails the build with /permissive-.

Dav1dde commented 1 year ago

What does that mean, mac format?

ghuser404 commented 1 year ago

Line endings. On Mac they are CR. On UNIX — LF. On Windows — CRLF.

Dav1dde commented 1 year ago

Both of these files are LF in the Repo.

ghuser404 commented 1 year ago

~~Maybe you see them as LF because you're cloning the repo and it automatically converts them from CR to LF on your computer but they are actually kept in CR. Could that be the case?~~

ghuser404 commented 1 year ago

OK, so I figured this is happening because you are converting binary file data into utf-8 to replace cpp-style comments. In glad1 this wasn't happening and binary data was going straight into the file as is. In fact, I tried that in glad2 and it actually gave me a UNIX-style file (with LFs). I found a link where someone is also having a problem with decode('utf-8'). https://stackoverflow.com/questions/491921/unicode-utf-8-reading-and-writing-to-files-in-python

Maybe we can replace the comments by parsing the binary format, without calling decode('utf-8')?

Here is the code that I'm talking about:

def _add_additional_headers(self, feature_set, config):
    if config['HEADER_ONLY']:
        return

    for header in self.ADDITIONAL_HEADERS:
        if header.name not in feature_set.types:
            continue

        path = os.path.join(self.path, 'include/{}'.format(header.include))

        directory_path = os.path.split(path)[0]
        if not os.path.exists(directory_path):
            os.makedirs(directory_path)

        if not os.path.exists(path):
            content = self._read_header(header.url)
            with open(path, 'w') as dest:
                dest.write(content)

def _read_header(self, url):
    if url not in self._headers:
        with closing(self.opener.urlopen(url)) as src:
            header = src.read().decode('utf-8')

        header = replace_cpp_style_comments(header)
        self._headers[url] = header

    return self._headers[url]

Dav1dde commented 1 year ago

Thanks for digging into this, we have to decode the bytes here into Utf-8 to get a string, running a Regex over bytes isn't supported iirc. The file should be read in binary mode, so this should be correct and yield the correct data. I suspect the with open(path, 'w') as dest: might be the culprit and this should be opened with Utf-8 encoding with open(path, 'w', encoding='utf-8') as dest:. Without it, it falls back to locale.getencoding().

Can you see if replacing open(path, 'w') with open(path, 'w', encoding='utf-8') fixes your issue?

ghuser404 commented 1 year ago

I tried doing open(path, 'w', encoding='utf-8'), but unfortunately that doesn't help 😞

ghuser404 commented 1 year ago

Right... I see what's happening. My glad repo is cloned in CRLF format, and when string.decode('utf-8') is called, it converts LF into CRLF. In the end I get CRCRLF.

Wonder if there is an as-is (UNIX style) decode.

Dav1dde commented 1 year ago

No Python does not convert line endings when decoding. Git probably converts LF to CRLF when you check out the repo (autocrlf git setting) on Windows.

Can you upload the problematic file (not copy paste)?

ghuser404 commented 1 year ago

Correct, git clones repo in CRLF format. So when I run glad with REPRODUCIBLE, it reads the file as bytes and then converts in into string, and when it sees LF, it writes CRLF, so I end up with CRCRLF. I verified this by cloning repo in LF format. Then after running glad, khrplatform.h/vk_platform.h files will be in CRLF format, and not in CRCRLF (because there was only LF and it turned into CRLF).

Here is the file: vk_platform.zip

Dav1dde commented 1 year ago

I can't reproduce your issues. The files are written with system newlines, CRLF on windows. When running python -m glad --out-path build --api "vulkan" --extensions="" --reproducible c on Windows 10, the files correctly contain a single CRLF.

ghuser404 commented 1 year ago

Do you clone repo in LF or CRLF format on Windows?

Dav1dde commented 1 year ago

Mh this is weird, the first newlines are correct then the later ones are broken:

Red = Broken, Blue = Correct

Dav1dde commented 1 year ago

Mh I think the regex is broken

ghuser404 commented 1 year ago

The very first one is just CR. Then CRLF. And then everything is broken. But this is because of cpp-style comment replacement in python MULTILINE mode.

You can clone repo in CRLF format and run glad command, then you'll be able tor reproduce this issue.

ghuser404 commented 1 year ago

Mh I think the regex is broken

No, I tried removing the cpp-style comments replacement calls, and I still get the issue. The problem is because after decode('utf-8') the string is in the CRCRLF format. So fix could be before calling decode('utf-8') to remove all CR symbols.

Dav1dde commented 1 year ago

@ghuser404 can you try https://github.com/Dav1dde/glad/commit/1377964bbb8f30e489e18a016b4b4ef46b5646e4? That seems to have fixed it for me.

ghuser404 commented 1 year ago

@Dav1dde This doesn't fix it. In fact, you actually broken cpp-style comments replacement. See file attached. 😔 vk_platform.zip

Dav1dde commented 1 year ago

oops, should be actually fixed now

ghuser404 commented 1 year ago

That fixes it for me too! Very good job, thanks! 🙌

Dav1dde / glad

REPRODUCIBLE produces files in Mac format #423