davidgiven / cowgol

A self-hosted Ada-inspired programming language for very small systems.
BSD 2-Clause "Simplified" License

Cowgol cannot detect EOF character on Unix? #88

Open ibara opened 3 years ago

ibara commented 3 years ago

Hello.

I have been running Cowgol on OpenBSD for a while, preparing to create a cowgol package with all the different permutations of native and cross compiler. It works great on OpenBSD. I have been able to create native programs with it as well as programs for all the different CPU and platform combinations that get built by the standard ninja build.

One thing I noticed is that Cowgol does not seem able to detect the standard Unix EOF character. If I'm reading the code correctly, the FCB struct holds a sliding window of the current file, including the current character, but that buffer is made of uint8s, whereas (modern) Unix uses -1 as its EOF character. Because you always zero out the window containing the currently read-in piece of the open file, you can test for 0 as an EOF character. Fine if you're working with text files. Less fine if you're working with binary files, such as if you're writing a hexdump utility in Cowgol: https://github.com/ibara/cowgol-utilities/blob/main/hexdump.cow
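To illustrate the problem described above, here is a small Python sketch (a model only, not Cowgol code). It shows why neither -1 nor 0 works as an in-band end marker once every value has to fit in a uint8. The helper names `to_uint8` and `first_zero` are hypothetical and exist only for this illustration.

```python
def to_uint8(value: int) -> int:
    """Truncate an integer to 8 bits, as storing it in a uint8 would."""
    return value & 0xFF


# C's getc() returns an int so that -1 stays distinct from every byte value.
EOF_SENTINEL = -1

# Once stored in a uint8, the sentinel is indistinguishable from data byte 0xFF.
stored_sentinel = to_uint8(EOF_SENTINEL)


def first_zero(data: bytes) -> int:
    """Offset at which a 'stop at the first 0' reader would end.

    Returns len(data) when the data contains no zero byte.
    """
    index = data.find(0)
    return len(data) if index < 0 else index


text_sample = b"hello\n"                         # no zero bytes: read in full
binary_sample = bytes([0x7F, 0x45, 0x00, 0x46])  # zero byte in the middle
```

With these samples, a reader that stops at 0 sees the whole text sample but ends the binary sample two bytes early, which is the hexdump case.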

As you can see, I did come up with a workaround. But that workaround is both surprising to the programmer and probably not something that should be exposed by the average program (use of the @asm semantics to hook into libc functions).

Wondering if I missed something that would have made detecting EOF on Unix easier. If not, wondering if it is desirable for me to go about figuring out how to make Unix, CP/M, DOS and the rest all happy with EOF detection.

Thanks!

davidgiven commented 3 years ago

Yes, that's correct. The FCB API is a bit bodged together (and some of the implementations are just plain wrong and I need to fix them; don't trust FCBSeek for now). The way it's supposed to work is: you can seek and read anywhere in the file from 0 to FCBExt. Inside that range you get data. Outside that range you get 0. To determine where the end of the file is, you need to compare the current file position against FCBExt. FCBExt is guaranteed to be at or after the last byte written to the file.
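The contract described above can be modelled in a few lines of Python. This is a sketch of the described semantics only, not Cowgol's implementation: the class `ModelFCB` and its methods `ext`, `seek` and `get_char` are hypothetical stand-ins, and the model assumes the extent equals the exact data length.

```python
class ModelFCB:
    """Toy model of the described behaviour; not Cowgol's real FCB."""

    def __init__(self, data: bytes) -> None:
        self._data = data
        self.pos = 0

    def ext(self) -> int:
        """Extent of the file (assumed here to be the exact length)."""
        return len(self._data)

    def seek(self, pos: int) -> None:
        self.pos = pos

    def get_char(self) -> int:
        """Return the byte at the current position, or 0 outside the extent."""
        in_range = 0 <= self.pos < len(self._data)
        value = self._data[self.pos] if in_range else 0
        self.pos += 1
        return value


def read_all(fcb: ModelFCB) -> bytes:
    """Read from offset 0 up to the extent, using position (not value) to stop."""
    fcb.seek(0)
    out = bytearray()
    while fcb.pos < fcb.ext():
        out.append(fcb.get_char())
    return bytes(out)
```

Because the loop stops on position rather than on a byte value, zero bytes and 0xFF bytes inside the file come through unchanged.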

You can blame CP/M and Acorn MOS for this, as CP/M doesn't track the length of files beyond how many sectors there are, and MOS has really weird EOF behaviour when reading blocks. Plus, FCBs always refer to files, which have a fixed size, unlike Unix file descriptors which are streams. It's not very satisfactory but it is at least pretty simple. (It'll be simpler when I fix FCBSeek...) Also the buffering behaviour is... poor... and really needs to be replaced with something better.

Also: I'm glad it works on OpenBSD! I don't have one of those, or at least, I don't think I do, and github's CI only supports Linux. There are lots of gotchas when making things work outside the Linux world.

ibara commented 3 years ago

Sounds good. I'll wait for the fixes and between then and now I'll just make slightly different versions of the utility for different platforms as I need them.

davidgiven commented 3 years ago

You shouldn't need to, actually. Provided you start at the beginning and read through to the end, which the hex dumper does, it should work fine. You don't need FileLength() as FCBExt() should give you the same result, although most platforms appear to round it up to the nearest block --- not sure why I did that.
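A short Python sketch of what the rounding mentioned above means for a tool that reads through to the extent. The block size of 128 bytes is an assumption chosen for illustration (it matches the CP/M record size); the thread does not say which block size each platform uses, and both function names are hypothetical.

```python
def round_up_to_block(length: int, block_size: int = 128) -> int:
    """Round a byte length up to the next multiple of block_size."""
    return -(-length // block_size) * block_size


def dump_through_extent(data: bytes, block_size: int = 128) -> bytes:
    """Bytes a reader would see when it reads from 0 to the rounded extent.

    Positions past the real data read back as 0, so the result is the
    original data followed by zero padding.
    """
    extent = round_up_to_block(len(data), block_size)
    return data + bytes(extent - len(data))
```

Under these assumptions a 300-byte file is reported as 384 bytes, and a hex dump of it ends with 84 zero bytes of padding.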

BTW: you're actually writing code in Cowgol? Gosh. I'm not sure how I feel about that... also, I've just added an extremely bad ARM backend.

ibara commented 3 years ago

I see what you mean. I will try that out later.

And I've only written that hex dumper so far. I enjoy one-person languages.

And I saw about the ARM backend. I explicitly turn off the 80386 and arm native toolchains (I leave their cross compilers available) because the 80386 backend does not produce assembly that passes OpenBSD's strict checks (forbidden relocations). I have not tried the arm backend yet. Can get you a log of the 80386 backend later if you're interested.

ibara commented 3 years ago

You're right. FCBExt got me what I needed. It does round up to the nearest block on Unix but that's fine with me. Thanks again!

davidgiven commented 3 years ago

Re OpenBSD: yes, please. I found an old machine and installed OpenBSD on it, but of course Cowgol only generates Linux binaries. If you have something which should (but doesn't) generate OpenBSD binaries that'd be good to have.

davidgiven commented 3 years ago

FWIW, I've rewritten the file I/O stuff so that there's proper buffering and byte-accurate file lengths on platforms which support it. I'd like to keep the same EOF handling, though, as it makes lots of things easier; but I think there are still some bugs here.

ibara commented 3 years ago

I think that's fine. It works for me. (Feel free to close this issue.)