crystal-lang / crystal

The Crystal Programming Language
https://crystal-lang.org
Apache License 2.0
19.47k stars 1.62k forks source link

Move `LibC` into `Crystal::System` namespace #13504

Open straight-shoota opened 1 year ago

straight-shoota commented 1 year ago

The bindings in LibC are incomplete and not intended as public API exposed in the standard library. They only go as far as needed in terms of building blocks for stlib implementations.

Moving the bindings into Crystal::System namespace explicitly communicates that they are an internal tool and not a public API.

This has been mentioned before in https://github.com/crystal-lang/crystal/issues/7339#issuecomment-612413093 and https://github.com/crystal-lang/crystal/pull/11955#issuecomment-1105559259

It's to be expected that LibC bindings from stdlib are called from user code. It's not intended for that, but this was not explicitly communicated in the past. LibC is not mentioned in the API docs because the code generator doesn't produce docs for lib types. So that could be understood as a technical limitation, not a intentional choice. I think it would be too hard to entirely remove LibC from stdlib and break existing code. I'd rather keep such a breaking change for the next major release, even though we might not critically bound to it. However, there's also not a huge cost for keeping the status quo around.

The easiest implementation to retain the API would probably be alias LibC = Crystal::System::LibC. We could also consider duplicating the bindings so we can move independently, but I fear this would be too much overhead and complexity.

HertzDevil commented 1 year ago

The types like LibC::Int will stay, right?

straight-shoota commented 1 year ago

Ay, that's a good point. I hadn't intended that. But it certainly would make sense to have basic C types easily available.

In fact, you need them in any lib type. Thus it's common to have aliases like these for convenience: https://github.com/crystal-lang/crystal/blob/b2498a37e4c9588baca0a927e8a69f1136742c18/src/crystal/lib_iconv.cr#L11-L13

I'm wondering if there could be alternatives to make this more convenient. Maybe have basic C types implicitly defined in any lib type? Or put them in a C namespace that you can include into a lib type?

HertzDevil commented 1 year ago

There is an added complication here: SizeT is technically part of c/stddef, not lib_c, but I definitely don't want to see third-party code write alias SizeT = Crystal::System::LibC::SizeT.

straight-shoota commented 1 year ago

I think that is just a technicality. size_t is the type of the sizeof operation, which is part of the language. So I'd argue that by extension, the type size_t is also part of the language. Its definition just happens to be in a header file. So I think we can consider it a basic type in C.

The other standard types are the integer and float type for which we have aliases in lib_c.cr, as well as Void, Pointer, Proc and StaticArray.

My idea would be to put these standard C types in a namespace called C (as the language, not the library). And this namespace would be implicitly included in all lib namespaces.

HertzDevil commented 1 year ago

Will the compiler disallow things other than alias definitions in module C?

straight-shoota commented 1 year ago

Probably not, why?

I mean it should be discouraged to use it for anything else. But I don't think the compiler needs to enforce it.

HertzDevil commented 1 year ago

Because most likely we'd start with normal libs using the normal semantics of include to include a normal module. Something in this process must be special to forbid things like defs inside a lib

straight-shoota commented 1 year ago

Ah yes. That should happen when including C into a lib, not when defining it.

However, maybe C should actually be a lib type? This obviously requires compiler support, but so would including a module into a lib. There's not much reusability for that, so I figure this could be entirely encapsulated in the compiler.

HertzDevil commented 1 year ago

I am also worried that C might be too common a name im the top-level namespace (snippets using A, B, C etc. instead of Foo, Bar etc.). Maybe CTypes is better.

Perhaps including libs into other libs is a better choice here, both implicitly in the compiler, and explicitly in the code, although we would then also need compiler support for documenting the base lib.

HertzDevil commented 1 year ago

Also the platform-specific directory selection in lib_c is hardcoded in the compiler. Do we continue using src/lib_c for Crystal::System::LibC?

straight-shoota commented 1 year ago

I suppose so. There should be no issue with that except that it's in the namespace Crystal but not in src/crystal, right? Maybe symlinking src/lib_c to src/crystal/lib_c could be an option to improve on that...?

HertzDevil commented 1 year ago

I think as a first step we should put those types in a new module CTypes, then make LibC's types aliases to them (or Crystal::System::LibC's types too once we have that):

# src/c_types.cr
module CTypes
  alias Int = Int32
end

# src/lib_c.cr
require "c_types"

module LibC
  alias Int = CTypes::Int
end

Being able to include CTypes in libs is a nice convenience but not necessary; many bindings only need a handful of those types, e.g. PCRE2 only needs Int and SizeT, and writing out just the needed aliases doesn't hurt much.

ysbaddaden commented 7 months ago

Is there really a problem to fix?

What's wrong with src/lib_c and LibC in general? What's the point of introducing yet another module for holding C types when we already have LibC that's already used and expected?

I understand that the bindings are incomplete, that it can break if we're removing symbols the stdlib doesn't need anymore, but LibC won't go away anytime soon (and IMO never). Looking at the problem from another point of view, we could introduce a libc shard that would aim to be complete (and ideally autogenerated), and there wouldn't be an issue. Heck, crystal could eventually depend on it :shrug:

I'm wondering if there could be alternatives to make this more convenient

Yes, drop the LibXYZ libs and put every extern C libs into LibC directly! No more artificial LibXML2 or LibPCRE that push us to alias every single symbol when we can use the actual symbol names, no need to alias C types: they're already available.

HertzDevil commented 7 months ago

The C runtime may not be present even though the C language and calling conventions etc. are there, e.g. when writing pure Win32 code and needing to avoid linking against ucrt / libucrt. Thus there is a valid scenario in which src/lib_c.cr and src/lib_c/*/stddef.cr won't be required at all (though I guess CTypes somewhat blurs the line, because it refers to size_t from the C standard library whereas the Win32 SIZE_T is technically a different type).

But another advantage of CTypes is public documentation. Publishing those types here means we avoid the hard problem of deciding whether the docs generator should support lib types.

Yes, drop the LibXYZ libs and put every extern C libs into LibC directly!

That would mean passing all the linker flags from every single lib as soon as the libs are required, even when the source code doesn't call any of those lib functions, because under normal circumstances you can't avoid calling into the C runtime.

straight-shoota commented 7 months ago

I agree 100% with @HertzDevil that separatation into individual lib types is a useful feature of Crystal to avoid implicitly linking libraries which are never called.

I think it makes a lot of sense to define the basic C data types outside of LibC because they are also used outside of it: Other lib bindings, syscalls, FFI, binary protocols.

ysbaddaden commented 7 months ago

I think there is a confusion around the extern C world: IMO LibC is that whole extern C world (types, syscalls, general ABI), not just a binding to the C library. I wish we kept using the C namespace for it instead of the restrictive LibC which forces us to create even more modules just for holding a dozen type aliases (just to alias them again everywhere) :disappointed:

binary protocols

We don't want arch-dependent sizes for binary protocols (different targets would fail to communicate). Even ELF files use explicit integer sizes for example :grin:

That would mean passing all the linker flags from every single lib as soon as the libs are required

Good point. Still, that's only really a problem for PCRE in practice, because it's always required by the prelude. The GC is always used & linked, and other bindings (LLVM, OpenSSL, XML2, ...) must be explicitly required, in which case I don't see a problem to link against the library; it won't get linked if unused anyway.

The easiest implementation to retain the API would probably be alias LibC = Crystal::System::LibC

This ain't backward compatible: we can't reopen LibC anymore and would have to reopen Crystal::System::LibC (sic) or use another name (sic) to have a libc shard or quickly declare a libc function (e.g. mknod): Error: LibC is not a lib, it's a alias.

Another advantage of CTypes is public documentation

I think that's a problem: the types are target dependent but the docs will declare them as being an alias to a specific type, that depends on which target was passed to the docs generator. This can be confusing, if not misleading.

The C type names could instead be manually documented in the guide about extern C bindings, with actual examples (not hardcoded aliases), or maybe even the list per target; that would be more useful.

straight-shoota commented 6 months ago

That would mean passing all the linker flags from every single lib as soon as the libs are required

Good point. Still, that's only really a problem for PCRE in practice, because it's always required by the prelude. The GC is always used & linked, and other bindings (LLVM, OpenSSL, XML2, ...) must be explicitly required, in which case I don't see a problem to link against the library; it won't get linked if unused anyway.

PCRE may be the only practical application currently in the core library. But we have an entire ecosystem to take into account. It's not that uncommon for libraries to define C bindings which may or may not be actually linked, depending on which parts of the library are used.

Due to the way the Crystal compiler autogenerates linker flags, this would mean the linker may need libraries to be available, despite not being used.

It's hard to gauge the effect of this, but I'd certainly expect that changing this would introduce a number of breakages. So we wouldn't be able to implement it easily.

And I really don't understand why merging all C bindings into a single lib type should even be considered a good idea. Sure, in C there are no such namespaces or a notion about which library defines a symbol. But I think it's a great enhancement that Crystal has a means to provide more structure.

Yes, drop the LibXYZ libs and put every extern C libs into LibC directly! No more artificial LibXML2 or LibPCRE that push us to alias every single symbol when we can use the actual symbol names

I don't think there's anything preventing us from using the actual symbol names. It's just conventional to drop the lib namespace prefixes, so that the Crystal name for __gmpz_init is LibGMP.init. You could just keep the original name if you wanted. But I don't see a lot of benefit from that. It just makes calls more complex.

Putting individual libraries' bindings into separate namespaces is also just a convention, but it has some practical consequences for generating linker arguments.

ysbaddaden commented 6 months ago

I prefer using actual symbols because it's the actual name and we should always properly name things. Creating fun aliases or renaming struct names, we create a virtual interface, which requires one more hoop to go from Crystal to C and vice versa.

For example when reading the shard's code, you must reach the lib definition to know the actual symbol to start searching its original documentation. Same when trying to understand a segfault stacktrace, or why the linker fails with missing symbols. It's one more hoop, but a boring one, then repeat with how many symbols you want to reach :rage:

In the end we spend time on an abstraction that nobody will use in the end, and that only makes things harder to follow & decipher :confused:

I prefer to cram everything together because this is the reality: everything's crammed together in C, and that avoids to repeat the same aliases to C types in each and every lib definition, without any protection, because I can create a new LibFoo::SizeT that aliases UInt64 and that would work... until you compile for ARM32 and get segfaults.

Though that's only for types and constants, because fun definitions are shared across all lib definitions and global definitions: they must have the same signature, otherwise it's a compilation error. So the namespacing is both real (types) and fake (funs), protected (funs) and unprotected (types).

Cramming everything together avoids the above pitfalls, and we don't document lib definitions (on purpose: they're not meant to be used), so we don't care that everything's crammed together.

Now it's under par because of the linker annotations, but maybe we could think differently? For example instead of merging the annotations each time we reopen lib X, maybe they could only apply to their immediately defined symbols only?

Or maybe we don't need to do anything, or each lib could always include the C types by default, we could recommend to always use the actual symbols (don't waste your time on that intermediate layer, better: autogenerate it).