Closed Quuxplusone closed 14 years ago
Bugzilla Link | PR180 |
Status | RESOLVED FIXED |
Importance | P normal |
Reported by | Reid Spencer (rspencer@reidspencer.com) |
Reported on | 2003-12-13 04:54:03 -0800 |
Last modified on | 2010-03-06 13:59:17 -0800 |
Version | 1.0 |
Hardware | All All |
CC | brukman+bugs@uiuc.edu, clattner@nondot.org, gaeke+bugs@uiuc.edu, jtcriswel@gmail.com, llvm-bugs@lists.llvm.org |
Fixed by commit(s) | |
Attachments | |
Blocks | |
Blocked by | |
See also |
I know so very little about autconf it is embarassing, so I'm adding John &
Brian, who know more about this.
However, I ___strongly___ support any effort which gets config.h out of public
header files. However, I'm not sure how to do this with DataTypes.h. This file
is specifically looking for things like BIG_ENDIAN/LITTLE_ENDIAN and friends,
which seem like they have to be autoconf'd. <cinttypes> doesn't include any of
this information does it? What do you plan to do to fix this?
-Chris
At first I thought it was going to be as simple as just making sure all the .cpp
file included some internal header file that included config.h. But, after
looking into it, I discovered that the public header files are dependent on lots
of the HAVE_* symbols because they define different data structures, etc. So,
my approach now is simply to find a way to name these things more appropriately
so that, for example, HAVE_* becomes LLVM_HAVE_*. This way the symbols can be
used in public header files without name collision .. best of both worlds. I'm
currently researching if autoconf/autoheader can support this automagically. If
they can, it will be easy, if not I might think of another solution.
On a side note, I'm thinking about just fixing all the issues having to do with
*using* LLVM in another library package in one shot. The namespace change was a
step in the right direction but the remaining issues have to do with using
autoconf in a standard way, using automake, getting non-public things out of
header files, and generally making LLVM a little more self contained. It would
be a big patch but I wouldn't want to make these kind of changes incrementally
wthout making sure *everything* works.
Thoughts?
> So, my approach now is simply to find a way to name these things
> more appropriately so that, for example, HAVE_* becomes LLVM_HAVE_*.
I personally don't care if the autoconf variables are prefixed with LLVM_ (or
better yet, LLVMAC_), so if you can figure out how to do that, that would be a
nice non-intrusive change.
> The namespace change was a step in the right direction but the
> remaining issues have to do with using autoconf in a standard way,
> using automake, getting non-public things out of header files, and
> generally making LLVM a little more self contained.
I generally support these kinds of changes, but we don't want to be changing a
lot of stuff gratuitously. There has to be a clear reason for it...
> It would be a big patch but I wouldn't want to make these kind of
> changes incrementally wthout making sure *everything* works.
I think that the only feasible way to do this kind of stuff is as incremental as
possible, checking at each step that _everything_ works. Otherwise it will be
impossible to untangle which change caused which problem.
-Chris
> I personally don't care if the autoconf variables are prefixed with LLVM_ (or
> better yet, LLVMAC_), so if you can figure out how to do that, that would be a
> nice non-intrusive change.
I agree, but its not looking possible. Too many things depend on the HAVE_
prefix and don't have a way to specify an alternate.
> I generally support these kinds of changes, but we don't want to be changing a
> lot of stuff gratuitously. There has to be a clear reason for it...
The reason is so that LLVM can be used in a meaningful way! If all I was doing
was creating a language on top of LLVM, I probably wouldn't run into the issues
that I do. But, I'm actually trying to embedd LLVM into a larger project and
there are numerous issues. The namespaces were one, that's done. The make and
config system is another. LLVM is currently a "one off" system. That is, it
doesn't follow the accepted practice of 90% of other open source Unix/C++
packages. That means if you're going to incorporate it into something larger
that you have to deal with it specially (i.e. actually understand it!)
> I think that the only feasible way to do this kind of stuff is as
> incremental as possible, checking at each step that _everything_ works.
> Otherwise it will be impossible to untangle which change caused which
> problem.
My point was that this is a large incremental step. You can't do just some of
the makefiles, they all have to be done. You can't do the makefile changes
without the autoconf changes. That is, to get everything working right, there's
new makefiles and several changes to autoconf.ac. If the config.h issues can be
cleanly split out, I'll make that change separately but there rest of it is just
a large incremental change.
> I agree, but its not looking possible. Too many things depend on the HAVE_
> prefix and don't have a way to specify an alternate.
Ok, do you have any other ideas for how to implement this? How do other
projects cope with conflicting autoconf configurations? It appears that we only
include our config.h header in
Support/{DataTypes.h|hash_map|hash_set|slist|iterator}. I'm not sure how to
handle these files other than through autoconf. That said, we can't be the only
project in the world that has to deal with the mysterious transmogrification of
hash_map across compiler versions...
>> I generally support these kinds of changes, but we don't want to be
>> changing a lot of stuff gratuitously. There has to be a clear reason
>> for it...
> The reason is so that LLVM can be used in a meaningful way!
If there is a good reason to change something (like better integration with
other source-bases), and it doesn't effect functionality, I have no problem
changing it.
> My point was that this is a large incremental step. You can't do just
> some of the makefiles, they all have to be done. You can't do the
> makefile changes.
We just have to do things as incrementally as possible and make use of CVS
branches for other major changes that cannot be incrementalized. For example, a
lot of changes to the makefile can be made to regularize them for automake,
before switching over to automake. This is how the GCC project was ultimately
switched over to autoconf IIRC...
-Chris
> Ok, do you have any other ideas for how to implement this? How do other
> projects cope with conflicting autoconf configurations?
Other projects don't create the conflicts in the first place. Its generally not
"kosher" to make one's public header files depend on the contents of your
config.h file. That is reserved for the internal headers and source code files.
By making one's public headers depend on config.h a project is basically saying
to the users of the package either: (a) we will take charge of all machine
configuration stuff (very unfriendly for other library developers) or (b) the
project's interface is configuration dependent.
> It appears that we only include our config.h header in
> Support/{DataTypes.h|hash_map|hash_set|slist|iterator}.
Yes, but are these files and the things they declare REALLY part of the public
interface you want to expose to LLVM users? I wouldn't think so, they are
implementation details. Correct me if I'm wrong.
> I'm not sure how to handle these files other than through autoconf.
The use of autoconf (and autoheader) is not the problem. The way the #includes
are structured is. Another related problem is that the modern way to use public
#includes is to place them all in a directory specifically for your project. So
that
#include <Support/Statistic.h>
becomes
#include <llvm/Support/Statistic.h>
This completely removes conflicts with some package named "Support" as long as
there isn't some other package named "llvm". In general, resolving whole
package naming conflicts is MUCH easier than resolving them at the file by file
level.
LLVM gets this mostly right. The include directory has an "llvm" directory that
correctly hides all the header files. Unfortunately, it also has the "Support"
directory which should really be a sub-directory of LLVM and possibly named
"internal". However, it also contains stuff that isn't really internal (like
Statistic.h).
> That said, we can't be the only project in the world that has to deal with
> the mysterious transmogrification of hash_map across compiler versions...
Actually, its the first time I've seen it. While it might be a great utility, it
should be submitted to the C++ compiler folks for inclusion in some future
"standard". By exposing these header files in your public API, you're
essentially forcing users of LLVM to "do it your way" regardless of the needs of
the users of the LLVM.
> If there is a good reason to change something (like better integration with
> other source-bases), and it doesn't effect functionality, I have no problem
> changing it.
None of what I'm suggesting will affect functionality but it will (and should)
affect the public API to LLVM (by reducing its scope). I think the driving
"good reason" for it is to foster adoption of LLVM in more projects. I may be
one of the early users of LLVM in a large way but I won't be the last! If we can
solve these problems then other users of LLVM will have a smooth ride
integrating LLVM into their projects. And, as a side effect, LLVM core
developers won't be hassled with the same integration issues coming up over and
over again.
> We just have to do things as incrementally as possible and make use of CVS
> branches for other major changes that cannot be incrementalized.
As much as I'd like to, I can't go that route. I don't have CVS access other
than read-only anonymous. I can't create a branch or a tag or check in or
anything else. :(
> For example, a lot of changes to the makefile can be made to regularize
> them for automake, before switching over to automake.
This is already done in my build environment. It can almost co-exist with the
existing makefile system. But, I haven't gotten all the way through a complete
build (including running tests) yet.
> This is how the GCC project was ultimately switched over to autoconf IIRC...
LLVM is early enough in its lifespan that hopefully we can avoid such a
scenario.
I think there are two basic parts to the needed changes:
(1) Add automake support (change configure.ac and add Makefile.am files). This
is mostly done in my build environment. I'll eventually submit the patch
under its correct bug number (bug 106).
(2) Remove non-essential header files from the public API, especially config.h
This is the much more challenging task as it requires some reorganization of
the header files, not just changing their contents. Since I don't have CVS
access, I'm somewhat hesitant to take this on.
It seems like we have multiple issues crammed into this bug.
1) I, for one, agree that we should move our remaining separate includes (e.g.,
include/Support/, include/Config/) into a single llvm/ directory.
Speaking personally, I don't understand why we need to use the Config/foo
headers instead of just using #ifdef HAVE_FOO; #include <foo>; #endif, because
most of them contain exactly that. I would support efforts to replace uses of
the former with the latter.
2) As for things which currently expose HAVE_* macros, remember that any
autoconf test which currently outputs a definition of HAVE_SOMETHING can be
rewritten to output whatever definition would be in effect if HAVE_SOMETHING
were to be defined. There is nothing that says autoconf tests have to output
conditional compilation directives, other than the force of tradition -- it's
just a Small Matter of Programming, for appropriate values of "Small".
3) Looking back at the summary for this bug, I don't see how this bug should
have anything to do with the use or disuse of automake. That issue absolutely
has to undergo its *own* cost-benefit analysis. We should be careful not to
confuse the issues...
That said, whether or not an outside developer will find LLVM's Makefiles
strange enough to consider them an impediment to adopting LLVM is strictly a
matter of perspective. For example, LLVM's Makefiles are different from automake
makefiles, but they are not very different from those you find in a typical BSD
build system.
As an aside, the inclusion (or not) of hash_map in the standard has been a long,
long flamewar in the C++ standardization community, but for reasons which I do
not fully understand. Nevertheless, hash_map is a very widely-known and
widely-used nonstandard extension; it would be a mistake to speak of it as if it
were some kind of bogus addon from Mars.
>> It appears that we only include our config.h header in
>> Support/{DataTypes.h|hash_map|hash_set|slist|iterator}.
> Yes, but are these files and the things they declare REALLY part of the public
> interface you want to expose to LLVM users? I wouldn't think so, they are
> implementation details. Correct me if I'm wrong.
You are absolutely right, of course. Take hash_map as an example.
include/llvm/Analysis/AliasSetTracker.h uses hash_map for example as an
implementation detail (ie, private member) in one of its classes. However,
without using a pimpl or something, I don't know how we can hide this from
clients. C++ just wasn't designed for this form of data hiding.
>> That said, we can't be the only project in the world that has to deal with
>> the mysterious transmogrification of hash_map across compiler versions...
> Actually, its the first time I've seen it. While it might be a great
> utility, it should be submitted to the C++ compiler folks for inclusion
> in some future "standard".
hash_map is not our invention. It predates standard C++, and they chose not to
standardize it in the initial C++ standard due to lack of time. C++0x is
supposed to contain standardized hashed containers, for example, see:
http://std.dkuug.dk/jtc1/sc22/wg21/docs/papers/2003/n1456.html
The problem is that hash tables are a tremendously useful container and often
can improve performance quite a bit: but they aren't yet standardized. That's
what the Support/hash_map header is trying to work around. With it, clients can
use hash_map's without needing to know a lot of the vagarities of different
implementations.
> By exposing these header files in your public API, you're essentially
> forcing users of LLVM to "do it your way" regardless of the needs of
> the users of the LLVM.
I understand exactly what you are saying, and I certainly understand what you
mean by it being a bad thing, but I'm not sure how to fix it. Perhaps what we
need to do is have a really stripped down version of the config.h file that has
just the necessities used by DataTypes.h and hash_map/set, and have those
prefixed with LLVMAC_?
> Another related problem is that the modern way to use public #includes is
> to place them all in a directory specifically for your project. So that
> #include <Support/Statistic.h>
> becomes
> #include <llvm/Support/Statistic.h>
This can certainly be done. File a bug and I'll take care of it at some point,
probably before 1.2.
> As much as I'd like to, I can't go that route. I don't have CVS access
> other than read-only anonymous. I can't create a branch or a tag or
> check in or anything else. :(
While we can probably get you CVS access, I'm not sure how that would be
different than developing things locally, then pushing out a big diff.
In any case, the other guys probably have beter/more constructive feedback than
I do. Remember again that I don't really know anything about this stuff! :)
-Chris
> Speaking personally, I don't understand why we need to use the Config/foo
> headers instead of just using #ifdef HAVE_FOO; #include <foo>; #endif, because
> most of them contain exactly that. I would support efforts to replace uses of
> the former with the latter.
This is of course almost completely pointless. The reason the #include's exist
is to get the definition of a symbol. Many of the current files in Config/foo
will simply turn one compile-time error into another. This is one area where
C++ is very different than C (requires prototypes). This problem should be
fixed by doing stuff like FileUtilities.h, where we _abstract out_ the system
interfaces we want...
> 3) Looking back at the summary for this bug, I don't see how this bug
> should have anything to do with the use or disuse of automake. That
> issue absolutely has to undergo its *own* cost-benefit analysis. We
> should be careful not to confuse the issues...
Agreed.
-Chris
Follow Up To Brian:
1. Agreed on your point 1
2. Agreed on your point 2
3. Sorry for introducing the automake discussion here. For me its all part of
the larger issue of making LLVM easily usable. I'll confine further comments
to bug 106 which deals with the automake issue.
> As an aside, the inclusion (or not) of hash_map in the standard has been a
> long, long flamewar in the C++ standardization community, but for reasons
> which I do not fully understand. Nevertheless, hash_map is a very widely-known
> and widely-used nonstandard extension; it would be a mistake to speak of it
> as if it were some kind of bogus addon from Mars.
My comment was not directed at the hash_map itself but at the mechanism by which
LLVM disambiguates the multiple implementations of it (y'know, the nice thing
about standards is there's so many to choose from :). The mechanism is what I
was suggesting be sent to the standards groups. However, the use of the
mechanism in the LLVM header files forces the LLVM user into one model. This is
already a conflict for me. LLVM's configure script selects the GNU version of
hash_table. I have code that uses the SGI extension version directly. Since they
are based on the same implementation, I get conflicts when trying to use one of
my hash_maps at the same time as an LLVM header.
Follow Up To Chris:
> You are absolutely right, of course. Take hash_map as an example.
> include/llvm/Analysis/AliasSetTracker.h uses hash_map for example as an
> implementation detail (ie, private member) in one of its classes. However,
> without using a pimpl or something, I don't know how we can hide this from
> clients. C++ just wasn't designed for this form of data hiding.
There are two design choices that support this kind of hiding:
1. Forward declarations of internal types
2. Inheritance and/or Polymorphism.
The former has the side effect that you can only define a private member as a
pointer to the forward declared type. This implies a memory allocation which in
many instances is undesireable.
The latter avoids the memory allocation but requires that the user invoke some
kind of special allocation function to instantiate the actual (subclass) type.
LLVM already supports this approach well for Passes. I suggest it use the
technique on a more broad scale.
> hash_map is not our invention. It predates standard C++, and they chose
> not to standardize it in the initial C++ standard due to lack of time.
> C++0x is supposed to contain standardized hashed containers, for example,
> see: http://std.dkuug.dk/jtc1/sc22/wg21/docs/papers/2003/n1456.html
Please, I've been using hash_map since the early 1990s. I'm well versed with it.
My comment (which I now see was not stated clearly) was directed at the
mechanism LLVM uses to disambiguate the various hash_map implementations. See my
comment above to Brian.
> The problem is that hash tables are a tremendously useful container and often
> can improve performance quite a bit: but they aren't yet standardized. That's
> what the Support/hash_map header is trying to work around. With it, clients
> can use hash_map's without needing to know a lot of the vagarities of
> different implementations.
Agreed, but that's exactly the problem. My system is a source base larger than
LLVM and it uses hash_map (SGI version) extensively. The fact that
Support/hash_map disambiguates the implementation is nifty but it also produces
a conflict for me.
> I understand exactly what you are saying, and I certainly understand what you
> mean by it being a bad thing, but I'm not sure how to fix it.
By designing the interface to LLVM (i.e. the entire IR and anything it needs) to
be the >minimal< set of things that will let you use the full functionality of
LLVM in a platform independent manner. Even if this means subclassing some of
the IR interface classes to hide the implementation details.
I'm assuming, of course, that you _want_ to design LLVM as a library/toolkit for
compiler writers. It certainly seems to me that being a library/toolkit is part
of LLVM's mission. When designing a library interface, very much care must be
taken in what you expose to the library user. This protects both the user and
the developer of the library. I think this discussion makes the benefits for the
user quite clear. What you may not realize is the significant benefit to the
developer of the library. A clear, concise, minimal interface provides the
developer with many degrees of freedom in the implementation. Sh!t happens and
code needs to get redesigned to support new functionality. With a good
interface design, the library user never knows this. For the most part, LLVM
follows such good design principles (you have lots of implementation classes
that the user just never even knows about). We're only talking here about the
few things that fell through the cracks :)
> Perhaps what we need to do is have a really stripped down version of the
> config.h file that has just the necessities used by DataTypes.h and
> hash_map/set, and have those prefixed with LLVMAC_?
I've scavenged the 'net all day trying to find something that describes how to
do this. The only prevailing wisdom I get is to not expose config.h to your
users. I've looked at about 5 standard libraries (Xerces, Xalan, ICU, gd, etc.)
and none of them expose their config.h. Using the LLVMAC_ prefix, while simple,
I think just obfuscates the issue further. People expect to look at config.h and
just see HAVE_. Furthermore tools such as autoheader (which generates
config.h.in) simply won't (without modification) accept the prefix. The _only_
solution I've seen is to not expose config.h and I believe that is the
RightThing(tm) to do, even if it isn't the easy thing.
>> Another related problem is that the modern way to use public #includes is
>> to place them all in a directory specifically for your project. So that
>> #include <Support/Statistic.h>
>> becomes
>> #include <llvm/Support/Statistic.h>
>This can certainly be done. File a bug and I'll take care of it at some point,
>probably before 1.2.
Consider it done.
> While we can probably get you CVS access, I'm not sure how that would be
> different than developing things locally, then pushing out a big diff.
If you take care of moving the header files for me (you probably want to move
the CVS ,v files to retain history), then I probably can do it as one big diff.
What I won't be able to do is commit to a branch for everyone else to use
separate from other work that's on going. For this kind of change, not having
CVS access is a pretty big drawback.
> In any case, the other guys probably have beter/more constructive feedback
> than I do. Remember again that I don't really know anything about this
> stuff! :)
Au contraire. You've already shown that you _do_ know somethings about this
area. Perhaps its just that you don't _want_ to know? Hmmm??? :->
> The reason the #include's exist is to get the definition of a symbol. Many of
> the current files in Config/foo will simply turn one compile-time error into
> another. This is one area where C++ is very different than C (requires
> prototypes). This problem should be fixed by doing stuff like
FileUtilities.h,
> where we _abstract out_ the system interfaces we want...
Yes, Yes, Yes!!!!! :)
> My comment was not directed at the hash_map itself but at the mechanism by
which
> LLVM disambiguates the multiple implementations of it (y'know, the nice thing
> about standards is there's so many to choose from :). The mechanism is what I
> was suggesting be sent to the standards groups. However, the use of the
Ok, so the real question is: what can we do about this? How can we fix this in
a way that works for LLVM without imposing structure on clients of LLVM headers?
>> However,
>> without using a pimpl or something, I don't know how we can hide this from
>> clients. C++ just wasn't designed for this form of data hiding.
> There are two design choices that support this kind of hiding:
> 1. Forward declarations of internal types
> 2. Inheritance and/or Polymorphism.
> The former has the side effect that you can only define a private member
> as a pointer to the forward declared type. This implies a memory
> allocation which in many instances is undesireable.
I strongly urge people to forward declare as many classes as possible. In my
mind, this is just one part of good class design. That said, if you want to
contain something by value, then you need its definition. In addition to the
extra memory allocation, this also implies that simple accessors and such cannot
be inlined by non-smart compilers... This can be much more overhead than the
allocation.
> The latter avoids the memory allocation but requires that the user invoke some
> kind of special allocation function to instantiate the actual (subclass) type.
> LLVM already supports this approach well for Passes. I suggest it use the
> technique on a more broad scale.
That fine for course-grained classes like LLVM passes, but it doesn't work for
fine-grained classes like hash_map. This design technique requires that all
methods be made into virtual functions, which is totally unacceptable for
something a simple as ".empty()" or ".begin()". Also, this falls down when you
try to expose (possibly const) iterators into the underlying type.
> Please, I've been using hash_map since the early 1990s. I'm well versed
> with it. My comment (which I now see was not stated clearly) was directed
> at the mechanism LLVM uses to disambiguate the various hash_map
> implementations. See my comment above to Brian.
Sorry about that. I got the wrong impression from your comment. That said, I'm
still suprised that you haven't run into problems using hash_map with multiple
compilers. I wish that the performance difference was purely imagined so that
we could just go ahead and use std::map for everything. :( I suppose that one
alternative would be to clone a licence compatible implementation of hash_map
into the LLVM world. This would guarantee a standard interface, and allow
smooth migration to the C++0x implementation whenever it materializes. That
said, it's a nontrivial amount of work to do this _right_.
> By designing the interface to LLVM (i.e. the entire IR and anything it
> needs) to be the >minimal< set of things that will let you use the
> full functionality of LLVM in a platform independent manner. Even if
> this means subclassing some of the IR interface classes to hide
> the implementation details.
I understand that, but there are also pragmatic concerns such as how to
efficiently implement light-weight classes like AliasSetTracker which absolutely
_need_ certain methods to be inlined for decent performance. In a world where
compilers like G++ dominate the world, using better encapsulation is really not
an option. That said, gratuitously exposing internal structure is obviously
bad. I keep firm pressure on our developers to reduce the amount of #includage
in headers as much as possible.
> I'm assuming, of course, that you _want_ to design LLVM as a
> library/toolkit for compiler writers. It certainly seems to me that
> being a library/toolkit is part of LLVM's mission.
Of course this is a goal, without it, world domination isn't possible! :)
> When designing a library interface, very much care must be
> taken in what you expose to the library user.
Absolutely. FWIW, this is why we separated out the public interface
(llvm/include) from the implementations and the implementation headers. That
said, it's quite possible that there are places that could be improved more. :)
For example, we have a keyword "code-cleanup" for bugs that are related to
improving the API.
>> Perhaps what we need to do is have a really stripped down version of the
>> config.h file
> The _only_ solution I've seen is to not expose config.h and I believe
> that is the RightThing(tm) to do, even if it isn't the easy thing.
Ok, ok! I believe you! :) The only question is: how?
> If you take care of moving the header files for me (you probably want to
> move the CVS ,v files to retain history),
Will do. I'll try to do it soon after the 1.1 release (which is coming out
RSN). FWIW, I usually _copy_ the ,v files, then cvs rm the old location. I
think this is the preferred way to "move" files in CVS, do you concur?
> then I probably can do it as one big diff. What I won't be able to do
> is commit to a branch for everyone else to use separate from other work
> that's on going. For this kind of change, not having CVS access is a
> pretty big drawback.
I agree, in the meantime we can figure out what to do about CVS access. Worse
case, you can produce a giant diff, then one of the local people can commit it
to a branch. Better yet would be to get you access to CVS...
> Au contraire. You've already shown that you _do_ know somethings about this
> area. Perhaps its just that you don't _want_ to know? Hmmm??? :->
Hrm, I know a bit about C++ and program design, but know very little about the
specific auto* tools. In the next comment I'll attach some proposed solutions
to this bug.
-Chris
Here are some proposed solutions:
1. I've been successfully convinced that we _absolutely_ need to move the
include/Support directory into include/llvm/Support. This is entirely
mechanical, but should happen after the 1.1 release (this is Bug 183).
2. hash_map/hash_set are tremendous problems for us in so many ways it's not
even funny. The only realistic solution that I can think of to this
problem is inelegant, but will solve the problem just fine: invent
LLVM specific hashed containers which expose the same interface as the
"stl" ones, but are in a fixed location for LLVM. These would
presumably live in the support library. The only trick to this is that we
_do not_ want to invent our own containers. Instead we should find some
license compatible implementations with few dependencies and snarf them
into our sourcebase. :)
3. This leaves three headers:
* Support/slist (which is dead, and I'm deleting now)
* Support/iterator, which we already have "local" implementations of
classes we need, so it just needs to be "cleaned up"
* Support/DataTypes.h - This is the hard one, see below.
4. Implementing Support/DataTypes.h without gross hacks is going to be
difficult. I believe the reason that we have our own int64_t and
uint64_t implementations is that Sparc doesn't have a <cinttypes> header,
and probably other older platforms don't either. These data types are
_going_ to be needed by headers in the public interfaces, and the only
good way that I know of to implement this is with autoconf.
Suggestions please!
5. The other purpose of Support/DataTypes.h is for endianness detection. This
really really has absolutely no business affecting the public api of the
LLVM headers, so should be moved into a private header somewhere or
something. The llvm/Bytecode/Primitives.h header (the only user of
endianness stuff) deserves its own bug, which I'll file shortly (it will
probably end up as Bug 184). It's been on my hitlist for a long time
now, I have just never gotten around to it.
6. We have to decide how to represent headers which are public to global
LLVM implementation code, but private to the rest of the world (a place
to put headers such as config.h and the Endianness stuff, for example).
I don't have a wonderful answer to this problem, but it doesn't seem too
difficult.
What do you all think? This sounds like something that can be approached
incrementally, one piece at a time... if we agree on approach, the individual
pieces can be filed as their own bugs and this bug can depend on them. Because
we aren't doing any integration, we will have to rely on you Reid, to let us
know how we are doing and how close we are. :)
-Chris
> I'm still suprised that you haven't run into problems using hash_map with
> multiple compilers.
Of course I have. What I didn't run into was, again, your mechanism for
disambiguating. Man, I'm starting to regret having made that comment! :)
> I wish that the performance difference was purely imagined so that
> we could just go ahead and use std::map for everything. :(
No, DON'T do _that_! :)
> I suppose that one alternative would be to clone a licence compatible
> implementation of hash_map into the LLVM world. This would guarantee a
> standard interface, and allow smooth migration to the C++0x implementation
> whenever it materializes. That said, it's a nontrivial amount of work to
> do this _right_.
That just sounds like busy work to me.
> there are also pragmatic concerns such as how to efficiently implement
> light-weight classes like AliasSetTracker which absolutely _need_ certain
> methods to be inlined for decent performance.
Well, I think this underscores my point again. Why does AliasSetTracker.h need
to be part of the public interface? I don't know much about this file and its
contents but a cursory look at it suggests its an implementation detail.
Frankly, as a user of LLVM, I don't really care _how_ LLVM tracks aliases just
so long as it does. I don't ever see myself subclassing a new way to do alias
tracking. Is this really necessary to be exposed in the public interface? If it
is, I'll take your word for it ... it just doesn't seem that way to me.
> In a world where compilers like G++ dominate the world, using better
> encapsulation is really not an option.
Well, that's true.
> That said, gratuitously exposing internal structure is obviously bad. I keep
> firm pressure on our developers to reduce the amount of #includage in headers
> as much as possible.
I know you do, my comments aren't a complaint against LLVM .. just trying to
make it perfect :)
>> When designing a library interface, very much care must be
>> taken in what you expose to the library user.
> Absolutely. FWIW, this is why we separated out the public interface
> (llvm/include) from the implementations and the implementation headers.
I understand your reasoning, but frankly, that choice really bugs me. In my
view, headers and the related class files should live in the same directory.
Where the files end up after installation is something for the makefiles to
handle. My view is probably derived from my heavy use of automake which allows
me to easily assemble any #include hierarchy I want after installation. This way
you can have the best of both worlds: header files go in the natural place for
developers (near the related .cpp files) and go in the "right" place for users
(in some include directory in the install target).
>> The _only_ solution I've seen is to not expose config.h and I believe
>> that is the RightThing(tm) to do, even if it isn't the easy thing.
> Ok, ok! I believe you! :) The only question is: how?
Alright, I'll stop harping on this issue :)
> FWIW, I usually _copy_ the ,v files, then cvs rm the old location. I think
> this is the preferred way to "move" files in CVS, do you concur?
Yes, I concur. I guess I was thinking of "move" figuratively more than
literally.
> Here are some proposed solutions:
>1. I've been successfully convinced that we _absolutely_ need to move the
> include/Support directory into include/llvm/Support. This is entirely
> mechanical, but should happen after the 1.1 release (this is Bug 183).
Okay.
>2. hash_map/hash_set are tremendous problems for us in so many ways it's not
> even funny. The only realistic solution that I can think of to this
> problem is inelegant, but will solve the problem just fine: invent
> LLVM specific hashed containers which expose the same interface as the
> "stl" ones, but are in a fixed location for LLVM. These would
> presumably live in the support library. The only trick to this is that we
> _do not_ want to invent our own containers. Instead we should find some
> license compatible implementations with few dependencies and snarf them
> into our sourcebase. :)
Incorporating a licence compatible implementation sounds reasonable but it is
also just busy work. It will impact the source base somehow and doesn't add any
value. My first preference would be to keep using the "standard" implementations
of hash_map and solve the _HAVE_ problem. It turns out that several of the
#defines defined by LLVM's config.h don't conflict with mine because they
utilize LLVM specific tests.
To assist with this problem, here's what I've come up with:
1. Keep "internal" and "external" configuration #defines in separate .h files.
The usual HAVE_STRING_H, HAVE_XXX will be in the usual "config.h" for
compatibility with other common practices. The "external" configuration
#defines are likely to not conflict.
2. Rename all the "external" HAVE_* variables to LLVM_HAS_* variables. This is
the naming practice used by packages like ACE (albeit, it doesn't use
autoconf). These variables, since they are unique to LLVM won't class. To
support this we just need to change a few AC_DEFINE calls in configure.ac
and acincludes.m4 (I separate out the LLVM specific stuff from aclocal
generated stuff, aclocal.m4, in my source tree).
3. Adjust the #includage to ensure that the internal config.h is not exposed
publicly.
4. Either don't use autoheader and maintain the llvm/internal/config.h.in and
include/llvm/config.h.in files manually,
_OR_
write a shell script to be run immediately after autoheader that moves
the #defines starting with LLVM_HAS_ from llvm/internal/config.h.in to
include/llvm/config.h.in. This latter approach is more work up front but
could be significantly less work as LLVM matures and requires addtional
configuration variables.
So, to summarize, we need in configure.ac:
AC_CONFIG_HEADERS(include/internal/config.h) # internal config header
AC_CONFIG_HEADERS(include/llvm/config.h) # external config header
And we need to change HAVE_DLOPEN, HAVE_MALLINFO, HAVE_PTHREAD_MUTEX_LOCK (all
set via AC_SEARCH_LIBS) to be renamed as LLVM_HAS_DLOPEN, LLVM_HAS_MALLINFO,
LLVM_HAS_PTHREAD_MUTEX_LOCK, respetively.
In the acinclude.m4 (aclocal.m4) files we need to change every occurrence of
"AC_DEFINE(HAVE_" to "AC_DEFINE(LLVM_HAS_". This will affect symbols like
HAVE_NAMESPACES, HAVE_STD_EXT_HASH_MAP, etc.
If we want to use autoheader, we need a script to move the LLMV_HAS_ #undef
lines so that:
- in llvm/internal/config.h there should only be "#undef HAVE_*" lines.
- in llvm/config.h there should only be "#undef LLVM_HAS_*" lines.
This approach should take care of the problem.
Comments?
> 3. This leaves three headers:
> * Support/slist (which is dead, and I'm deleting now)
Okay :)
> * Support/iterator, which we already have "local" implementations of
> classes we need, so it just needs to be "cleaned up"
Okay :)
> * Support/DataTypes.h - This is the hard one, see below.
>
>4. Implementing Support/DataTypes.h without gross hacks is going to be
> difficult. I believe the reason that we have our own int64_t and
> uint64_t implementations is that Sparc doesn't have a <cinttypes> header,
> and probably other older platforms don't either. These data types are
> _going_ to be needed by headers in the public interfaces, and the only
> good way that I know of to implement this is with autoconf.
> Suggestions please!
The only thing I can think of is to create a #define, LLVM_UINT64_T, that
equates to the correct typename. The value is created either directly by
autoconf or via autoconf configured header files. On most modern platforms,
LLVM_UNIT64_T will just be uint64_t but if you need to implement a class to
support the type then it would be that class.
Please note that Solaris has its own uint64_t and int64_t definitions, just not
in <cinttypes>.
>5. The other purpose of Support/DataTypes.h is for endianness detection. This
> really really has absolutely no business affecting the public api of the
> LLVM headers, so should be moved into a private header somewhere or
> something. The llvm/Bytecode/Primitives.h header (the only user of
> endianness stuff) deserves its own bug, which I'll file shortly (it will
> probably end up as Bug 184). It's been on my hitlist for a long time
> now, I have just never gotten around to it.
Agreed.
>6. We have to decide how to represent headers which are public to global
> LLVM implementation code, but private to the rest of the world (a place
> to put headers such as config.h and the Endianness stuff, for example).
> I don't have a wonderful answer to this problem, but it doesn't seem too
> difficult.
The typical practice is to put them in include/llvm/internal. The internal
header files are actually installed publically with the rest of them but anyone
#including them or relying on their contents remaining compatible with the
library's interface assumes it at their own risk. This works well because
developers of the library have a standard way to get an LLVM wide header file
that by virtue of its name is a warning to users of the library:
#include <llvm/internal/MyInternalHeader.h>
Additionally, a directory named "experimental" or "notsupported" is common for
those purposes.
> What do you all think? This sounds like something that can be approached
> incrementally, one piece at a time... if we agree on approach, the individual
> pieces can be filed as their own bugs and this bug can depend on them.
The incremental approach is fine so long as the source base keeps on compiling
after each commit. You guys are very prolific and I have a cron job that updates
and recompiles every day. So far, its never failed in any big way. I wouldn't
want to see that change :)
> Because we aren't doing any integration, we will have to rely on you Reid,
> to let us know how we are doing and how close we are. :)
You'll be the _first_ to know. This is very high on my priority list. I'm trying
to get my source base LLVMized by the end of the year so XPL can be compiled to
bytecode. Right now I'm blocked because of these issues. Fortunately, there's
stuff I can work on around the edges but I'll test the changes you submit as
soon as I notice them.
Before anyone goes way out of their way to work on this, I'd like to note that
we're considering moving llvm/test/Programs into a separate CVS module and a
separate tarball. The reasons for this are:
1) Remove llvm/test/Programs from LLVM's autoconf configuration. The
llvm/test/Programs is kind of its own world away from llvm/*, and it uses
different build rules to build everything.
2) Reduce the size of LLVM proper. We want to add a lot more programs to the
test suite, and that will balloon the size of the LLVM distribution. Moving it
out keeps the download and updates of LLVM small, and allows people to skip
downloading tons of programs if they are not interested.
So, for anyone considering massive changes to the build system, don't worry
about getting everything under llvm/test/Programs working yet; you may not have
to do it, depending upon what we eventually decide on the issue.
> Of course I have. What I didn't run into was, again, your mechanism for
> disambiguating. Man, I'm starting to regret having made that comment! :)
Ah ok. How do you deal with the problem?
>> I suppose that one alternative would be to clone a licence compatible
>> implementation of hash_map into the LLVM world.
> That just sounds like busy work to me.
It is busy work, but it does sound like a solution to this problem, correct?
>> there are also pragmatic concerns such as how to efficiently implement
>> light-weight classes like AliasSetTracker which absolutely _need_ certain
>> methods to be inlined for decent performance.
> Well, I think this underscores my point again. Why does AliasSetTracker.h
> need to be part of the public interface? I don't know much about this file
> and its contents but a cursory look at it suggests its an implementation
> detail. Frankly, as a user of LLVM, I don't really care _how_ LLVM
> tracks aliases just so long as it does. I don't ever see myself subclassing
> a new way to do alias tracking. Is this really necessary to be exposed in
> the public interface? If it is, I'll take your word for it ... it just
> doesn't seem that way to me.
The problem is that there are multiple "kinds" of users of LLVM. There are
people like you who basically just need include/llvm/* (no subdirs) to generate
code. On the other hand, there are those who are writing analyses and
transformations outside of the LLVM tree, but whose code could be considered
"part of" LLVM. These people _need_ access to important LLVM API's like
AliasSetTracker so that they can implement their transformation or whatever
making use of the infrastructure we have.
> I understand your reasoning, but frankly, that choice really bugs me. In my
> view, headers and the related class files should live in the same directory.
I understand what you're saying, and FWIW, Vikram shares your view. The problem
is that when you start diving into a large source base (like LLVM is slowly
growing into), you really want to be shielded from as much detail as possible.
This means that you want to public APIs to be as simple to find as possible.
The other advantage of separating them out is it makes it extremely clear what
is an "internal" and "external" header file. That said, we probably shouldn't
get into this discussion now, as it will only dillute the real point of this
bug.
> Incorporating a licence compatible implementation sounds reasonable but it
> is also just busy work. It will impact the source base somehow and doesn't
> add any value. My first preference would be to keep using the "standard"
> implementations of hash_map and solve the _HAVE_ problem. It turns out
> that several of the #defines defined by LLVM's config.h don't conflict
> with mine because they utilize LLVM specific tests.
I don't really understand what you're saying here. What "standard"
implementation of hash_map are you talking about? The whole reason we have the
header is that there are no standard versions... Also, just because many LLVM
config.h #defines happen to not conflict with yours doesn't mean that new ones
won't in the future. You've successfully convinced me that we _have_ to get
config.h out of the public headers.
> Keep "internal" and "external" configuration #defines in separate .h files.
> The usual HAVE_STRING_H, HAVE_XXX will be in the usual "config.h" for
> compatibility with other common practices. The "external" configuration
> #defines are likely to not conflict.
I thought that this wasn't possible with autoconf?
-Chris
>> Of course I have. What I didn't run into was, again, your mechanism for
>> disambiguating. Man, I'm starting to regret having made that comment! :)
>Ah ok. How do you deal with the problem?
I generally just pick an implementation (usually gnu) and if the platform
doesn't support it, I port it myself.
>>> I suppose that one alternative would be to clone a licence compatible
>>> implementation of hash_map into the LLVM world.
>> That just sounds like busy work to me.
>It is busy work, but it does sound like a solution to this problem, correct?
Yes, it is one solution.
> The problem is that there are multiple "kinds" of users of LLVM. There are
> people like you who basically just need include/llvm/* (no subdirs) to
> generate code. On the other hand, there are those who are writing analyses
> and transformations outside of the LLVM tree, but whose code could be
> considered "part of" LLVM. These people _need_ access to important LLVM
> API's like AliasSetTracker so that they can implement their transformation
> or whatever making use of the infrastructure we have.
Yes, that makes sense .. multiple audiences. It seems like there are three
camps:
- users (just want to write a compiler and use LLVM facilities)
- extenders (want to extend LLVM framework .. new passes, etc. )
- developers (changing/extending core LLVM abstractions)
It _might_ be useful to reflect these three usage categories in the public APIs
of LLVM.
> The problem is that when you start diving into a large source base (like LLVM
> is slowly growing into), you really want to be shielded from as much detail
> as possible. This means that you want to public APIs to be as simple to find
> as possible. The other advantage of separating them out is it makes it
> extremely clear what is an "internal" and "external" header file. That said,
> we probably shouldn't get into this discussion now, as it will only dillute
> the real point of this bug.
Just one point: the "simple to find" is handled by doxygen. Frankly, I would
rather look at doxygen than raw header files. That being the case, it doesn't
really matter where those header files live in the hierarchy. Furthermore,
users are going to be more interested in the _installed_ hierarchy. LLVM
doesn't support install yet, but it should and it will come for free with
automake. Since only the public headers would get copied on install, it
accomplishes what you're attempting to achieve without separating the files.
>> Incorporating a licence compatible implementation sounds reasonable but it
>> is also just busy work. It will impact the source base somehow and doesn't
>> add any value. My first preference would be to keep using the "standard"
>> implementations of hash_map and solve the _HAVE_ problem. It turns out
>> that several of the #defines defined by LLVM's config.h don't conflict
>> with mine because they utilize LLVM specific tests.
> I don't really understand what you're saying here. What "standard"
> implementation of hash_map are you talking about?
I meant the std and gnu versions as "standard". That is, we should use code
that's already written, not hack something up that is specific to LLVM.
> The whole reason we have the header is that there are no standard
> versions...
I was using the term "standard" lightly (see the quotes?). In my view there are
two defacto "standard" versions: std and gnu. They are the defacto standards
because these are the ones that get used most often.
> Also, just because many LLVM config.h #defines happen to not conflict with
> yours doesn't mean that new ones won't in the future.
True 'nuff.
> You've successfully convinced me that we _have_ to get config.h out of
> the public headers.
Yes.
>> Keep "internal" and "external" configuration #defines in separate .h files.
>> The usual HAVE_STRING_H, HAVE_XXX will be in the usual "config.h" for
>> compatibility with other common practices. The "external" configuration
>> #defines are likely to not conflict.
> I thought that this wasn't possible with autoconf?
Oh, _anything_ is possible with autoconf, its just a matter of how much work
you want to work for it. I was originally looking for the autoheader/autoconf
command line option that would take care of it. Such a beast doesn't exist. The
long explanation I gave in my last message is how to do it the hard way.
Keeping separate config files is pretty straight forward (one extra line in
configure.ac). The hard part is reworking all the m4 macros that generate
AC_DEFINE(HAVE_*) calls to use AC_DEFINE(LLVM_HAS_*) calls. The reason for
keeping both an internal and external config.h is to reduce the amount of m4
work. The internal ones can just stay HAVE_ while we have to do the m4 work to
get the external ones to use LLVM_HAS_*
- Reid
>>Ah ok. How do you deal with the problem?
> I generally just pick an implementation (usually gnu) and if the platform
> doesn't support it, I port it myself.
Ok, but there is no standard, even among the GNU compilers. In fact we only
support the GNU compilers, but there are already 3 different combination of
#include file locations and namespace locations...
>> The problem is that there are multiple "kinds" of users of LLVM.
>Yes, that makes sense .. multiple audiences. It seems like there are three
>camps:
> - users (just want to write a compiler and use LLVM facilities)
> - extenders (want to extend LLVM framework .. new passes, etc. )
> - developers (changing/extending core LLVM abstractions)
>
> It _might_ be useful to reflect these three usage categories in the public
> APIs of LLVM.
That's a reasonable division of users. The problem is that we really don't know
where to draw hard lines in the sand to split up the headers, for example. I
don't really think there is a good way to do this.
> Just one point: the "simple to find" is handled by doxygen. Frankly, I would
> rather look at doxygen than raw header files.
It might just be a side-effect of LLVM only being half-way documented, but I
find the actual header files to be much more useful than the doxygen output.
There are a lot of comments and description that don't make it into the doxygen,
though we are incrementally converting most of it over.
Outside of LLVM, I generally don't find doxygen helpful for the exact same
reason that I don't like headers in source directories. For example, if you go
to this page (or one of its analogs):
http://llvm.cs.uiuc.edu/doxygen/annotated.html
... you get a list of every class in LLVM. This generally isn't helpful at all,
because it exposes WAY too much implementation detail. Generally most people
only want various levels of the stuff we expose in the include hierarchy. If
they get interested in a particular piece, then they can dig into the
implementation. That said, splitting header and source is obviously only a
partial implementation. Perhaps the ideal (though practically unattainable) way
to do this is to use some sort of literate programming methodology with
incremental levels of details exposed or something.
At some point, writing a book is probably the easiest way to go :)
> Furthermore, users are going to be more interested in the _installed_
> hierarchy. LLVM doesn't support install yet, but it should and it will
> come for free with automake. Since only the public headers would get
> copied on install, it accomplishes what you're attempting to achieve
> without separating the files.
We do have a prototype install target that Brian is working on, but I understand
your point. The problem with having automake rip them out, though, is that you
have to test both the 'local' and 'installed' configs seperately, and you have
to maintain the list of what to copy somewhere.
>> I don't really understand what you're saying here. What "standard"
>> implementation of hash_map are you talking about?
> I meant the std and gnu versions as "standard". That is, we should use code
> that's already written, not hack something up that is specific to LLVM.
Oh, yeah, that's exactly what I meant. I'm not interested in the NIH syndrome,
and also not interested in working on implementing/learning a new hash_map. ;)
> Oh, _anything_ is possible with autoconf, its just a matter of how much work
> you want to work for it.
Ok, I will leave it to you guru's to figure out how this should work. :)
-Chris
>>It seems like there are three
>>camps:
>> - users (just want to write a compiler and use LLVM facilities)
>> - extenders (want to extend LLVM framework .. new passes, etc. )
>> - developers (changing/extending core LLVM abstractions)
>>
>> It _might_ be useful to reflect these three usage categories in the public
>> APIs of LLVM.
> That's a reasonable division of users. The problem is that we really don't
> know where to draw hard lines in the sand to split up the headers, for
> example. I don't really think there is a good way to do this.
Understood.
> It might just be a side-effect of LLVM only being half-way documented, but I
> find the actual header files to be much more useful than the doxygen output.
> There are a lot of comments and description that don't make it into the
> doxygen, though we are incrementally converting most of it over.
>
> Outside of LLVM, I generally don't find doxygen helpful for the exact same
> reason that I don't like headers in source directories. For example, if
> you go to this page (or one of its analogs):
> http://llvm.cs.uiuc.edu/doxygen/annotated.html
>
> ... you get a list of every class in LLVM. This generally isn't helpful
> at all, because it exposes WAY too much implementation detail. Generally
> most people only want various levels of the stuff we expose in the include
> hierarchy. If they get interested in a particular piece, then they can
> dig into the implementation.
The doxygen configuration file can be set up to control exactly what gets
generated. One of the most useful things I find in doxygen is the ability to
look at the source code via a web browser _without_ any comments. Just pure
code. As for what gets exposed and what doesn't, that's completely up to the
LLVM implementors. Doxygen is just a tool and like any tool what you get out of
it depends on how you use it. Getting the right results requires a bit of time
and investment. Personally, I think doxygen is *wonderful* :)
As I go along and start to understand more about LLVM, I'll try to contribute
doxygen comments to describe things and submit the patches.
> That said, splitting header and source is obviously only a partial
> implementation. Perhaps the ideal (though practically unattainable) way
> to do this is to use some sort of literate programming methodology with
> incremental levels of details exposed or something.
> At some point, writing a book is probably the easiest way to go :)
Yup, I agree there.
> Furthermore, users are going to be more interested in the _installed_
> hierarchy. LLVM doesn't support install yet, but it should and it will
> come for free with automake. Since only the public headers would get
> copied on install, it accomplishes what you're attempting to achieve
> without separating the files.
> We do have a prototype install target that Brian is working on, but I
> understand your point. The problem with having automake rip them out, though,
> is that you have to test both the 'local' and 'installed' configs seperately,
> and you have to maintain the list of what to copy somewhere.
In my experience, very few things go wrong between a successful "make check" and
"make install". However, automake handles this too. Two built-in targets
provided by automake check both your distribution and your installation. The
"distcheck" target will build a source distribution (i.e. llvm-1.x.tar.gz),
unpack it, build it, and check it. The "installcheck" target will run all
automake tests but use the installed version of the programs and libraries
instead of the local version. But, I digress .. promised not to talk about
automake here :)
> The doxygen configuration file can be set up to control exactly what gets
> generated. One of the most useful things I find in doxygen is the ability to
> look at the source code via a web browser _without_ any comments. Just pure
> code. As for what gets exposed and what doesn't, that's completely up to the
> LLVM implementors. Doxygen is just a tool and like any tool what you get
> out of it depends on how you use it. Getting the right results requires a
> bit of time and investment. Personally, I think doxygen is *wonderful* :)
Ok, well since I don't really use doxygen much, it doesn't make sense for me to
be the one to configure it. :) If you have any suggestions on our config, I
would be happy to listen. The doxygen.cfg file is in ~/llvm/docs/doxygen.cfg.
> As I go along and start to understand more about LLVM, I'll try to contribute
> doxygen comments to describe things and submit the patches.
Cool. When I said "half way documented" before, I meant "half way doxygened".
:) While the documentation in LLVM could certainly be improved, I also don't
think it's completely terrible. I should probably make some time to write some
more "high-level" documentation, but that effort got stalled by the "convert
everything to docbook" push. :)
-Chris
I think that the only thing holding this bug up is the fact that
Support/DataTypes.h, Support/hash_map, Support/hash_set, and Support/iterator
#include config.h directly.
Brian mentioned that the right way to fix this was to autoconf these files
directly, instead of having them use config.h
Whenever he (or someone else) wants to do this, we can finally close this. :)
-Chris
I think this is taken care of now (finally!)
There were quite a few patches,
starting with:
http://mail.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20040223/012085.html
ending with:
http://mail.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20040223/012117.html
(not all of those are pertinent, but they are all in that range.)