donaldsonjw / bigloo

Bigloo Scheme Compiler

Wasn't this supposed to be a version that could be compiled with Msys2 Mingw-W64? #1

Closed by GordonBGood 8 years ago

GordonBGood commented 9 years ago

Since the master branch is supposed to be a merged trunk of the LLP64 and Win64 branches, one might assume that this version could be compiled by Msys2 with Mingw-W64-x86_64, but that is not the case:

Although some work has obviously been done to adapt Bigloo to Windows, such as the _BGL_WIN32_VER define used to discriminate between Posix socket.h and Windows Winsock.h usage, even that isn't working with Mingw: bigloo.h has a reference to socket.h (and others) guarded by the above define, which isn't defined by all files that might include bigloo.h.

In addition, new bugs have been introduced, such as the "sa_family_t" type used in bigloo.h and not defined anywhere; it is almost as if this version of bigloo.h wasn't the intended version for this package.

Further, none of the LLP64 adaptations mentioned in your post at http://comments.gmane.org/gmane.lisp.scheme.bigloo/6214 made it into this version, including the BGL_LONG_T type and the other macro and program fixes mentioned there; nor are patch files provided, as they were for the version mentioned there (although the files downloaded from the follow-on post's links are corrupted).

It seems that this repository is nothing but the same version as provided on the Bigloo website as of 6 January 2015, with no changes made to it whatsoever. Perhaps you should modify the README file to reflect exactly that, and note in the ChangeLog that these sources have only been copied over with no changes yet made, so those who find this repository don't waste time trying to compile it with Msys2?

donaldsonjw commented 9 years ago

I apologize for the confusion. As you point out, master is indeed a clone of the official Bigloo release. I am using it as the base for my development on the LLP64 and WIN64 branches. The LLP64 branch contains only the changes addressing pointer and long size issues, whereas the WIN64 branch includes those changes plus Windows-specific changes. So, in short, you want the WIN64 branch. It has all of the changes I mentioned in my email to the Bigloo mailing list.

To build the WIN64 branch, you will need a working Bigloo compiler to bootstrap. I will write up the steps I am using and post them to GitHub later today.

Again, I apologize for the confusion.

GordonBGood commented 9 years ago

Joe, master isn't exactly a clone of the Bigloo 4.2a alpha release; note that bigloo.h also contains the sa_family_t references, so it won't compile. I have a working Bigloo compiler under Windows: a (necessarily 32-bit) Cygwin-compiled version of 3.1a that someone on the Internet created. Will that work as a bootstrap compiler?

I will download the Win64 source files and try it when you post the steps required later today.

Thanks for your prompt response.


donaldsonjw commented 9 years ago

Hello, GordonBGood,

Actually, master is a clone of Bigloo 4.2a (although not completely up to date, I have not merged the latest alphas). If you download the bigloo4.2a archives from ftp://ftp-sop.inria.fr/indes/fp/Bigloo, you will find the sa_family_t reference in the original sources. It was added in Bigloo 4.2a.

The bootstrap instructions I promised can be found at https://github.com/donaldsonjw/bigloo/wiki/Mingw64-Bigloo-Bootstrapping. I have not used a Windows version of Bigloo for the initial bootstrap (I have been using Linux), but in theory it should work. The process is not entirely straightforward and is likely more daunting than most will want to deal with. My plan is to put together downloadable archives that one can easily unarchive, configure, and make. I will let you know when I have something to share.

By the way, I realized while putting together the bootstrapping instructions that I had not yet committed a couple of changes. So if you have already cloned the WIN64 branch, you will want to pull to get those changes.

Let me know how it goes. I will try to help as much as I can.

GordonBGood commented 9 years ago

Joe, thanks for the update on the procedure. I see that you've corrected the two bugs I mentioned when opening the issue: not all ways of including bigloo.h were guarded from the sys/socket.h material, and sa_family_t is now defined in a Windows-friendly way (as typedef'ed in winsock.h) as an unsigned short rather than the Posix unsigned int. I would have done both of these more concisely, as follows, since _BGL_WIN32_VER was used in only this one place in bigloo.h, and to avoid redefining it when bigloo.h is included from those '.c' files where it is already defined:

#if defined( _MSC_VER ) || defined( __MINGW32__ ) /* __MINGW32__ is set by both 32- and 64-bit MinGW */
#  include <winsock2.h>
#  include <mswsock.h>
#  include <ws2tcpip.h>
   typedef u_short sa_family_t;  /* winsock declares the address-family field as u_short */
#else
#  include <sys/socket.h>
#  include <netinet/in.h>
#endif

That's one of the hazards of not testing on the target machine: gcc will still find the Linux version of socket.h if it is on your include path, so these things would still be defined, although incorrectly.

For the 'make bigboot' stage, could one avoid the "BGLBUILDBINDDIR=<path to directory containing bigloo compiler executable>" define and use the "--build-bindir=" as a ./configure option instead?

For the setup of Msys2 (or ordinary Msys with Mingw-W64), one might also want Sqlite3, as mentioned in the install.mingw readme file in the distribution. Does winpthread work with this? If not, one might want to pass --disable-threads to ./configure.

For the second-to-last stage of ./configure for the final build, one will likely also want to disable phidget. This can presumably be done with --disable-phidget, but I did it by setting "phidget=no" on line 214 of the configure file, which works. I don't think we want to deal with installing, configuring, and making the phidget package, as it depends on usb-1.0, which may not be compatible with the hardware layer of Windows. Most of us probably don't use Phidgets anyway.
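The manual edit just described can be scripted. Here is a sketch, demonstrated on a stand-in file since the real target is the configure script at the root of the Bigloo tree (and the line number and surrounding text vary by version):

```shell
# Demonstrate on a stand-in file; in the real tree the target is the
# "configure" script at the root of the Bigloo source distribution.
printf 'jvm=\nphidget=\n' > configure.demo
# Turn the empty "phidget=" default into an explicit "no".
sed -i 's/^phidget=$/phidget=no/' configure.demo
grep '^phidget' configure.demo   # prints: phidget=no
```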

I am almost there and would likely be able to report success if my machine didn't keep crashing for large jobs such as this, likely due to some bad RAM.

You didn't answer whether you think a Cygwin (32-bit) version of Bigloo 3.1a will be able to act as a bootstrap compiler. My attempts before your instructions came through indicate that it can; the main problem I couldn't circumvent without modifying source files was that it didn't like integer literals with type prefixes such as "#u8:0". But maybe it won't have to deal with those under your correct bootstrap procedure?

Regards, Gordon


GordonBGood commented 9 years ago

I compiled a Cygwin (32-bit) version of the current release, 4.1a-2, as a bootstrap compiler, and make bigboot still errors out just as the equivalent Cygwin version of 3.1a did: it does not recognize the "#s8:0"/"#u8:0"..."#s64:0"/"#u64:0" typed integer literal syntax. Any help here? Why does a Linux version accept this whereas a Windows/Cygwin version does not? I'm running out of options other than editing those out of the pertinent source files (the type information shouldn't really be needed, as the identifiers to which these literals are being assigned/compared are already explicitly typed). It seems the Cygwin/Mingw versions of the reader haven't been programmed to accept #u../#s.. as tokens whereas the other versions have, which is probably why you can bootstrap on Linux. The #u8(...) tokens do work for creating homogeneous vectors, but not as literal type designations. I'm speculating here, as this use of these tokens is not in the documentation.

Using --build-bindir with ./configure, as mentioned in my last post, does work as an alternative to setting BGLBUILDBINDDIR as an argument to make bigboot.
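For reference, the two equivalent forms look like this; the path is purely illustrative:

```shell
# Point the bootstrap at an existing bigloo binary; /opt/bigloo-boot/bin
# is an illustrative example path, not a required location.
./configure --build-bindir=/opt/bigloo-boot/bin
# or, equivalently, at the bigboot stage:
make bigboot BGLBUILDBINDDIR=/opt/bigloo-boot/bin
```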

Another thing that needs to be turned off for a Cygwin build is the JVM back-end (--jvm=no).

GordonBGood commented 9 years ago

As I thought the problem had to do with Cygwin or Mingw, I installed Linux (Fedora 21 x86_64) in a virtual machine, installed Bigloo 4.1a from the repository as a bootstrapping compiler, and followed your instructions, disabling phidget and java as described above. The result: it still chokes on those literals at the "make bigboot" stage, with the first failure at line 446 of "Llib/srfi4.scm", which contains #s8:0; the reader rejects the "#s" as an illegal token. How is it that you are able to get past this and I cannot? At this stage the only solution I can see is to remove those literal type forms, which, as I said above, don't appear to be necessary. There must be something simple in your compilation environment that you have forgotten to tell me that turns those into something the Bigloo compiler can handle, but at this stage I can't guess what it is.

donaldsonjw commented 9 years ago

Hello, Gordon,

I see the problem now. The fixed-size integer literals causing you problems were introduced with 4.2a. I lucked out by using a Linux version of 4.2a to do my bootstrapping; this is why I didn't see the typed-literal issue. You can either try bootstrapping with 4.2a on Linux, or I can put together a pre-bootstrapped archive for you to use; that would likely be sometime this weekend.

GordonBGood commented 9 years ago

Hello Joe,

Thanks for that. I was starting to suspect bugs in regexp handling that differed between builds using libpcre or not, although it wouldn't make sense to define these literal types and then regexp them out in order to compile. Ah, so Bigloo 4.2a slightly extends the explicit type annotations to include literal definitions...

Now that I know what the problem is, I should be able to use my Fedora/Linux VM to build Bigloo 4.2a as a bootstrap compiler, then use it to bootstrap the WIN64 version and transfer that to complete under MinGW64/Msys on 64-bit Windows (Windows 7 64-bit). I'll let you know how it goes.

This issue isn't really closed until this works, is it?


donaldsonjw commented 9 years ago

You are right. Reopened.

GordonBGood commented 9 years ago

Joe, I think I almost got there, but am stymied at the last "make" in the main bigloo source directory, just before the 'make dist-bigloo.exe CSRCSUFFIX="%.c"' in the comptime directory (with which I don't anticipate any problems). The problem is that the dtrace available from the Fedora 21 repository is too old to support the "-xnolibs" option, and downloading and trying to compile/install the latest version of dtrace results in a compilation error (a type mismatch). I'm too tired to take on another project of finding what that is.

So I will have to await your pre-bootstrapped archive file... Or...?

It should be noted that this is a circular dependency: one can't compile a workable bootstrap compiler on Windows, even given all the compatible tools including this dtrace, because one needs a version 4.2a compiler to get around this #u..: literal problem, and version 4.2a can't be compiled under Windows because this new WIN64 version needs to be compiled first, etcetera. The only way around this (which Manuel Serrano must have done at some stage) is to temporarily edit the new literal specifications out of the files where they are used, just to get a bootstrap Bigloo 4.2a compiler, then put them back in for the bootstrap build, which should then pass. This last stage would still require a working new version of dtrace under MinGW, which I will now check for before going any further.

Meanwhile, we should really update the INSTALL file and README.mingw with the exact tools required to do an ordinary non-bootstrap build, as most normal users would; the following applies to the 64-bit version of Fedora 21, but should apply to other Linux distros:

The exact tools required are:

  1. make
  2. gcc (which depends on binutils, at least on some distros)
  3. diffutils (used in the process for comparing files)
  4. patch, and maybe patchutils (used to patch files revealed by the above diffs)
  5. libtool, with its dependencies autoconf and automake (used to compile libuv)

The mostly optional libraries, required as the '-devel' versions to make the different API modules work, are as follows:

  1. unistring, for Unicode support
  2. gmp, for arbitrary precision arithmetic (bignums)
  3. openssl, for sockets
  4. pcre, for faster regular expressions (there is a fallback written in Bigloo, but it is slower)
  5. sqlite3, for database work
  6. pthread, for native thread support
  7. alsa, for sound
  8. avahi, for audio/visual
  9. flac, as a lossless audio codec
  10. gstreamer and gstreamer-plugins-base, for media handling components
  11. mpg123, for an implementation of MPEG layers 1, 2, and 3. This last may not be available in many distros, but the source tarball can be downloaded and compiled reasonably easily without many dependencies. It should be compiled and installed with procedure 2 from the INSTALL file (the developer build) in order to get the '-devel' files installed, and the final stages must be done as sudo or su to root in order to install into the system.
  12. phidget (which depends on libphidget), used to provide an interface to Phidgets hardware (USB-connected devices available from the Phidget company). The source is easily available, but this may not work for many distros, including MinGW, as it depends on the USB hardware layer. It isn't really necessary unless one owns and uses some Phidgets. Note that if it is compiled successfully, it must be installed with sudo/su root privileges so the "configure" process picks it up (otherwise the additional directories where it is installed must be specified to "configure"). IT MUST BE DISABLED MANUALLY IF THESE FILES ARE NOT PROVIDED, by setting "phidget=no" (the default has nothing after the equals sign) in the configure file at the root of the source distribution, at about line 200 to 214 (this moves depending on the version), as "configure --disable-phidget" is not wired up to disable it automatically.
  13. java and java-devel, if one wants to support the Java JVM compiler options (which probably aren't very stable). If these are not used, then one must pass "--jvm=no" after the "--bootconfig" option to configure at the start of the bootstrap build, or it will crash when these aren't found.
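For convenience, the list above corresponds roughly to the following Fedora invocation; the package names are my best guess at the Fedora spellings (verify against your distro), and mpg123 and libphidget may need building from source as noted:

```shell
# Approximate Fedora package set for the tools and optional '-devel'
# libraries listed above; names vary between distros, so verify locally.
sudo dnf install make gcc binutils diffutils patch patchutils \
    libtool autoconf automake \
    libunistring-devel gmp-devel openssl-devel pcre-devel sqlite-devel \
    alsa-lib-devel avahi-devel flac-devel \
    gstreamer1-devel gstreamer1-plugins-base-devel
```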

So that is enough for the bootstrap compiler build.

So far the only extra dependency found for the actual bootstrap build is this "dtrace" tool, which is hit not far from the end of the final large "make".

This has certainly been a good exercise of my new RAM system, which seems to be passing with flying colours as no computer crashes or hangs as of yet on these quite large jobs.

If I get time, I'll try a MinGW32 build of bigloo4.2a with the known patches applied to the '.c' and '.h' files as well as the #u literal patches in the '.scm/.sch' files, but only if I can build a working MinGW compatible dtrace tool.

donaldsonjw commented 9 years ago

Hello, GordonBGood,

Please try the pre-bootstrapped archive at https://sandjsite.net/jwd/files/bigloo4.2a.tar.gz. I successfully compiled it under 64-bit Windows 7 with MSYS2 using the Mingw64 win64 gcc compiler. Once built, I ran make test, and it passed all of the recette tests. I configured it as follows: ./configure --disable-libuv --disable-wav. I did not have to explicitly disable building the JVM back-end or the phidgets library, since java and phidgets are absent from my Windows machine, but given your earlier correspondence, you may need to. Let me know how it goes.

I agree that the documentation needs to be updated and expanded. You are the first person to show serious interest, so I have had little impetus to do so until now. I will put some effort into this front.

Thanks, Joe

GordonBGood commented 9 years ago

Joe, thank you very much. Your archive worked, although I was also able to generate one myself under the Fedora Linux VM by working around the libuv problem as explained below. Anyway, with a working Bigloo 4.2a boot compiler, I can now experiment with enabling various APIs under Windows on my own, and can in fact generate my own bootstraps from Windows.

What's this about Control-C being disabled for the interpreter (some sort of signal issue), and how do we fix it? It isn't very convenient to have Control-C dump the interpreter, although one learns that the Esc key will clear the interpreter command line. The interpreter isn't as easy to use as some anyway, as there is no debugger; one needs to use the Bee for symbolic debugging, which means another extensive setup operation for Emacs under Cygwin. OTOH, Gambit-C doesn't have a working IDE at all.

In my experience, as in my last post, JVM will have to be disabled (--jvm=no) at any configure stage where this isn't the default (primarily the bootconfig stage when rebuilding the boot) wherever some form of JDK can be found. The README.jvm documentation is correct that it is hard to find a compatible JDK: it choked on openJDK-1.8 and didn't seem to like the Oracle version of the same either. Rather than investigate this, it was easier to just disable JVM, as I don't plan to use the JVM back-end anytime soon. For non-bootconfig uses, as once one moves the bootstrap archive to the Windows MinGW environment, this won't be necessary, since the non-bootstrap configure has JVM disabled by default.

On getting the bootstrap to finish on my Fedora Linux VM: the problem with dtrace was restricted to the compilation of libuv, and it could be fixed in one of two ways. Either don't compile libuv, using --disable-libuv at the "./configure" stage (as you suggest is currently necessary for Windows MinGW anyway), or edit the libuv Makefile at about line 4016 to take out the "-xnolibs" dtrace flag, which is the only place it is used and probably won't make any difference anyway; this allows the Fedora version of dtrace to run. I did get it to make/compile using the latter method, but will try your provided bootstrap archive first.
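The second fix can be scripted; here is a sketch on a stand-in line, since the flag's exact location in the libuv Makefile (around line 4016 here) moves between versions:

```shell
# Stand-in for the libuv Makefile line that carries the "-xnolibs" flag.
printf 'dtrace -G -xnolibs -s uv-dtrace.d -o dtrace.o\n' > Makefile.demo
# Strip the flag so older dtrace versions (e.g. Fedora 21's) accept the rule.
sed -i 's/ -xnolibs//' Makefile.demo
cat Makefile.demo   # prints: dtrace -G -s uv-dtrace.d -o dtrace.o
```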

From the above, I can see that it is likely going to be a big job to get libuv to compile under MinGW on Windows: although MinGW has libtool, there is no dtrace in any form, and that is likely going to be hard to replace, as its use is buried in the libuv Makefile. Some patching is going to be necessary there after investigation.

As to libphidget, so far I have had repeated errors from the lack of the phidget library under MinGW, although it seems this error is ignored and it will install anyway; perhaps setting 'phidget=no' in the configure file just avoids the autoconfig probe and the error message.

I've also compiled with a straight Mingw64/Msys system as well as the Msys2 build environment, just to see if it is possible and to see what the relative procedures are for documentation purposes. This works, but some of the auxiliary packages such as openssl, pcre, and sqlite3 aren't as easy to find and install (not available under mingw-get), whereas they are easy under Msys2. My only problem with Msys2 is that my local mirrors aren't very reliable for some of the files, and I have to download them manually from the central repository and then move them into the package cache, which some users won't like to do. Also, I have found that some of the obscure files in some packages (e.g. docbook-xml/xsl in base-devel) can't easily be found anywhere, so the entire package can't be installed; however, the essential parts can be installed piece by piece. The mingw-get process works much more smoothly in that regard; it just lacks some of the files.

The files required for Msys(2) are the same as for the Linux build, namely msys/make, mingwxx/gcc, msys/diffutils, msys/patch and msys/patchutils (this last maybe; not sure), and msys/libtool. For the optional APIs, one will likely at least want the mingw-w64-x86_64 files for both the library and the '-devel' packages of the following: unistring for Unicode support, gmp for multi-precision arithmetic (bignums), pcre for faster regular expressions, openssl for socket support, sqlite3 for database work, and winpthreads for native thread support. If trying to use a regular Msys environment, some of these, such as openssl, pcre, and sqlite3, will have to be built from downloaded tarballs, just as libphidget would have to be if one were going to attempt to use it.

Yes, I don't suppose you get all that much interest in your project, as Scheme isn't so popular, and the Bigloo dialect likely even less so, perhaps in part because up to now Bigloo hasn't been easily usable other than by Linux users, for whom installation is easy from their distros' repos. I started learning Scheme a couple of months ago after experimenting with functional programming for a few years, which started the search for a good, fast Scheme implementation. I have now looked at: Petite Chez Scheme (mostly standard R6RS, but slow due to being an interpreter); DrRacket's version of R6RS (JIT compiles to faster than Petite Chez, but still about seven times slower than Cee speed for tight loops, even using unsafe ops); Chicken compiled to 64-bit (which was relatively easy under Ming64, but Chicken isn't so good as to consistent speed for calls and continuations, due to Cheney on the MTA); Gambit-C (excellent, but the compiler doesn't optimize as well as Bigloo, so tight loops are about twice as slow, and there is no native thread support, but it is very much R5RS-conformant, 64-bit versions are readily available, and the compilation process under MinGW64 is easy, though there are no modules); and now finally Bigloo, which up to now I found relatively unapproachable due to the lack of Windows and MinGW64 support.

I find that Bigloo perhaps does not adhere to R5RS and general Scheme standards as well as Gambit-C (tail call optimizations/continuations?/call/cc? and others, such as the numeric tower), but it is close enough, and perhaps the speed is worth it, although at the cost of compilation time as compared to Gambit. Anyway, now that I have a 64-bit Windows version, I can compare better. I have already found that tight loops compile to just as fast as Cee when compiled with full optimizations and unsafe mode, which is understandable when one looks at the generated Cee code: Bigloo code doesn't get in the way of Cee optimization, and it even pulls the dynamic type tagging and untagging operations out of the loops.

Perhaps once Bigloo is easier to try on Windows, more people will take an interest, but I don't think Scheme is ever going to be really mainstream, although the resurgence of interest in functional programming (F#, Haskell, and the addition of functional abilities to other languages such as C#, Scala, even Python and Golang, this last very little, with only a poor implementation of lambda closures) may make Scheme more interesting to learn and apply. At least one isn't continuously fighting the non-strict system of Haskell when one wants speed. Of all the Scheme implementations of 10 years ago, these five (Chez/DrRacket/Bigloo/Gambit-C/Chicken) appear to be the only ones still actively developed.

One problem with Scheme is the reluctance/slowness to accept the new standards, R6RS/R7RS (I know, big and unwieldy), so code isn't very portable unless written carefully to be so from the outset (choosing a subset of R6RS that can easily be macro'ed into one's chosen dialect), and its syntax isn't as readable as some (e.g. the whitespace-defined syntax of Haskell/F#/Python versus matching brackets); as well, passing multiple values isn't as automated (no auto-tupling as in Haskell/F#/others). Scheme is amazing considering how old its roots are, but syntactically it has been passed by more modern functional languages that better adhere to their standards.

The other problem with Bigloo is that it is very much a one-man show (OK, maybe two or three, including you and me), whereas even Gambit-C has a more active user community.

BTW, when do you think Manuel Serrano will be able to merge these changes into the main master trunk for 4.2a? Once 4.2a is stable, there needs to be a new WinBigloo or Bigloo-for-Windows site, or an installer section on the main site (or a link to your site), so potential users don't have to go through these hoops. Even installing Msys2 or a Ming/Msys system will be daunting for many potential users.

I think this issue can now be closed, unless you want to put a link to some pre-built Windows installers somewhere, as winbigloo used to do for MinGW32 (it hasn't been updated since Bigloo 3.6a in 2011) or the yaya website did (not updated since Bigloo 2.8c in 2006). The winbigloo package used to include an integrated MinGW32 gcc compiler, and the yaya one even included an emacs version integrated into the Bee. Packages like these, which let users just install and go, would definitely make it much easier for Windows users to try and fully exercise Bigloo, and in fact would make it superior in that way to Gambit-C, which doesn't have a working IDE.

GordonBGood commented 9 years ago

Any help here? Bigloo 4.2a has extended type checking, which means some code is not backward compatible. Previously, one could assign the contents of a homogeneous u8vector to a normal fx integer variable and it would just be extended automatically, with the tag bits added; now it errors out, complaining about setting a uint8 to a long. Is there a way to turn off type checking so the previous behaviour is restored, and where is the documentation on type casting?

GordonBGood commented 9 years ago

OK, got it, except for whether type checking can be turned off so that the backward-compatible behaviour returns: there is a whole set of new type conversion macros, such as (uint8->fixnum #u8:0), which will return a fixnum from a uint8.

GordonBGood commented 9 years ago

Answered my own question on signals and why Control-C doesn't work: Windows has very poor support for signals, so there are no library functions to call that will emulate these UNIX functions.

However, winpthreads seem to do a very good job on emulating all we would need for pthreads, so since both Msys2_32 and Msys2_64 support it, I think we should look into adding native thread support under MinGW which is currently turned off. I will be testing this in the next few days, but it should be a separate issue...

As to the new strong type checking that may not be able to be turned off: that likely isn't as much of a problem as I first thought, as one can still write regular Scheme code using a careful subset of R6RS syntax and convert it to the new strongly typed Bigloo syntax using macros. However, it seems that with this version, Manuel Serrano is taking Bigloo even further away from standard Scheme, even further than standard Racket departs from it, and more into Typed Racket territory. Not that I don't kind of like it; I've never quite taken to dynamic typing in a compiled language...

donaldsonjw commented 9 years ago

Hello, Gordon,

I am glad you were finally successful in building Bigloo on Win64.

As to making it easier for Windows users, I had been thinking that adding a Bigloo package to MSYS2 might be a very good step in addressing ease of access and use, given that it is currently the easiest way to obtain Bigloo's dependencies on Windows. However, your comment on the lack of good package repository mirrors does give me pause. I have not experienced such issues, but if it is common, it would definitely be a problem. I will have to do some more investigation. If MSYS2 proves to be less than ideal, it would be possible to put together a comprehensive installer that included bigloo, gcc, and all of the dependent libraries; it is just a lot more work.

I agree the ctrl-c behavior on Windows is annoying. I believe it is possible to get more reasonable behavior, but we will need to use platform-specific functionality instead of the poor signal support currently provided by Windows. In that light, there are a number of areas, such as networking, memory-mapped I/O, etc., that could be greatly improved if done with Windows native APIs. They have been on my list to investigate for quite a while, but since I have not specifically needed them, they have not moved to the front of the queue.

Prior versions of Bigloo on Windows have supported pthreads. In more recent versions, it was disabled due to changes in the Boehm GC that broke the build. It has been a while since I have looked at the Boehm GC to see if newer versions now work correctly. If they do, only relatively minor changes should be needed to get pthread support working again. As a slight aside, the fact that Windows requires all symbols to be defined at compilation time means that you either have separate threaded and non-threaded versions of the standard libraries and the Bigloo interpreter/compiler, or you build everything to be thread-aware. Previously, we went the route where everything was thread-aware. This does introduce a bit of overhead for non-multithreaded applications, so if effort is going to be put into re-enabling thread support, this decision may want to be revisited.

Bigloo has extended (or diverged from) Scheme fairly significantly over time. All of the changes have been aligned with its goal of being a Scheme (and Lisp) like language which you can use for tasks where C or C++ have traditionally been used; I like it for that very reason. But it does surprise some people when they are expecting a more vanilla Scheme. It may be time that Bigloo does what Racket did and declares itself an altogether new language heavily inspired/influenced by Scheme. Heck, it may get more people to take a look.

I have not heard any word from Manuel on whether or not he is going to apply the Win64 patches. The patches are pretty large and touch a significant portion of the Bigloo source tree, so I can understand some caution on his part.

GordonBGood commented 9 years ago

Hello Joe,

The Msys2 route may be the best one for distribution. All one would need is an Msys2-and-Bigloo installation instruction sheet covering what to do if one has trouble downloading specific packages from mirrors, including links to the repos where those files can be found and how to move them into the pacman cache. Since such users won't have to build Bigloo themselves, a minimal set of packages would be required, with gcc the main one, plus whichever API libraries were desired: gstreamer, flac, mpg123 (minor ones), but also winpthreads (if we go that route for native threading support), gmp, pcre, openssl, sqlite3, and unistring. One could also include some hookups to a Windows/MinGW version of emacs, and potential users would be all set up, including IDE support. These could all be linked in as dependencies of the Bigloo package, so little intervention would be needed as long as the mirrors worked. Short of building a complete Windows installer, I think Msys2 would be the best option in spite of my hiccups with mirrors, as it does provide all the pieces.

Yes, I see that fixing the Interpreter Control-C issue and the others you mention would require calling native Windows API's. It actually isn't too bad once one gets used to avoiding Control-C while in the REPL and the Esc key works well for clearing the current line. As REPL's go, the Bigloo one isn't the best anyway as I mentioned before, so I wouldn't spend too much time on it.

Yes, in reading the configure/autoconfig files I see that native threading used to be supported under MinGW and has been completely disabled. I think that if we restrict our implementation to MinGW-w64/Msys2 then we should convert to using winpthread, which seems to be specifically designed to look just like the Linux pthread version; I would think that the GC would be fine with that, and the autoconfig process can detect whether this is present by checking for the libwinpthreads library which would mean the system is under MinGW-W64. It is a very easy thing to check to see if it will work other than for all the GC_WIN32_THREADS and GC_WIN_PTHREADS defines and conditional compilations as compared to GC_LINUX_THREADS and whether NO_HANDLE_FORK and THREAD_LOCAL_ALLOC can/need to be used. There are a fair number of files to scan to see how these interact. Once this is determined, I don't think that the GC will need to know that it is on MinGW and not Linux.

As to whether to have separate threaded and non-threaded versions for Windows, I see that Chez Scheme goes this route in order to not have the threading overhead when it isn't required. Once we get native threads working again, I guess we'll have to do some tests to quantify the percentage cost of the extra threading overhead and make a decision based on that.

Although I am in no way a Scheme purist, I was at first a little bothered that Bigloo is departing even further from standard Scheme, but then I realized that I actually like having static type checking and a richer type system. I guess the things that I don't like about Bigloo are buried so deeply in the design that they will never change: the reliance on using Cee-type functions for procedures along with the Cee stack, and the twists and turns Manuel has had to take to get everything to generate fast code.

The Bigloo compiler can generate code that compiles to executables that are as fast as or faster than if written in Cee in the first place, but it takes a lot of compiler passes and compilation is a bit slow. He had to go to great lengths to implement call/cc, and the result isn't very satisfactory in that it is dog slow: nested call/cc calls run at about 1000 per second as compared to millions for most other implementations such as Gambit-C. Perhaps this is because the Boehm GC is slow at allocating for the saved continuations, or perhaps these continuations consist of a lot of machine state, but it is terribly slow; I haven't tried his stop-gap "bind-exit" yet, which may be faster. Interestingly, the Petite Chez Scheme interpreter can run about a hundred million nested call/cc's a second, likely because the compiler is smart enough to turn them into a loop. There are also tail call optimization problems when the code can't be turned into simple loops, due to trying to build everything on top of conventional Cee functions, etc. However, in thinking about it, one doesn't need these features all that often, most tail calls are just loops as Bigloo readily recognizes, and he has provided reasonable workarounds for most of the rest, so it is likely alright.
Gambit's method of implementation not using Cee functions at all but rather using "structured goto's and switch's" is very clever and avoids the overhead of dealing with Cee functions, but unfortunately Marc Feeley doesn't have the compiler optimizations nor has he added modules and native threads, and his IDE isn't currently working.

I'll continue to work on getting native threads to work again on Bigloo, and then see where we are after a few more tests. I suppose that the knowledge I am gaining could be carried back to take care of the main deficiency in Gambit - the lack of native threads. Marc Feeley would like someone to work on a type versioning system that would make compiler optimizations easy and should make performance come close to that of Bigloo for tight loops, and I might take that on. Modules are nice, but not really necessary for my limited use of Scheme. Gambit-C is a much smaller system and generally easier to work with, as it truly follows the Scheme KISS mindset, which I also like, while at the same time sticking to the R5RS standard where possible. I considered giving Gambit native threads support by having a hardware layer work behind the scenes to make his "green" threads (I guess something like the Bigloo "fair thread" model) run on multiprocessors, something like what the Haskell threading system does.

It seems that all Schemes that try to base on the R5RS standard need extensions just to be able to do useful work, with even R6RS implementations needing a few extensions that were missed in the standard; none of the standards to date cover the use of threads, so there are always going to be differences there. However, with these last major static type checking enforcements that can't seemingly be turned off (at least for homogeneous vectors, which of course are outside any standard other than SRFI 4 anyway), it would seem that Bigloo is getting closer and closer to needing to declare itself a separate dialect that is built on Scheme just as Racket is and just as Typed Racket is separate from the main Racket. However, one doesn't want to depart too far from standards to gain speed, or perhaps risk going the way of the now defunct Stalin compiler.

So what do you do if Manuel doesn't merge WIN64 into the master trunk? Do you just continue to merge his new changes to the master trunk back into Bigloo-WIN64 indefinitely? Can you do that fairly easily as in once we get a stable set of changes to support WIN64, you can easily apply any new changes he makes in his master trunk to your own GIT?

GordonBGood commented 9 years ago

Hello Joe,

Here is a summary to date, as follows:

A/ First some cleanup items from the existing code:

  1. Related to LLP64, during make some warning messages flash by about BGL_LONG_T and BGL_ULONG_T being assigned unsigned values where signed are expected and vice versa; I didn't chase these down as you will know better than I where they are.
  2. On WIN64, I really encourage you to change the socket configuration part of bigloo.h to something like the following code in order to avoid the macro redefined warning messages:

    #if (defined( _MSC_VER ) || defined( _MINGW_VER ))
    /* the Windows header names were lost in formatting here;
       winsock2.h, ws2tcpip.h, and windows.h are the likely candidates */
    #  include <winsock2.h>
    #  include <ws2tcpip.h>
    #  include <windows.h>
       typedef u_short sa_family_t;
    #else
    #  include <sys/socket.h>
    #  include <netinet/in.h>
    #endif

  3. There is an old undiscovered bug in the nbproc probing file of the autoconf directory where on line 69 it should say _SC_NPROC_ONLN instead of _SC_NPROCESSORS_ONLN for the sysconf() call, since if it gets there then _SC_NPROCESSORS_ONLN will not be defined. This bug hasn't been discovered because most systems will define _SC_NPROCESSORS_ONLN or neither, but it should be corrected.
  4. There is a problem with "make clean" when the multi-threaded version of the gc is not configured (when there is no thread support) as the clean process tries to clean the non-existing or un-configured "-fth" (fair thread?) version of the gc that isn't in a state to be cleaned; "make clean" works fine when there is thread support as then this configuration is automatically done. This is also a problem whenever there isn't something completed that the "clean' process expects, such as the libuv that we have disabled due to not yet being compatible with WIN64.

B/ Regarding getting native threads working though the winpthreads library, as follows:

  1. Line 1566 of the main configure file needs to be changed from "threadsupport=no" to "threadsupport=" so that configure autoconf will look for thread support.
  2. Lines 1766 to 1776 of the main configure file need to be commented out or removed so that they don't force the native threads configuration variables to a disabled state for mingw installations.
  3. The following lines need to replace or be added (with the previous setting commented out) at about lines 43 and 44 of the pthread probe file in the autoconf directory so that the winpthread library is used instead of some other pthread version for mingw configuration.

    # This is the mingw-w64 pthread implementation
    pthreadlibs="-lwinpthread";;

The above changes allow configure to proceed and determine that native threads are supported, reporting the correct number of processors and that all the required functions are available and working, other than the timedfork one, which according to the documentation and its use in the files isn't really necessary (it is an optional feature that is probed for and used only if available).

You were correct that most of the problems getting pthreads to work are related to the Boehm GC and I still haven't gotten a make completed, fighting settings in the GC or at least the _fth (fair threaded?) version. The source for this library is quite a piece of Cee Macro Shite with all the hack patches that have been done to get it working on different platforms. My idea was to just tell it that it was on a Linux platform and add some mingw checks to make it conform for any differences (check for the MINGW32 macro) as the other macros are quite a mess, but there are a few differences due to the lack of signals on Windows and that therefore the "sigmask_t" type doesn't get defined. It doesn't help that there are some gratuitous references to this type in the code that never get used.

  1. One such is at line 159 of the bglpthread.c file in the api/pthread/src/Posix directory, which needs to be removed or commented out.
  2. I have changed/patched line 2664 of the gcconfig.h file in the gc_versionxxx/include/private directory to "#if defined(GC_LINUX_THREADS) && !defined(LINUX) && !defined(NACL) && !defined(MINGW32) " so as to avoid the "inconsistent configuration" message. I know that we would rather not do this as in patching externally provided code, but I am trying to use currently defined macros to get the job done and avoid defining new ones. It may be that the MINGW32 macro will have to be used in other gc files in order to get these last dangling references fixed.
  3. In order to "make" this, Msys2 needs the installation of ldfcn which is normally supplied by Linux.
  4. I have been experimenting with the setting in the autoconfig "thread" probe file to see what cflag's need to be set and am currently using the following, which is just like the settings for a Linux platform with the GC_NO_PTHREAD_SIGMASK, and GC_PTHREADS macros added (I added the GC_PTHREAD macro in order to get the stack size references to work), but I am still tracing down why the functions GC_pthread_create(), GC_pthread_join(), and likely GC_pthread_detach() (this last is perhaps not used) are left as dangling references at the link stage with these functions not defined:

    echo "-DGC_LINUX_THREADS -D_REENTRANT -DGC_THREADS -DNO_HANDLE_FORK \
          -DTHREAD_LOCAL_ALLOC -DGC_PTHREADS -DGC_NO_PTHREAD_SIGMASK $cflags";;

I'm thinking that I may need to define one of the GC_USE_DLOPEN_WRAP or the GC_USE_LD_WRAP macros in order to complete the "make" but don't know why that is necessary under MinGW when it isn't under Linux.

Do you have any ideas?

GordonBGood commented 9 years ago

Joe, in the gc library there seem to be too many conflicting macros between using GC_LINUX_THREADS and MinGW (not that one couldn't get it working with enough research), so I will try going back to building as it was done before for Red Hat under MinGW in the "thread" probe file for autoconfig. However, there do seem to be some conflicting and maybe some missing macros that will have to be dug out. For instance, GC_WIN32_PTHREADS and GC_WIN32_THREADS would seem to be mutually exclusive, with the first telling the system to use Posix threads and the second telling it to use Windows API's, and unless they are picked up by the weird combination of these macros, some additional macros would appear to be needed, such as GC_NO_PTHREAD_SIGMASK and maybe THREAD_LOCAL_ALLOC, depending on whether these get handled by that strange combination. It's easy to see why this Boehm GC gets easily broken with such a complex interface system.

GordonBGood commented 9 years ago

Joe, it seems that MinGW thread support in the gc is severely broken, and it would be better to get it working as I tried before, using just straight POSIX/Linux threads, as follows:

I believe I was almost there previously, with the problems being dangling references in the linker. Something I should have done before, but have just read now in the gc doc/README.linux file, is the following, which gives instructions on how to use the "GC_USE_LD_WRAP" macro to which I referred earlier:

"3b) A new alternative to (3a) is to build the collector and compile GC clients with -DGC_USE_LD_WRAP, and to link the final program with

    (for ld)
    --wrap dlopen --wrap pthread_create \
    --wrap pthread_join --wrap pthread_detach \
    --wrap pthread_sigmask --wrap pthread_exit --wrap pthread_cancel

    (for gcc)
    -Wl,--wrap -Wl,dlopen -Wl,--wrap -Wl,pthread_create \
    -Wl,--wrap -Wl,pthread_join -Wl,--wrap -Wl,pthread_detach \
    -Wl,--wrap -Wl,pthread_sigmask -Wl,--wrap -Wl,pthread_exit \
    -Wl,--wrap -Wl,pthread_cancel

In any case, _REENTRANT should be defined during compilation."

It would seem then that the only thing I was missing was using this macro and then including the linker flags to use the generated wrapped links. Since Bigloo uses gcc directly to link, the second form using the "-Wl,--wrap" options is likely the appropriate one, and the only question is where to put this into the cflags - in the "thread" autoconf probe file?

This takes care of the Bigloo "make", but "make test" still fails for the "thread-yield" test and maybe others.

donaldsonjw commented 9 years ago

Gordon,

I will take a look at the warnings and code cleanup suggestions you provided. I don't believe they should be too difficult to address. Additionally, the bug in the nproc autoconf script seems simple enough, and we should be able to fix the make clean issue you reported.

As for winpthreads and Boehm's gc, I took a look at the boehm's gc github at https://github.com/ivmai/bdwgc/, and from the commit log, it appears that support for winpthreads was added in August of 2014. This support has not made it into a stable release-- the last stable release was in June 2014-- but it may be worth pulling the latest version to see if things are better.

I am busy at work this week, so I likely won't have any code updates for you until this weekend.

Best Regards, Joe

GordonBGood commented 9 years ago

Joe, I think the failure of the pthread tests is more a matter of an error in the test procedure than of pthreads not working. Manuel never has any qualms about breaking standards: the Bigloo (pthread library) version of "mutex-state" does not return the "abandoned"/"not-abandoned"/"not-owned" symbols or the owning thread, but rather just "unlocked"/"locked", nor can one set the thread owner using "mutex-lock!", which supports only two arguments, not three, in either library. In addition, the pthread version completely changes the behaviour: a repeated call to "mutex-lock!" does not block until timeout as SRFI-18 specifies, but rather treats the mutex as a semaphore with stacked locks that must be unlocked with an equivalent number of unlock operations - thus it can't be used to wait on a mutex, and one has to add a condition variable to implement a wait. None of this is mentioned in the current Bigloo manual.

Further, SRFI-18 states that it is not an error to unlock an already locked mutex, yet even with the srfi18 library the program will crash when doing this; this may be because Bigloo does not put a guard on the mutex unlock, as the base pthread implementations specify that this is forbidden and causes an error. The Bigloo pthread implementation (not srfi18) uses its "reference counting" to prevent this error.

Yet further, for either library, just as the "mutex-lock!" procedure does not accept a third parameter to set the thread owner, neither does the "mutex-unlock!" procedure take a condition variable parameter (plus optional timeout), meaning that one must use the non-SRFI-18 standard Bigloo procedure "condition-variable-wait!" to accomplish this even when using the srfi18 library.

This is another case where Bigloo is a language unto itself without much conformance, and is poorly documented as to what it does and does not conform to. That isn't to say that it isn't usable, but the documentation needs to be updated to whatever "standard" it conforms to, and that should not include any pretense that it conforms to SRFI-18.

The "thread-yield" test doesn't really work properly; it is just adapted from the SRFI-18 example, which doesn't really work properly either. The intention is that when the mutex is found unlocked, the thread locks it, yields, and loops, with the next pass finding the mutex already locked so that the lock fails after a timeout, printing "locked", unlocking the mutex, and exiting. As written, it will lock the mutex, return #t, print "locked", unlock the mutex, and exit without ever running "thread-yield" - and it wouldn't really be testing it even if it did. A better test is for the sub-thread to check the mutex state for 'locked, initially finding it 'unlocked, which causes a "mutex-lock!", the "thread-yield!", and a loop back, so that the next time the mutex is found to be 'locked, causing the "locked" display, a "mutex-unlock!", and exit.

Note that the mutex lock and unlock functions should really be tested before they are used in testing "thread-start!"/"thread-start-joinable!" and this "thread-yield!". Obviously they work in this case else the tests wouldn't have proceeded as far as they did. In fact "thread-start!" should be tested simply before being used in a complex test that involves "thread-join!" which should be a subsequent stage. Thus pthread testing is quite seriously flawed.

However, the real reason that the testing is breaking at this point is some kind of conflict with the "with-output-to-string" procedure, which is crashing and stopping the whole series of tests. I don't really want to take the time to debug this, so will try just a few more things before stopping and waiting for you or Manuel.

GordonBGood commented 9 years ago

Hi Joe,

Sorry for the long meandering posts, but I really want to get Bigloo right. Bigloo WIN64 winpthreads is actually now passing "make" using the "wrap" technique from the Boehm README.linux file and running, but it is not passing some of the pthread tests, not so much due to the winpthread linkage as to other Bigloo procedures crashing (with-output-to-string, perhaps others; perhaps some of these cross-connect with the gc?). I will try some of the Linux macros that may help, such as NO_HANDLE_FORK and THREAD_LOCAL_ALLOC, just in case those have a bearing. If this does not work, perhaps pulling the latest non-stable gc release as you suggest may help.

As I noted in my last post, the pthread recette tests really need to be re-written to apply to Bigloo, as Bigloo breaks the SRFI-18 standard in so many ways, and some of the SRFI-18 tests weren't much doing what they were intended to do anyway. My major concern right now is how the Bigloo version of "mutex-lock!" works in the pthread library: it actually just uses the created outer mutex as a semaphore controlling the actual pthread mutex object (which can't be called on an unlocked actual pthread mutex). That means the behaviour of a call to "mutex-lock!" on an already locked outer mutex blocking until something unlocks the (outer) mutex does not work; the call immediately returns #t no matter the timeout value (if any is given), just increasing the lock count on the semaphore. If any other code depends on the SRFI-18 behaviour, there will be inconsistencies and crashes.

There also seems to be a bug in the srfi18 library for "mutex-lock!" in that a timeout value of 0 should mean that it always immediately returns with #t if the mutex was formerly unlocked and with #f if it is already locked; the current behaviour is to treat the 0 as if no timeout argument was given and wait indefinitely for the mutex to be unlocked (causing a stall if no other thread ever unlocks it) - a value of 1 causes the expected behaviour of returning #f for an already locked mutex, but after a one millisecond delay. I am not too concerned with this, as even though the srfi18 library is provided, it isn't fully compatible with the SRFI-18 standard, so users should likely follow Manuel's recommendation to use the Bigloo pthread library (which he also says is much faster due to not doing all of the "mutex-state" condition handling).

We need to get pthreads working completely before we run some sort of benchmark to evaluate the performance cost of using pthreads compared to not using them - I see there is a basic test running a lot of threads in pthreads/recette2, but it shouldn't be that hard to come up with something. I think we are just looking at determining the cost in milliseconds of a context switch, which for many threading systems is on the order of a millisecond or so. For most native threading systems, one wants to keep thread work in increments of no less than about 10 milliseconds for efficiency, even using thread pools to avoid the overhead of creating and starting new threads. Thread pools aren't supported by Bigloo directly, so this would need a library.

Best regards, Gordon


donaldsonjw commented 9 years ago

Hello, Gordon,

This weekend has been productive. I believe I have addressed a number of the issues you identified. The WIN64 branch has been updated with fixes for the autoconfig program nprocs, the socket conditional logic in bigloo.h, the signed warning messages, and the error in cleaning the gc. Additionally, I have backported the winpthread changes to gc-7.4.2. This enabled the successful building of pthread support on win64 -- all the pthread recette tests passed with make test. However, I am not happy with the configuration changes made to do so; I am fairly certain it won't compile correctly on any other platform. I have not committed these changes to the WIN64 branch yet, but if you are interested, an archive including the changes can be found at https://sandjsite.net/jwd/files/bigloo4.2a.tar.gz. If you do get a chance to test the archive, please let me know how it goes.

Best Regards, Joe

GordonBGood commented 9 years ago

Hello Joe,

I am in process of testing your new archive with the winpthread changes (thank you very much for that). I'll try to look at what you did to make this work, as well, and perhaps we can come up with something that won't break builds for other platforms.

One more problem that seems to affect all of the 4.2a versions: "time" seems to be broken and the values for real time, system time, and user time all get display/print'ed as something like #(<???>:00000000) - perhaps something to do with the new stricter typing? This fix would likely also apply to Manuel's main branch.

I was going to do the same as you have done: get a copy of gc-vs7.4.2 and merge in the fairly few changes that apply it to winpthreads, then try my compile, but my method didn't seem to take all that many changes to the configuration as I was experimenting with the (fairly new, I think) method of using the compiler/linker "-Wl"/-wrap options that essentially "hooks" the pthread functions in through the gc equivalent methods, which the new versions of the gc seem to support (method 3b); in that way I hoped to avoid making major changes to the config.

With that method I wouldn't be using the "GC_LINUX_THREADS" macro and thus wouldn't have to change the line in the gcconfig.h file that makes it accept MinGW as being consistent with that macro. Did you look into this or did you modify the succession of macros? Of course I will see what you have done when I examine the autoconf probing files.

When all of this works, I may look into writing better recette's for pthread, as the current ones really pertain too much to the SRFI-18 standard that Manuel doesn't really support any more as previously noted. If I do this, we can contribute these tests back to Manuel if he approves them.

Best Regards, Gordon

GordonBGood commented 9 years ago

Hello Joe,

From your latest archive I configured and used "make", "make test" and "make install" with no problems, and also have written some quick little test programs using it with no problems so far. I need to write a little benchmark to test the benefit versus cost of using threads, which I will get to this week.

In looking at the "thread" autoconfig probe file, I see that you would have had to do a great deal of gc configure modification to make this work, as you used the conventional/original "Cee macro shite" method of integrating Windows, which likely won't be compatible with any other Windows platform, although perhaps it shouldn't affect Linux/Unix based platforms? If it only affects other Windows platforms, perhaps it doesn't matter, as with MinGW support working I don't think cygwin matters anymore, although the Boehm gc maintainers may not feel the same if you had to make major changes to their code; they will also want to continue to support installing the gc under Microsoft Visual Studio C and any other languages that might use it that don't use a Linux/Unix based toolchain.

Anyway, I am quite pleased to be able to use native threads with Bigloo, so thank you very much.

Best regards, Gordon

GordonBGood commented 9 years ago

Hello Joe,

My testing shows that it takes about 700,000 CPU clock cycles to instantiate a pthread, start it, join it, and retrieve a simple result that has been stored in the thread-specific field by the thread itself (about 0.23 milliseconds on my 3.1 GHz machine), with only about 93,000 of those clock cycles spent actually instantiating the pthread. This is in line with the usual recommendation that native threaded work should not be partitioned more finely than about 10 milliseconds, meaning about a 2.5% overhead at that point.

However, the thread pool technique of starting a pool of threads and just handing off work to each helps reduce this overhead. I wrote a test program that changes the conventional pthread use above to starting a thread sharing a mutex with the main monitoring thread and using a condition variable to continue (not requiring a new thread, a restart, or a join) once the monitor thread has recovered the results from the thread-specific field (mutex-specific or condition-variable-specific would work as well) and changed the arguments/working data set if necessary before signalling the continue. This only takes about 22,000 CPU clock cycles per minimum thread start, meaning that granularity can be finer by a factor of about 30, or about a third of a millisecond, with thread overhead of less than 3%.

Generally, this implementation has performance comparable to most native threaded implementations in any language, making it very usable within the above constraints. The API lacks a few basic constructs such as a wait barrier, but they can be built from existing primitives, perhaps not quite so efficiently but they work.

As to performance running conventional non-threaded benchmarks on this threaded runtime, my benchmarks do not detect any difference: a tight loop benchmark that ran at 0.35 seconds still runs at 0.35 seconds. Does this mean any background tick/quantum scheduler takes an undetectable amount of time compared to the work done in the benchmark, or does it mean that the runtime only runs the scheduler when it is needed, as when native threads are being run (importing the pthread module doesn't seem to make any difference)? Of course my benchmark is based on the time once the main program has already started to run - does the use of native threads delay the start of the main program while the runtime makes and starts a native thread to run the main procedure, a delay of about a quarter of a millisecond?

In doing the above testing, I have identified some more bugs in Bigloo, as follows:

I wrote the "thread pool" version as follows, which uses the "synchronize" procedure to implement the critical section barrier:

(module test (library pthread) (main start))

(define (gettime thnk)
  ;; let* guarantees the timer reads bracket the call to thnk;
  ;; plain let leaves the evaluation order of the bindings unspecified
  (let* ((strt (current-microseconds))
         (rslt (thnk))
         (stop (current-microseconds)))
    (print "Elapsed time:  " (quotientllong (-llong stop strt) #l1000) " milliseconds.")
    rslt))

(define (test)
  (define loops 1000)
  (define m (make-mutex))
  (define strtcv (make-condition-variable))
  (define rdycv (make-condition-variable))
  (define t (make-thread (lambda ()
      (do () ((=fx 0 1))
        (synchronize m (begin (thread-specific-set! t 42)
                              (condition-variable-signal! rdycv)
                              (condition-variable-wait! strtcv m)))))))
  (define (runthrd)
    (synchronize m (let ((rslt (thread-specific t)))
                     (begin (condition-variable-signal! strtcv) rslt))))
  (begin (synchronize m (begin (thread-start! t) (condition-variable-wait! rdycv m)))
         (do ((i 0 (+fx i 1)) (cnt 0 (+fx cnt (runthrd)))) ((>=fx i loops) (begin (thread-terminate! t) cnt)))))

(define (start argv) (let ((count (gettime test))) (print "Found " count " somethings.")))

The above compiles and runs; however, when the srfi18 library is specified, it does not link, due to dangling references to bgl_ variables and procedures. Substituting the following code, which is entirely SRFI-18 compatible except that one must use the Bigloo "condition-variable-wait! cv m" instead of the SRFI-18 "mutex-unlock! m cv" (Manuel changed "mutex-unlock!" to take only one argument and requires this new procedure instead), also does not link against the srfi18 library, although it used to do so when I compiled it myself using my own modifications adapting to winpthreads:

(define (test)
  (define loops 1000)
  (define m (make-mutex))
  (define cv (make-condition-variable))
  (define rdycv (make-condition-variable))
  (define t (make-thread (lambda ()
      (do () ((=fx 0 1)) (begin (mutex-lock! m)
                                (thread-specific-set! t 42)
                                (condition-variable-signal! rdycv)
                                (condition-variable-wait! cv m)
                                (mutex-unlock! m))))))
  (define (runthrd)
    (begin (mutex-lock! m)
           (let ((rslt (thread-specific t)))
             (begin (condition-variable-signal! cv) (mutex-unlock! m) rslt))))
  (begin (mutex-lock! m) (thread-start! t) (condition-variable-wait! rdycv m) (mutex-unlock! m)
         (do ((i 0 (+fx i 1)) (cnt 0 (+fx cnt (runthrd)))) ((>=fx i loops) (begin (thread-terminate! t) cnt)))))

The above alternate form of the code does work for the pthread library as well as the original, it just doesn't link for the srfi18 library.

It seems that the "condition-variable-wait! cv m [timeout]" procedure is a direct replacement for the second form of the SRFI-18 "mutex-unlock! m cv [timeout]" procedure, other than the order of the first two parameters being swapped, and it seems to include the standard "mutex-unlock! m cv [timeout]" behaviour of leaving the mutex unlocked on exit; it is thus more of a name change than anything. This should really be documented somewhere.
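This wait-while-releasing-the-mutex handshake is the classic condition-variable pattern. As a cross-language illustration only (not Bigloo code, and a deliberately minimal sketch), the same structure looks like this with Python's `threading.Condition`, which bundles a mutex with a condition variable:

```python
import threading

# Sketch of the handshake discussed above: waiting atomically releases the
# mutex, blocks, and re-acquires it on wakeup -- what Condition.wait() does.
result = []
cond = threading.Condition()  # a mutex paired with a condition variable

def worker():
    with cond:                # mutex-lock! analogue
        result.append(42)     # thread-specific-set! analogue
        cond.notify()         # condition-variable-signal! analogue

with cond:
    t = threading.Thread(target=worker)
    t.start()
    while not result:         # guard against spurious wakeups
        cond.wait()           # releases the mutex while blocked
t.join()
print(result[0])
```

The worker cannot enter its critical section until the main thread's `wait()` releases the lock, which is exactly why a wait primitive must take the mutex as a parameter.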

Note that I used the "make-thread" form in order to be compatible with SRFI-18 and the srfi18 library, but creating threads using the instantiate::pthread object form also works when using the pthread library.

Best regards, Gordon

GordonBGood commented 9 years ago

Hello Joe,

Bigloo and other Scheme implementations that support SRFI-18 but use native threads (Chez Scheme being the other major one) don't deliver one piece of important information in a platform-independent way: the number of CPU cores available. This matters when setting the number of threads for a task that will rarely block, where one wants to keep all cores busy with a minimum of context switches. There are ways to deliver this information in a platform-independent way (see the relevant stackoverflow question).

It is understandable that SRFI-18 does not include this, as its author, Gambit-C's Marc Feeley, is more a proponent of "green" threads as implemented in Gambit; unfortunately, those (or the equivalent "fair" threads) don't take advantage of multi-core CPUs.

However, in many ways Marc Feeley's Gambit-C could be better than Bigloo: it is smaller, more concise, and more true to its Scheme roots in that it implements Scheme using structured gotos/switch statements (under the covers of a lot of Cee macros) and thus doesn't normally need to deal with the Cee stack (and so has a true, fast call/cc). As a result of being small and less cluttered with many APIs, it is more reliable, yet it has the essentials, including an FFI for extensions. It is actually faster than Bigloo for many things not involving tight loops but relying on continuation passing, as it is about 20% more efficient by not using Cee function calls, and it gets tail call optimization automatically. Further, Gambit-C's relatively simple "roll-your-own" garbage collector is actually faster than the Boehm GC in many benchmarks, perhaps because it isn't a "conservative" collector and because it is adapted to Scheme's needs rather than being general purpose.
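The compile-to-gotos idea described above can be sketched in any language with a trampoline; the following is only a minimal illustration of the general technique (not how Gambit is actually implemented): each "tail call" returns a thunk for the next step instead of calling it, so the driver loop runs arbitrarily deep recursions in constant native-stack space.

```python
# Minimal trampoline sketch: a step returns either the next thunk or a
# final (non-callable) value; the loop replaces native calls, so deep
# tail recursion never grows the native stack.
def trampoline(thunk):
    while callable(thunk):
        thunk = thunk()
    return thunk

def countdown(n):
    # A tail-recursive countdown expressed as thunk-returning steps.
    return (lambda: countdown(n - 1)) if n > 0 else "done"

result = trampoline(countdown(100000))
print(result)
```

A direct recursive `countdown` at this depth would overflow Python's stack; the trampoline variant does not, which is the same payoff Gambit gets from its switch/goto compilation strategy.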

OTOH, Marc Feeley doesn't see a pressing need to adopt a native thread interface at all (even to the extent of supporting his "green" lightweight threads as Haskell does) and thus has a polling check per function call for thread preemption events (currently including Control-C and stack/heap overflow/maintenance), although such polling can be disabled by compiler options. Further, his compiler optimizations are hindered by his intermediate Gambit Virtual Machine (GVM) compilation step, which lacks Bigloo's sophistication in type optimizations and so can't produce the ultimate in straight-line code for tight loops (although Bigloo pays for this in compilation time).

It's too bad we couldn't marry the best features of each, but meanwhile I keep toggling back and forth between them.

Regards, Gordon

donaldsonjw commented 9 years ago

Hello, Gordon,

I am glad you have been able to run (for the most part) your tests. From a performance perspective, I would not expect to see the impact of threading in tight loops consisting of only arithmetic operations, but instead in calls to library functions like I/O and networking, where previously no locks were required but are in a multi-threaded scenario. Now, for the bigloo libraries, some relatively recent lock optimizations added by Manuel may have alleviated the problem to some extent, but I would still expect to see some performance difference when calling the multi-threaded versions of the C and Windows runtime functions.

I have not spent much time looking at the srfi18 library, but I don't believe it should be too difficult to fix the errors you cited and correct the intended semantics. I may look at that this weekend.

I agree that the supplied threading API provides just the basic building blocks required, and I even attempted to augment it a while back with a few higher-level threading abstractions. The library is called pthread-extra and is found at https://github.com/donaldsonjw/pthread-extra. It would require a substantial amount of work to get to production quality but could possibly be a starting point.

Gambit is a high-quality implementation of Scheme and, as you mentioned, much more faithful in its implementation of the Scheme standards. Although I have not used them, I have also heard good things about Chicken and Larceny, found at http://www.call-cc.org/ and http://www.larcenists.org/ respectively. They all have their strengths and weaknesses; the correct choice depends on the nature of your application. I will agree, however, that a green thread implementation for bigloo would be very useful, more so if it was synergistically combined with the existing native thread support.

By the way, in what specific application domain are you looking to apply Scheme?

Best Regards, Joe

GordonBGood commented 9 years ago

Hello Joe,

Ah, I understand about where to look for changes in speed when built on a threaded runtime; it makes sense. So a quick little benchmark sending output to a string should be affected, as other threads could be sharing output to that same string and locks will therefore be applied behind the scenes?

It would be good if you have time to look at fixing the problems with the srfi18 library (the documentation inconsistently spells it with a "-", but the actual library name does not have one), as that is the library closer to an actual standard, even if the use of the "mutex-state" procedure, with all of its state possibilities including being owned by threads, makes it slower.

I had a quick look at your pthread-extra library and see where you were going with it. It is interesting that it is partly modelled after Haskell's basic concurrency model with MVar's, as that is a "light" threaded model. I have also been thinking about a task-parallel library such as used by DotNet and Scala, as that in turn is based on thread pools, which I have found to be much more efficient; however, that could be built on any basic concurrency model. I believe the actor model has been downgraded as a concurrency model, and Scala (which used to use it as its main model) now favours more modern models.
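The task-parallel style mentioned above can be sketched briefly; this is an illustration in Python only (the pool API and task function are stand-ins, not anything from pthread-extra): small tasks are submitted to a pool sized once, rather than spawning a raw thread per task.

```python
from concurrent.futures import ThreadPoolExecutor
import os

# Task-pool sketch: the pool amortizes thread creation and limits
# concurrency to the core count, the efficiency win described above.
def task(n):
    return n * n

with ThreadPoolExecutor(max_workers=os.cpu_count() or 1) as pool:
    results = list(pool.map(task, range(8)))  # order of results is preserved
print(results)
```

The same shape (submit work items, collect ordered results) underlies DotNet's Task Parallel Library and Scala's futures.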

I have also been looking into the Larceny and Chicken Scheme dialects. Chicken is good in that it has a module system, but its performance for any kind of continuation is atrociously slow, since it rewrites calls into continuation-passing style, which means double continuations when one is already using that style. Further, using Cheney on the MTA for stack management means extremely poor CPU cache associativity, as it continuously needs to transfer the Cee stack to the heap and back. In combination, for any kind of algorithm that doesn't use tight loops but rather a succession of procedure calls (other than call/cc, at which it is very fast, as it is designed for that), it can be over 20 times slower than Gambit or Bigloo. Larceny is interesting in that it was the first Scheme dialect to adopt the R6RS standard, but it is essentially dead in that it hasn't been developed since 2009, almost six years! IIRC its output is also based on the old "fasl" file format, and it by default compiles source code as required, although one can call the compiler directly to precompile procedures/libraries; it doesn't really generate ".exe" files directly without going through some conversion hoops.

I am semi-retired and learning computer languages / adding them to my arsenal just to keep my hand in and my mind active. However, in my career I was involved with real-time systems; thus my interest in threading applications. I only started using functional programming techniques in the last few years but am now completely sold on the concept, and have spent quite a bit of time learning Haskell, F#, and Scala as a more functional language on the JVM platform (I looked at Clojure but found it to be extremely slow), and have now revisited Lisp, which led me to Scheme after many years away from Lisp-based languages (limited to using Lisp as the macro language of AutoCAD in previous releases). So I don't have a specific application domain for Scheme, but rather am looking for a functional general-purpose language with good performance. Syntactically, I prefer "whitespace formatted" languages such as Haskell or F# (or Python) once one gets used to them, and I like the ease of returning multiple values in those languages, but I like Scheme's simplicity to learn and use as compared to Haskell (F# is also quite simple to learn as functional languages go).

I actually enjoy making the most of a given language for a given application, which is something that was absolutely necessary 40+ years ago when I first started using computers/microprocessors (I'm a graduate engineer). I suppose my interest in Scheme is increased in that as a simple language one can understand and build on all the basic concepts upon which it is built.

Best regards, Gordon

donaldsonjw commented 9 years ago

Hello, Gordon,

I did not get to look at srfi18 last weekend but am doing so today, and I am not seeing the linking problems you reported. The sample code you provided compiled and ran with both the pthread and srfi18 libraries. Do you recall what symbols were giving you problems? I took a stab at adding a SRFI-18-compliant version of mutex-unlock!. I used the following code to test:

(module test (library srfi18) (main start))

(define (test)
  (define loops 1000)
  (define m (make-mutex))
  (define cv (make-condition-variable))
  (define rdycv (make-condition-variable))
  (define t (make-thread (lambda ()
      (do () ((=fx 0 1)) (begin (mutex-lock! m)
                                (thread-specific-set! t 42)
                                (condition-variable-signal! rdycv)
                                (mutex-unlock! m cv))))))
  (define (runthrd)
    (begin (mutex-lock! m)
           (let ((rslt (thread-specific t)))
             (begin (condition-variable-signal! cv) (mutex-unlock! m) rslt))))
  (begin (mutex-lock! m) (thread-start! t)  (mutex-unlock! m rdycv)
         (do ((i 0 (+fx i 1)) (cnt 0 (+fx cnt (runthrd)))) ((>=fx i loops) (begin (thread-terminate! t) cnt)))))

(define (gettime thnk)
  (let ((strt (current-microseconds))
        (rslt (thnk))
        (stop (current-microseconds)))
    (begin (print "Elapsed time:  " (quotientllong (-llong stop strt) #l1000) " milliseconds.") rslt)))

(define (start argv) (let ((count (gettime test))) (print "Found " count " somethings.")))

I believe I captured the semantics correctly but could be mistaken. My definition is


(define (mutex-unlock! mutex #!optional condvar timeout)
  (when condvar
    (if timeout
        (condition-variable-wait! condvar mutex timeout)
        (condition-variable-wait! condvar mutex)))
  ((@ mutex-unlock!  __thread) mutex))

A Bigloo archive containing the changes can be found at https://sandjsite.net/jwd/files/bigloo4.2a_srfi18mod.tar.gz.

Additionally, I looked into obtaining the number of processors. For bigloo's native backend, the following code snippet should do so in a portable fashion, since bigloo's native backend depends on pthreads for threads on all supported platforms.

(define (number-of-processors)
  (pragma::long "pthread_num_processors_np();"))

We could do something very similar for the jvm backend by calling

Runtime.getRuntime().availableProcessors()
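For comparison, higher-level runtimes expose the same facility directly; a one-line illustration in Python (shown only to make the portability point, not as Bigloo code):

```python
import os

# Number of logical processors visible to the process; useful for sizing
# a worker pool so all cores stay busy with minimal context switching.
ncores = os.cpu_count() or 1  # cpu_count() can return None on some platforms
print(ncores)
```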

By the way, I noticed that a new version of Larceny was released; it may be worth looking at.

Best Regards, Joe

GordonBGood commented 9 years ago

Hello Joe,

Thanks for the update on the new version of Larceny; I was wrong that development has stopped. I have downloaded the (32-bit) Windows version and will run some benchmarks.

I'm afraid I must admit that I have not done much with Scheme this past week, as there is a new version of Haskell in Release Candidate 3 that I wanted to test. I think that Haskell is perhaps my preferred functional language as to performance and concise source code, but it lacks a good IDE (although the Haskell Platform is very usable for debugging, and there is Leksah; EclipseFP isn't current with the latest versions), and the LLVM backend still doesn't work under 64-bit Windows (not Haskell's problem but a bug in binutils under Windows, which it uses); now that the problem is known, it will probably get fixed by the next release in about a year. A lot of (Microsoft Research funded) research has been done on the Haskell custom garbage collector and compiler, and it generates the fastest code for the least source code I have ever seen (although the current native code generator stinks and one must use the LLVM back end for the fastest code). It usually runs faster than Cee as long as one pays attention to strictness (Haskell is lazy by default). It has very stable native thread support, although implemented as light threads with native thread scheduling.

I got thinking this week that perhaps srfi18 doesn't run under my system due to something about the build process I used, and that I should really rebuild my release. I'll build your latest archive, and if there are further problems I'll report them to you. Last time I did a standard build just as I had done before, but I ran "make test" before running "make install" (having already set --prefix= to my install directory in ./configure); I'm wondering if "make test" somehow corrupted or changed some files before the install. As you have already run "make test", this time I will go directly to "make install" after a successful "make" and then perhaps run "make test" after the files are already safely installed.

Thanks for the work on a SRFI-18 compliant "mutex-unlock!" and on the number of processors. Not being able to test currently, IIRC the "mutex-lock!" is also not quite SRFI compliant in that it doesn't accept the third "thread" argument to assign a specific thread as owner; not that most programs would use that rather than accepting the default of assigning "current-thread" as owner. I'll check this again once srfi18 works again.

An example of what Haskell can do that seemingly no other programming language can is the Hamming problem, as I encountered on ProgrammingPraxis at http://programmingpraxis.com/2015/03/06/357-numbers/.

My best Scheme implementation for Bigloo is as follows (using co-inductive streams, as Scheme has no lazy list):

(module test (main start))

(define (gettime thnk)
  (let ((strt (current-microseconds))
        (rslt (thnk))
        (stop (current-microseconds)))
    (begin (print "Elapsed time:  " (quotientllong (-llong stop strt) #l1000) " milliseconds.") rslt)))

(define (s357)
  (define (merge x y)
    (let ((xv (car x)) (yv (car y)))
      (cond ((< xv yv) (cons xv (lambda () (merge ((cdr x)) y))))
            ((> xv yv) (cons yv (lambda () (merge x ((cdr y))))))
            (else (cons xv (lambda () (merge ((cdr x)) y)))))))
  (define (smult s m) (cons (* (car s) m) (lambda () (smult ((cdr s)) m)))) ;; equiv to map (* m)
  (define (s3 n) (cons n (lambda () (s3 (* n 3)))))
  (define (s35) (cons 1 (lambda () (merge (s3 3) (smult (s35) 5)))))
  (cons 1 (lambda () (merge ((cdr (s35))) (smult (s357) 7)))))

;;; test...
(define limit 100000)
(define (countemto lmt)(do ((nxt (s357) ((cdr nxt))) (cnt 0 (+ cnt 1))) ((>= cnt lmt) (car nxt))))
(define (start argvs) (begin (display (gettime (lambda () (countemto limit)))) (newline)))

Where Haskell can be written as (this order produces slightly faster code on my machine):

{-# OPTIONS -O2 -optc-O3 -rtsopts -fllvm #-}

import Data.Function (fix)

hamm :: () -> [Integer]
hamm() = 1 : foldl (\s n -> fix (merge s . (n:) . map (n*))) [] [7,5,3]
  where merge [] b = b
        merge a [] = a
        merge a@(x:xs) b@(y:ys) = if x < y then x : merge xs b
                                  else y : merge a ys

main = do
  print $ (hamm()) !! (10000000 - 1)

When run as "seq357 +RTS -s", the Haskell code takes about 3.4 seconds total on my machine, with a little over half spent in garbage collection, whereas the compiled Bigloo (-Obench -copts "-O3") runs the previous code in about 4.4 seconds total for about a hundredth of the work (the sequence is of approximately linear computational complexity in the number of elements). Gambit-C is a little faster at this, as it doesn't do Cee function calls but rather structured gotos and thus avoids some of the call overhead, but it is still effectively making procedure calls, just slightly short-circuited.

I chalk Haskell's (32-bit compiler version 7.8.3 under latest Haskell Platform on Windows) performance up to optimizing away procedure calls and more efficient garbage collection. What do you think?

I think this is a good example of the application of pure functional programming principles actually outperforming more imperative code due to the optimizations that can be made, and it shows even Scheme's limitations (due to its age and to not being completely "pure", with "set!" procedures part of the language).

Best regards, Gordon

GordonBGood commented 9 years ago

Hello Joe,

I've checked out the new version of Larceny, and while it's great that it is still being actively developed, and impressive that it was the first Scheme implementation to support the R6RS and R7RS standards (with only Racket and the commercial Chez Scheme, among the still reasonably active implementations, even supporting R6RS), it appears to be over twice as slow as Bigloo or Gambit-C at anything garbage-collection related, although "time" says that the garbage collector is fairly efficient.

I wonder if the reason is somewhat the same as why Chicken Scheme is slow at continuation-passing-style code, as I see "cheney" mentioned in the Larceny "time" output, which may mean "Cheney on the MTA" heap handling; that handling is itself fast, but it means poor cache associativity and slow memory access operations.

As far as functional programming paradigms go, Bigloo and Gambit-C are comparable in speed to Microsoft's F# on the above Hamming benchmark, but I'm still trying to figure out exactly why Haskell is so blazingly fast compared to everything else, other than for the reasons I mentioned in the post where I listed the benchmarks.

I am now building your new archive and will report on whether srfi18 works for me or not.

Best regards, Gordon

GordonBGood commented 9 years ago

Hello Joe,

The latest archive is still failing for the srfi18 library in the same way, with the output when linking the test program I gave you as follows:

C:\Users\Super\Desktop\Haskell Stuff>bigloo -Obench -copt "-O3" testBigloox.scm -o testBigloox
gcc -O3 -mthreads -D_MINGW_VER -c -I. -I. -If:/bigloo4.2a-winpthread_MinGW64/lib/bigloo/4.2a testBigloox.c
f:/bigloo4.2a-winpthread_MinGW64/lib/bigloo/4.2a/libbigloosrfi18_u-4.2a.a(cmutex.o):cmutex.c:(.text+0x175): undefined reference to `bglpth_current_pthread'
f:/bigloo4.2a-winpthread_MinGW64/lib/bigloo/4.2a/libbigloosrfi18_u-4.2a.a(cmutex.o):cmutex.c:(.text+0x381): undefined reference to `bglpth_current_pthread'
f:/bigloo4.2a-winpthread_MinGW64/lib/bigloo/4.2a/libbigloosrfi18_u-4.2a.a(ccondvar.o):ccondvar.c:(.text+0x119): undefined reference to `bglpth_condvar_init'
f:/bigloo4.2a-winpthread_MinGW64/lib/bigloo/4.2a/libbigloosrfi18_u-4.2a.a(ccondvar.o):ccondvar.c:(.text+0x151): undefined reference to `bglpth_condvar_init'
f:/bigloo4.2a-winpthread_MinGW64/lib/bigloo/4.2a/libbigloosrfi18_u-4.2a.a(thread.o):thread.c:(.text+0x7cb): undefined reference to `BGl_modulezd2initializa7ationz75zz__pth_threadz00'
f:/bigloo4.2a-winpthread_MinGW64/lib/bigloo/4.2a/libbigloosrfi18_u-4.2a.a(thread.o):thread.c:(.text+0x9b1): undefined reference to `BGl_modulezd2initializa7ationz75zz__pth_threadz00'
f:/bigloo4.2a-winpthread_MinGW64/lib/bigloo/4.2a/libbigloosrfi18_u-4.2a.a(thread.o):thread.c:(.rdata$.refptr.BGl_pthreadz00zz__pth_threadz00[.refptr.BGl_pthreadz00zz__pth_threadz00]+0x0): undefined reference to `BGl_pthreadz00zz__pth_threadz00'
f:/bigloo4.2a-winpthread_MinGW64/lib/bigloo/4.2a/libbigloosrfi18_u-4.2a.a(thread.o):thread.c:(.rdata$.refptr.BGl_uncaughtzd2exceptionzd2zz__pth_threadz00[.refptr.BGl_uncaughtzd2exceptionzd2zz__pth_threadz00]+0x0): undefined reference to `BGl_uncaughtzd2exceptionzd2zz__pth_threadz00'
f:/bigloo4.2a-winpthread_MinGW64/lib/bigloo/4.2a/libbigloosrfi18_u-4.2a.a(backend.o):backend.c:(.text+0x1f0): undefined reference to `BGl_modulezd2initializa7ationz75zz__pth_threadz00'
f:/bigloo4.2a-winpthread_MinGW64/lib/bigloo/4.2a/libbigloosrfi18_u-4.2a.a(backend.o):backend.c:(.text+0x335): undefined reference to `BGl_modulezd2initializa7ationz75zz__pth_threadz00'
f:/bigloo4.2a-winpthread_MinGW64/lib/bigloo/4.2a/libbigloosrfi18_u-4.2a.a(backend.o):backend.c:(.text+0x91): undefined reference to `bglpth_current_thread'
f:/bigloo4.2a-winpthread_MinGW64/lib/bigloo/4.2a/libbigloosrfi18_u-4.2a.a(cthread.o):cthread.c:(.text+0x4d): undefined reference to `bglpth_thread_run'
f:/bigloo4.2a-winpthread_MinGW64/lib/bigloo/4.2a/libbigloosrfi18_u-4.2a.a(cthread.o):cthread.c:(.text+0xf9): undefined reference to `bglpth_thread_env_create'
collect2.exe: error: ld returned 1 exit status

This is an issue with the unsafe version of the library, as if I set an optimization level that doesn't require the unsafe version, it compiles and runs fine; this probably explains why the recette tests pass while my -Obench tests don't.

Best regards, Gordon

GordonBGood commented 9 years ago

Hello Joe,

I've worked out why Haskell is so much faster for the comparative program I posted earlier: Haskell's lazy list memoization is necessary to avoid repeating expensive calculations in this algorithm. We can emulate laziness using the R5RS delay/force promise procedures, making the former internal procedures plain linked-pair variables equivalent to lazy lists, with the code as follows (using the common implementation of "time" such as in Chez Scheme or Gambit-C; not Bigloo's):

(define (seq357)
  (define (merge x y)
    (let ((xv (car x)) (yv (car y)))
      (if (<= xv yv) (cons xv (delay (merge (force (cdr x)) y)))
                     (cons yv (delay (merge x (force (cdr y))))))))
  (define (smult m s) (cons (* m (car s)) (delay (smult m (force (cdr s)))))) ;; equiv to map (* m)
  (define (s7 n) (cons n (delay (s7 (* n 7)))))
  (define s57 (cons 1 (delay (merge (s7 7) (smult 5 s57)))))
  (define s357 (cons 1 (delay (merge (force (cdr s57)) (smult 3 s357)))))
  s357)

;;; test...
(define limit 1000000)
(define (countemto lmt)(do ((nxt (seq357) (force (cdr nxt))) (cnt 1 (+ cnt 1))) ((>= cnt lmt) (car nxt))))
(display (time (countemto limit))) (newline)

The above code is only a little over twice as slow as the Haskell code with its specialized rule-based optimizations for lazy lists, and about the same speed if the Haskell code is written to emulate the above using the same form of stream implementation with thunks as delayed calculations, when compiled with maximum optimizations under Gambit-C; however, I'm afraid this is a situation where Bigloo is about ten times slower than Gambit-C.
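The delay/force cells above can be transliterated into any language; the following is an illustrative Python sketch only (plain closures stand in for R5RS promises): each cell is a (head, force-tail) pair, and forcing a tail caches the result, so no element of the 3-5-7 sequence is ever computed twice, which is the memoization that matters.

```python
# Memoized-thunk streams mirroring the delay/force Scheme code above.
def delay(f):
    cache = []
    def force():
        if not cache:
            cache.append(f())  # compute the tail once, then reuse it
        return cache[0]
    return force

def merge(xs, ys):  # merge two sorted infinite streams
    x, y = xs[0], ys[0]
    if x <= y:
        return (x, delay(lambda: merge(xs[1](), ys)))
    return (y, delay(lambda: merge(xs, ys[1]())))

def smult(m, s):  # multiply every element of stream s by m
    return (m * s[0], delay(lambda: smult(m, s[1]())))

def seq357():
    def s7(n):
        return (n, delay(lambda: s7(7 * n)))
    s57 = (1, delay(lambda: merge(s7(7), smult(5, s57))))
    s357 = (1, delay(lambda: merge(s57[1](), smult(3, s357))))
    return s357

s = seq357()
for _ in range(20):
    s = s[1]()
print(s[0])  # the 21st 3-5-7 smooth number
```

Without the cache inside `delay`, each self-referential stream would recompute its entire prefix on every force, which is exactly the blow-up the memoized Haskell version avoids.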

Best regards, Gordon.

donaldsonjw commented 9 years ago

Hello, Gordon,

I can see how the memoization would speed things up. Below is my best version (it is in the same ballpark speed-wise as your implementation).

(module hamm
   (main main))

(define-macro (stream-cons a . exp)
   `(cons ,a (delay ,@exp)))

(define-inline (stream-car s)
   (car s))

(define-inline (stream-cdr s)
   (force (cdr s)))

(define (stream-pair? p)
   (pair? p))

(define-inline (stream-merge a b)
   ;; (printf "merging a: ~a and b: ~a ~%" a b)
   (cond ((null? a) b)
         ((null? b) a)
         (else (if (< (stream-car a) (stream-car b))
                   (stream-cons (stream-car a)
                      (stream-merge (stream-cdr a) b))
                   (stream-cons (stream-car b)
                      (stream-merge a (stream-cdr b)))))))

(define-inline (stream-map s p)
   (stream-cons (p (stream-car s))
      (stream-map (stream-cdr s) p)))

(define (stream-ref s i)
   (let loop ((c 0)
              (strm s))
      (if (= c i)
          (stream-car strm)
          (loop (+ c 1) (stream-cdr strm)))))

(define (multiples-of-7&5&3)
   (letrec* ((*7s (stream-cons 7
                     (stream-map *7s (lambda (x) (* x 7)))))
             (*75s (stream-merge *7s
                      (stream-cons 5
                         (stream-map *75s (lambda (x) (* x 5))))))
             (*753s (stream-merge *75s
                       (stream-cons 3
                          (stream-map *753s (lambda (x) (* x 3)))))))
      (stream-cons 1 *753s)))

(define (main args)
   (print (stream-ref (multiples-of-7&5&3) 1000000)))

I profiled the application and saw that it is spending a large amount of time in garbage collection. I believe this is due to both of our solutions inadvertently holding onto the root(s) of the lazy list(s). We really should not see the amount of memory allocation that we are seeing. I have tried a few things to remedy this but have not had any luck. I did read in the write-up for SRFI-41 that such issues are not uncommon in Scheme (lazy list) stream implementations.

And I apologize for the tardiness in replying. The last week or so has been very hectic.

Best Regards, Joe

GordonBGood commented 9 years ago

Hello Joe,

Although it basically does the same thing, I like your formatting and have changed mine to the following, which is just very slightly slower than as I had it before due to the use of the map lambda function rather than calling multiply directly. Note that I have used "letrec" so it runs under more R5RS-conformant Schemes which do not have "letrec*" (multiple "define"s generate the same code), and I have implemented only those parts of a lazy stream library needed by the algorithm:

(define (multiples-of-7&5&3)
  (define (map f s) (cons (f (car s)) (delay (map f (force (cdr s))))))
  (define (merge xs ys)
    (let ((x (car xs)) (y (car ys)))
      (if (<= x y) (cons x (delay (merge (force (cdr xs)) ys)))
                   (cons y (delay (merge xs (force (cdr ys))))))))
  (letrec ((*7s (cons 7 (delay (map (lambda (x) (* 7 x)) *7s))))
           (*75s (cons 5 (delay (merge *7s (map (lambda (x) (* 5 x)) *75s)))))
           (*753s (cons 3 (delay (merge *75s (map (lambda (x) (* 3 x)) *753s))))))
    (cons 1 (delay *753s))))

Yes, even though the memoization/lazy list/stream implementations reduce computations by a factor of almost 40, they do increase the use of heap memory, which puts a heavier burden on efficient garbage collection. I did see fairly large memory use for the Hamming number solution with some Scheme implementations, although with others it seems to be minimal: Gambit-C seems to have a maximum memory residency of only about 40 Megabytes for computing the sequence up to the first 10 million numbers, which is about four times what Haskell uses, but this is 64-bit Gambit (meaning pointers twice as big as my 32-bit Haskell's) and using Gambit's copying garbage collector (which at least temporarily doubles heap memory requirements). I don't know that one can completely eliminate fairly high memory use in the lazy lists/streams, as the intermediate streams are necessary to compute future values of the sequence: past values of the s57 sequence are needed for successive computations and for the final 357 sequence, and past values of the 357 sequence are needed for the final sequence generator (recursively). Every number generated will produce a deferred lazy thunk on the heap, but only the ones above one third of the maximum number, and the *75s ones between one fifth and one third of the maximum number, need to be kept by the garbage collector; these aren't so many, as the sequence gets more and more sparse in higher ranges. Although I haven't done an exact calculation, a few Megabytes of storage for these deferred calculations seems to be in the ballpark of what should be required, considering that each consists of a pair of a big integer and a thunk pointer at many tens of bytes per pair.

You say Bigloo uses a lot more heap for the same problem? As the bindings to the streams are internal to the overall function and recursive, they should be consumed as the sequence progresses, which should release the beginnings of each of the sequences for garbage collection, as Gambit seems to do properly.

Do you have any ideas on the Bigloo unsafe srfi18 library (and others?) having dangling link references?

Best regards, Gordon

donaldsonjw commented 9 years ago

Hello, Gordon,

The linking issue you are experiencing when compiling srfi18 apps with -Obench is due to static linking. The srfi18.init file does not declare srfi18's dependency on the bigloo pthread library. The updated version of srfi18.init below should resolve your problem. If other libraries demonstrate similar problems, we will need to update their init files as well.

;*=====================================================================*/
;*    .../prgm/project/bigloo/api/srfi18/src/Misc/srfi18.init.in       */
;*    -------------------------------------------------------------    */
;*    Author      :  Manuel Serrano                                    */
;*    Creation    :  Wed Nov  7 05:40:36 2001                          */
;*    Last change :  Sun Apr 20 18:23:31 2008 (serrano)                */
;*    Copyright   :  2001-08 Manuel Serrano                            */
;*    -------------------------------------------------------------    */
;*    The Srfi18 init file.                                            */
;*=====================================================================*/

;*---------------------------------------------------------------------*/
;*    The library                                                      */
;*---------------------------------------------------------------------*/
(declare-library! 'srfi18
          :basename "bigloosrfi18"
          :srfi '(srfi-18 srfi18)
          :module-init '__srfi18_thread
          :module-eval '__srfi18_makelib
          :class-init "bigloo.srfi18.jthread"
          :class-eval "bigloo.jthread.make_lib")

;*---------------------------------------------------------------------*/
;*    Register the srfi                                                */
;*---------------------------------------------------------------------*/
(cond-expand
   (bigloo-compile
    (let* ((fname (find-file/path "pthread.init" *lib-dir*)))
       (loadq fname))
    (set! *additional-bigloo-libraries*
       (cons 'bigloopthread *additional-bigloo-libraries*))

    ;; setup
    (set! *bigloo-libraries-c-setup*
       (cons "srfi18_setup" *bigloo-libraries-c-setup*))))

On the lazy list front, I am now looking at srfi41 and seeing how one would implement a solution to the hamming problem in it. I will let you know how it goes.

Best Regards, Joe

GordonBGood commented 9 years ago

Joe, Thanks for the fix for srfi18; I'll build it today and try it out.

The main difference between SRFI-41 and what you have already tried is that SRFI-41 adds an extra level of promise at the outer level of each pair. If anything, that would make it slightly slower because of the extra force call required, although it might make the nested closures more visible to the garbage collector and thus get the start of the streams collected. If Bigloo uses significantly more memory because it retains pointers to the internal sub-streams, then I would think something needs to be done about those internals.
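The distinction can be made concrete with a small Python sketch (hypothetical, not the thread's code): an "odd" stream is a pair whose tail is delayed, while a SRFI-41-style "even" stream delays the whole pair, so every element access pays one extra force:

```python
class Promise:
    """Memoized thunk: compute once, cache the result."""
    def __init__(self, thunk):
        self.thunk, self.value, self.forced = thunk, None, False
    def force(self):
        if not self.forced:
            self.value, self.forced = self.thunk(), True
            self.thunk = None  # drop the closure so it can be collected
        return self.value

# "Odd" stream: (value, Promise-of-next-pair)
def odd_from(n):
    return (n, Promise(lambda: odd_from(n + 1)))

# "Even" (SRFI-41-style) stream: Promise-of-(value, next-even-stream)
def even_from(n):
    return Promise(lambda: (n, even_from(n + 1)))

def odd_ref(s, i):
    for _ in range(i):
        s = s[1].force()
    return s[0]

def even_ref(s, i):
    pair = s.force()  # one extra force, even for element 0
    for _ in range(i):
        pair = pair[1].force()
    return pair[0]
```

The even representation is what SRFI-41 uses to get proper laziness semantics; the cost is that extra outer force per element, which is the slowdown speculated about above.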

In fact, Bigloo should really run code such as I provided last time at about the same speed as Gambit-C, other than for any inefficiencies of the Boehm garbage collector (Gambit's garbage collection takes about two seconds for the sequence to 10 million numbers, which is about the same time as the Haskell garbage collector takes).

My point is that if Gambit-C can garbage collect recognizing that the beginning of the intermediate streams are no longer referenced, why not Bigloo?

As for the slight cost in speed of using "map" rather than an "smult" function that calls multiply directly rather than through a lambda: I also tried the "outer/worker" pattern that helps with such things in Haskell by eliminating an extra parameter in the inner loop (the function, in this case), but it made no difference to Gambit; it may be that it could help Bigloo?
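The "outer/worker" pattern mentioned here can be sketched in Python (the `scale_all` names are purely illustrative): the outer function captures the invariant argument once, so the inner worker recursion carries one fewer parameter on every step:

```python
def scale_all(xs, k):
    # Naive version: k is threaded through every recursive call.
    def go(xs, k):
        return [] if not xs else [xs[0] * k] + go(xs[1:], k)
    return go(xs, k)

def scale_all_worker(xs, k):
    # Outer/worker version: k is a free variable of the closure,
    # so the worker `go` takes only the shrinking list.
    def go(xs):
        return [] if not xs else [xs[0] * k] + go(xs[1:])
    return go(xs)
```

Both compute the same result; whether the worker form is faster depends on how the compiler represents closures versus extra arguments, which is why it may help one implementation and not another.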

Best regards, Gordon

donaldsonjw commented 9 years ago

Hello, Gordon,

I have continued to fiddle around with the hamming problem. Below is my fastest version so far. Recognizing that in many cases it does not make sense to suspend mapping and merging when the values being processed have already been forced, I have introduced stream-merge* and stream-map*, which will continue to merge or map, respectively, while forced values are available. Their use has provided a sizable boost in performance.

(module hamm
  (main main))

(define-macro (stream-car s)
   `(car ,s))

(define-macro (stream-cdr s)
   `(begin
       (when (procedure? (cdr ,s))
          (let ((rp (force (cdr ,s))))
             (set-cdr! ,s rp)))
       (cdr ,s)))

(define-macro (stream-cons a b)
   `(cons ,a (delay ,b)))

(define (forced? s)
   (not (procedure? (cdr s))))

(define (stream-map s proc)
   (define (smap s)
      (stream-cons (proc (stream-car s))
         (smap (stream-cdr s))))
   (smap s))

;;; stream-map* will eagerly map over the stream when previously forced values
;;; are available
(define (stream-map* s proc)
   (define (smap* s)
      (if (forced? s)
          (cons (proc (stream-car s))
             (smap* (stream-cdr s)))
          (stream-cons (proc (stream-car s))
             (smap* (stream-cdr s)))))
   (smap* s))

(define (stream-merge a b)
   (define (smerge a b)
      (cond ((null? a) b)
            ((null? b) a)
            (else (let ((a-val (stream-car a))
                        (b-val (stream-car b)))
                     (if (< a-val b-val)
                         (stream-cons a-val (smerge (stream-cdr a) b))
                         (stream-cons b-val (smerge a (stream-cdr b))))))))
   (smerge a b))

;;; stream-merge* will eagerly merge streams when previously forced values
;;; are available
(define (stream-merge* a b)
   (define (smerge* a b)
      (cond ((null? a) b)
            ((null? b) a)
            (else (let ((a-val (stream-car a))
                        (b-val (stream-car b)))
                     (if (< a-val b-val)
                         (if (forced? a)
                             (cons a-val (smerge* (stream-cdr a) b))
                             (stream-cons a-val (smerge* (stream-cdr a) b)))
                         (if (forced? b)
                             (cons b-val (smerge* a (stream-cdr b)))
                             (stream-cons b-val (smerge* a (stream-cdr b)))))))))
   (smerge* a b))

(define (stream-ref s i)
   (let loop ((c 0)
              (strm s))
      (if (= c i)
          (stream-car strm)
          (loop (+ c 1)
                (stream-cdr strm)))))

(define (multiples-of-7&5&3)
   (letrec* ((*7s (stream-cons 7
                     (stream-map *7s (lambda (x) (* x 7)))))
             (*75s (stream-merge* *7s
                      (stream-cons 5
                         (stream-map* *75s (lambda (x) (* x 5))))))
             (*753s (stream-merge* *75s
                       (stream-cons 3
                          (stream-map* *753s (lambda (x) (* x 3)))))))
      (stream-cons 1 *753s)))

(define (main args)
   (let ((s (multiples-of-7&5&3)))
      (print (stream-ref s 10000000))))
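For readers outside Scheme, the "be eager when already forced" trick in stream-merge* can be sketched in Python (an illustrative reconstruction with hypothetical names, not the thread's code): when a cell's tail has already been memoized, the merge builds the next output cell immediately instead of allocating a new suspension.

```python
# Cell: [value, thunk-or-next-cell]; a forced cell's tail is already a cell.
def cons(v, thunk):
    return [v, thunk]

def forced(s):
    return not callable(s[1])

def tail(s):
    if callable(s[1]):
        s[1] = s[1]()  # memoize the forced tail
    return s[1]

def from_step(n, step):
    return cons(n, lambda: from_step(n + step, step))

def smerge_star(a, b):
    av, bv = a[0], b[0]
    if av < bv:
        if forced(a):
            return [av, smerge_star(tail(a), b)]  # eager: no new thunk
        return cons(av, lambda: smerge_star(tail(a), b))
    if forced(b):
        return [bv, smerge_star(a, tail(b))]      # eager: no new thunk
    return cons(bv, lambda: smerge_star(a, tail(b)))
```

The eager branches recurse until they hit an unforced cell and only then suspend, so runs of already-computed values cost no promise allocations, which is where the speedup comes from.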

As you correctly predicted, I did see slightly larger memory use when using srfi41. From what I can tell, the memory issues are due to the large number of promises created and not to the stream values themselves. I have not investigated either Haskell's or Gambit's closure implementation, but I suspect they are substantially more lightweight than Bigloo's.

Best Regards, Joe