PolyChord / PolyChordLite

Public version of PolyChord: See polychord.co.uk for PolyChordPro
https://polychord.io/
Other
83 stars 26 forks source link

Support for pip (replaces PR#19) #24

Closed tilmantroester closed 4 years ago

tilmantroester commented 5 years ago

This is essentially PR #19 but branched off master, so that it's easier to stay up-to-date. It also restricts the changes to setup.py and leaves the existing makefiles unchanged (up to some QOL improvements).

Other improvements: MPI and DEBUG can be set to 0 and 1 to disable/enable MPI support and debug flags. For setup.py these are the --no-mpi (MPI=0) and --debug-flags (DEBUG=1).

I also fixed the Fortran examples by putting a logzero constant in utils_module.

williamjameshandley commented 5 years ago

The Travis build is failing with

$MACOSX_DEPLOYMENT_TARGET mismatch: now "10.9" but "10.13" during configure

on the osx builds. Does changing the relevant environment variable in the .travis.yml fix this issue?

tilmantroester commented 5 years ago

The Travis build is failing with

$MACOSX_DEPLOYMENT_TARGET mismatch: now "10.9" but "10.13" during configure

on the osx builds. Does changing the relevant environment variable in the .travis.yml fix this issue?

Unfortunately not. I tried that in commit b822c82 and the travis log shows $ export MACOSX_DEPLOYMENT_TARGET=10.9 but the error message at the end still complains about MACOSX_DEPLOYMENT_TARGET=10.13.

tilmantroester commented 4 years ago

I made an attempt to deal with clang on macOS. As far as I can tell it's working on macOS with clang and gcc, and on linux with gcc and intel compilers. The travis tests aren't running though, for some reason.

williamjameshandley commented 4 years ago

This seems to be working on my system as well. Is there any way we can get it to work with python 2? I'm really reluctant to release support. Even though python2 is deprecating at the end of this year there will still be a lot of scientific code that relies on it for some time after that, and this would be a breaking change for all of those.

tilmantroester commented 4 years ago

I tried changing setup.py to support Python 2 but it got very hacky. For example, the distutils classes don't seem to derive from object, which for Python 2 means that super doesn't work. Instead of mucking up setup.py with workarounds, I added a setup2.7.py, which works with 2.7, at least on my machine.

williamjameshandley commented 4 years ago

I've confirmed that this works on my machine, and my merging the two files appropriately I've got a version of setup.py that works on python2 and python3.

Can you relax the branch protection rules so that I can push my changes to this PR?

williamjameshandley commented 4 years ago

@tilmantroester are you able to relax the branch protection rules?

tilmantroester commented 4 years ago

Done. Sorry for the delay. Could you move the existing setup.py to something like setup-py3.py to keep the cleaner python 3 version around?

williamjameshandley commented 4 years ago

That should be c++ since make on macOS puts c++ in CXX unless set explicitly.

Ah, I had changed it to g++, since c++ causes it to fail on linux. I will comment further via review.

williamjameshandley commented 4 years ago

I've pushed an alternative makefile setup that minimises the changes since master. Does that work for you on your osx setup, or did I break something in that change? (We can revert if it did)

tilmantroester commented 4 years ago

I think I got a fix that preserves the ability to specify the compiler. Could you revert the last commit so that I can apply the fix?

williamjameshandley commented 4 years ago

I think I got a fix that preserves the ability to specify the compiler. Could you revert the last commit so that I can apply the fix?

I've just done this, but in future I think you can do this locally as necessary by pulling down the latest changes.

williamjameshandley commented 4 years ago

OK, all the tests are now passing, apart from OSX + MPI, which fails with the statement:

WARNING: Could not preload libmpi.so.If you are running with MPI, this may cause segfaults

followed by a segfault. Have you been able to get polychord with MPI=1 running on OSX?

tilmantroester commented 4 years ago

It works fine on my macOS machine. I also had a quick look at the preloading MPI. It seems that mpi4py does take care of this, so the call CDLL("libmpi.so", mode=RTLD_GLOBAL) shouldn't be necessary. I don't know why it crashes for travis though. In case you want to load MPI with ctypes, you'd have to adjust the name of the MPI library to the MPI version and platform (e.g., on macOS the library ends in .dylib). mpi4py already did all that work (see here)

williamjameshandley commented 4 years ago

It works fine on my macOS machine. I also had a quick look at the preloading MPI. It seems that mpi4py does take care of this, so the call CDLL("libmpi.so", mode=RTLD_GLOBAL) shouldn't be necessary. I don't know why it crashes for travis though.

Does the latest push work for you? I've tried removing CDLL since it does indeed seem that it's no longer necessary on my machine. If so, then we should also probably change the default to be MPI for OSX as well.

williamjameshandley commented 4 years ago

Unfortunately mpi4py doesn't seem to be fixing the issue on the travis OSX. Do you have any more ideas @tilmantroester ?

tilmantroester commented 4 years ago

It's still working on my machine. Maybe try another MPI distribution? E.g., mpich? Or have travis print the build output (pip install . -v) to see if the compilers are the MPI ones.

williamjameshandley commented 4 years ago

pip install . -v to see if the compilers are the MPI ones

They do seem to be the mpi ones

It's still working on my machine. Maybe try another MPI distribution? E.g., mpich?

Latest push tries this. Which ones are on your machine?

williamjameshandley commented 4 years ago

Alas, an even less informative segmentation fault if mpich is used. Any ideas?

tilmantroester commented 4 years ago

No, I'm at my wits end. Can you clear the travis cache? That's suggested here: https://docs.travis-ci.com/user/common-build-problems/#segmentation-faults-from-the-language-interpreter-ruby-python-php-nodejs-etc. The only thing on macOS and MPI on travis I've found is https://github.com/JuliaParallel/MPI.jl/issues/262, where they just allow travis to fail.

Is master still working on travis?

williamjameshandley commented 4 years ago

Can you clear the travis cache?

There's no caching set up for this repo

Is master still working on travis?

Yes

The only thing on macOS and MPI on travis I've found is JuliaParallel/MPI.jl#262, where they just allow travis to fail.

We could go with that option, although its far from ideal, as I don't want OSX users to run into this issue on their own if it's not something entirely travis-specific

tilmantroester commented 4 years ago

Maybe it's due to using the mpi compilers to compile the python extension. That is a bit hacky but was needed to resolve linking errors if I remember correctly. I can try to get this working with the standard compilers but probably won't get to this until next week.

tilmantroester commented 4 years ago

After messing with this for a while I at least got it to crash with some errors. The first was an error

At line 167 of file mpi_utils.F90 (unit = 6, file = 'stdout')
Fortran runtime error: Missing initial left parenthesis in format
���3

Rewriting this to a write(*,*) "..." fixes this but now crashes at call random_seed(put=seed) in random_utils.F90. If added a bunch of debug print statements around there. Things are weird. The seed array appears empty.

I remember this portion giving me pain me before, with debugging efforts not showing a clear reason for the issue. I think that's why I started enforcing the use of the MPI compilers for the python extension, which back then appeared to fix things. From that I concluded that the call random_seed was a red herring and things just happened to crash there because some memory corruption caused by mismatched run-time libraries. But now travis crashes with either the MPI or standard compilers, while on my machine the standard compilers work.

Travis seems to install a whole lot via homebrew (e.g., numpy) that shouldn't be needed. Is there some caching going on?

tilmantroester commented 4 years ago

If you look at the output from random_utils.F90, something is going very wrong:

 seed allocation status:           0
 seed allocated successfully. shape(seed):           0
 size_seed:          12
 shape(seed):       32654
 seed:
Fortran runtime error: Array rank of PUT is not 1.

I.e., the seed array is a complete mess. The allocation is reported as successful but the shape is wrong, first reported as 0 and then a couple lines later as 32654 (should be 12). Any idea?

williamjameshandley commented 4 years ago

I agree that this is bizarre, although I'm now no longer convinced it's associated with the random number generator. I completely replaced the random number generator, but there are still memory issues that arise when MPI is turned on for OSX. It's more likely that this is some other underlying memory issue that only arises by chance around the time that the RNG is initialised.

tilmantroester commented 4 years ago

It could be due to the fact that mpifort uses the gcc gfortran compiler, while mpicc and mpicxx use the clang compilers on the standard homebrew openmpi installation. The mismatch between gfortran and clang could maybe cause problems. On macOS there isn't an easy way around this however, since convincing homebrew to use non-Apple compilers is a pain.

williamjameshandley commented 4 years ago

OK, I think this is almost ready to be merged. One issue that @Pablo-Lemos has encountered is that on a MAC, if mpi is not installed, then this fails. One can obviously install by passing --no-mpi to setup.py, but if we want a pip install to work, it would be much better if it could detect that no mpi compilers are present.

@tilmantroester What do you think the best way to implement this is?

tilmantroester commented 4 years ago

One option would be to ignore the mpi flag on macOS (we already check the OS anyway) and print a warning. I'll have a look at it.

williamjameshandley commented 4 years ago

Hi @tilmantroester -- what remains to be done to get this PR over the line?

tilmantroester commented 4 years ago

I merged in master, so should be able to merge now. I'd check if travis passes before merging though but travis doesn't seem to run.

williamjameshandley commented 4 years ago

Hi @tilmantroester. I think I've fixed travis now. We can check that the CI is now working by your pushing a commit which bumps the version number to 1.18.0

Whilst this makes it pip-installable with a pip install ., do you know if there are any additional steps which need to be made before it can be uploaded to pypi? (I'm happy to do these manually for now)

tilmantroester commented 4 years ago

@williamjameshandley I haven't uploaded to pypi myself. I think it's just running setup.py with sdist, adding some meta data to setup.py, and then uploading it. At some point a conda package would be nice. That might even fix the macOS MPI issue. But that can wait a bit.

williamjameshandley commented 4 years ago

Many thanks for your hard work @tilmantroester. I will refrain from creating a release now as users start to try to use this in practice, but will make a 1.18.0 once I've got confirmation that it's working on a variety of systems