AdolfVonKleist / Phonetisaurus

Phonetisaurus G2P
BSD 3-Clause "New" or "Revised" License
449 stars 122 forks source link

Several changes to the build system #27

Closed giuliopaci closed 7 years ago

giuliopaci commented 7 years ago

Hi Josef, yesterday I had a look at the current status of Phonetisaurus build system and I started working on a refactor. The refactor is not yet complete, but it is already usable (with minor changes).

Here I will briefly describe what I changed and why:

  1. GNUC and DARWIN where defined in Makefile.in after custom checks. The first one was used to indicate openmp availability, the latest one to indicate Mac OSX. Compilers already define macros to indicate these two capabilities, so I used the compiler defined macros instead of the custom tests. Moreover if tests are needed, they should run in configure, not in Makefile.am (or Makefile.in);
  2. I moved configure.ac and Makefile.am (was Makefile.in) outside of .autoconf directory. I do not know if there is any sane way to move them back in .autoconf directory, but in my experience autotools do not work very well when the sources are in a subtree of the parent directory with respect to the configure file. In particular it is not easy at all to make out-of-tree compilation to work (with the current setup out-of-tree compilation is working);
  3. I added better tests for openfst. The tests are now more similar to other similar tests for other libraries. They are not complete (e.g., I test only symbols in fst and not in fstfar or fstngram, but they should already work in most situations);
  4. I introduced tests for python;
  5. I replaced custom made Makefile.in with Makefile.am. This guarantee out-of-tree compilation, proper usage of compilation flags (with conditional compilation as well) and that common users expectations are met (e.g., --prefix option, DESTDIR variable and uninstall target are now working as expected);
  6. I introduces libtool. Why this is not strictly required, in my experience it simplify the building of shared libraries a lot; 7 I renamed phonetisaurus_train and phonetisaurus_apply into phonetisaurus-train and phonetisaurus-apply, according to the name of other binaries. Moreover they are now installed by the Makefile.

As a result of this work you can now compile everything with the following:

rm -fr _build
mkdir -p _build
cd _build
../src/configure
make
make install
cd ..
cp _build/.libs/Phonetisaurus.so phonetisaurus
python setup.py install

The only drawback of the above procedure is that you will end up with two copies of Phonetisaurus.so installed, as make install also install it in python modules directory (but not where your module expect it to be).

The reasoning behind these changes, despite the fact that they seem better to me, is that I need --prefix option and DESTDIR variable support for packaging. Moreover I will also need to add custom variables to building (needed for QA), which is now supported. And I will need to replace some flags (which is not yet supported). It would be nice to have a minimal test set. All these are much easier when using automake.

Let me know if there is any chance that this pull request will be accepted and if I have to adapt it.

Best regards, Giulio

AdolfVonKleist commented 7 years ago

Wow thanks for this @giuliopaci I will definitely accept it. There are a couple of small things, most of which you touched on already. I'll go through it in detail the next couple days. Main things are that the _apply and _train scripts probably need to stick with the present names, as they are related to the kaldi tag; but you could probably even omit them from your .deb. I think we can also just drop openmp. I previously played around with it a bit, but it is not actively in use in any of the builds right now, and I don't think it is really needed.

If this also solves the OSX build, then it will also close #25 which would be neat.

giuliopaci commented 7 years ago

If it is useful for Kaldi, than I want it in the package: one of my long term goal is to have all Kaldi requirements packaged in Debian and usable for Kaldi without having to recompile them.

Do you think it will be possible to plan a name change for those scripts? It should not be difficult to create a link to those scripts with the current names during the transition. I also have similar issues with rnnlm. Do you think we can rename it? Do you think it would be possible to support --help in it?

Regarding openmp, I tried to disable it, but the build of phonetisauru-g2prnn failed. Is this expected?

Finally, I do not think compilation will work out of the box on Mac OSX, but we can try it, as Travis supports Mac OSX workers (in the worst case we will just have to compile openfst from sources, if brew install openfst is not working).

Il 19/ago/2017 21:20, "Josef Novak" notifications@github.com ha scritto:

Wow thanks for this @giuliopaci https://github.com/giuliopaci I will definitely accept it. There are a couple of small things, most of which you touched on already. I'll go through it in detail the next couple days. Main things are that the _apply and _train scripts probably need to stick with the present names, as they are related to the kaldi tag; but you could probably even omit them from your .deb. I think we can also just drop openmp. I previously played around with it a bit, but it is not actively in use in any of the builds right now, and I don't think it is really needed.

If this also solves the OSX build, then it will also close #25 https://github.com/AdolfVonKleist/Phonetisaurus/issues/25 which would be neat.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/AdolfVonKleist/Phonetisaurus/pull/27#issuecomment-323542664, or mute the thread https://github.com/notifications/unsubscribe-auth/AATBq8NTbF3uCnD5stW4qHIE_PPfWGfSks5sZzWQgaJpZM4O8Xym .

giuliopaci commented 7 years ago

Hi again, I updated the pull request with a few commits that fix compilation on Mac OSX (I also updated Travis configuration so that now it is compiled on Linux with gcc 4.9, Mac OSX with gcc and clang). Is there any reason why gcc 4.9 was used for testing and not latest available compiler?

I had to fix compilation when openmp is not enabled... But this means that now openmp is used when it is enabled. I think this is the correct behaviour, although, given what you said about openmp, probably it should be disabled. Or enabled on demand. Do you agree to disable it?

The workaround for Mac OSX flags parsing is making compilation "fail". This is because openfst flags parsing returns 1 when --help is passed to programs (0 should be the correct return value as the user explicitly requested for help). I temporarily added a workaround to Travis configuration, but it would be better to make flags parsing work on Mac OSX as well.

Cheers, Giulio

AdolfVonKleist commented 7 years ago

No we can move up to a later version of gcc. I do want to use at least 4.9 though, but Ubuntu 14.04 default to 4.8 I think. Ideally I'd like a wider range of supported versions, but I just don't have the time. No issue with the --help message, and I happy to disable openmp by default. It is a bit more complicated and poorly documented how to use the RNN stuff, and I don't really think anyone is presently using it anyway. I'll try to finish going through this all by end of night. Thanks again for the efforts.

AdolfVonKleist commented 7 years ago

I have gone through the commit(s) in a bit more detail now. Again it is much appreciated and introduces a large number of improvements.

giuliopaci commented 7 years ago

On 28/08/2017 16:14, "Josef Novak" notifications@github.com wrote:

Regarding the dash vs underscore naming, in principle I have no issue with making the changes you request to the scripts, but before I try to push or retag that, can you explain the reason? Is that a debian/ubuntu packaging policy requirement?

No, it is not a requirement for Debian, it is just my expectation as a user to have all the commands with exactly the same prefix and not having to remember which prefix is associated to each command (ie, it is a UX issue, not a policy violation issue). The only requirement for Debian is to avoid collisions with other packages.

Regarding the configure script: PACKAGE_BUGREPORT='joe@spitch.ch', let's change that to PACKAGE_BUGREPORT='phonetisaurus@gmail.com'. I completely forgot about that, but it is better to be consistent on that score I think, especially if it will go into a package.

Ok, I will update it.

Regarding the python bindings, not everyone uses these, and not everyone wants to install the python development headers. I'd prefer to keep them as a non-default option, the same way this works in the latest version of OpenFst. Right now it looks like they are required and trigger a configure error if not found.

If I understand correctly you want them disabled by default. Is it right? If yes I can add a --enable-python flag to configure.

Regarding OSX: let's forget about it / not pursue any further in this commit series. We are trying to shoehorn too many things into one commit.

Ok, but it is already working. :-)

Regarding kaldi integration: There is now a script for this in kaldi, install_phonetisaurus.sh. It had one remaining issue related to linking and libtool and the default kaldi OpenFst install (my approach still required the user to update LD_LIBRARY_PATH with the OpenFst libs, but I manually verified that your improved configure scripts work with the existing setup, and handily resolve this issue. They also appear to work fine with the manual configure options and existing install script.

Ok. This has to do with using libtool, which by default set rpath in binaries so that they can find the libraries.

There is one issue: the binaries are no longer build in src/bin/, but rather just in src/. Unless there is a strong justification for this, it I'd prefer to keep these in src/bin, mainly because it would mean there would be no need to make another pull request to kaldi (no breaking changes).

I am not sure that this can be changed at all (I never tried to compile programs in non-default location), but I can try. However it is not correct that kaldi relies on internal details of the building process. From kaldi, or any user, perspective, the installation interface should be ./configure && make && make install, ./configure is now able to use --prefix and other options, so that they can achieve any desired installation structure goals.

The basic sanity test for this integration under Ubuntu 16.04 is something like:

$ git clone https://github.com/kaldi-asr/kaldi.git $ cd kaldi/tools/ $ make openfst $ cd extras/ $ ./install_phonetisaurus.sh $ phonetisaurus-g2pfst --help

If we can agree on (or I can at least understand) the above, then I think we can merge it.

Ok.

Cheers, Giulio

AdolfVonKleist commented 7 years ago

No, it is not a requirement for Debian, it is just my expectation as a user to have all the commands with exactly the same prefix and not having to remember which prefix is associated to each command (ie, it is a UX issue, not a policy violation issue). The only requirement for Debian is to avoid collisions with other packages.

OK, that is reasonable to me. Let's just go for it.

If I understand correctly you want them disabled by default. Is it right? If yes I can add a --enable-python flag to configure.

Yeah, that would be perfect. I think it is better if it will still compile what is possible if the python-dev libs are absent.

Ok. This has to do with using libtool, which by default set rpath in binaries so that they can find the libraries.

Cool - again I was just commenting that it already works.

Ok, but it is already working. :-)

Ah, OK! Somehow I thought there was still something not quite working there. Sounds good.

I am not sure that this can be changed at all (I never tried to compile programs in non-default location), but I can try.

Ok, forget it, if those are the standard let's just keep your changes.

However it is not correct that kaldi relies on internal details of the building process.

It isn't the fault of Kaldi, it is my fault. I'm responsible for setting the extras compile script to use the bin dir by default without modifying configure.


Sounds like that resolves all my questions. I'll merge after the --enable-python flag change and one more trip through the test setup. Thanks again!

giuliopaci commented 7 years ago

I have started doing a rebase and implement most of the additional features.

I will force push my changes tomorrow or the day after tomorrow (I still want to do some minor cleanup).

Cheers, Giulio Il giorno 28/ago/2017 19:44, "Josef Novak" notifications@github.com ha scritto:

No, it is not a requirement for Debian, it is just my expectation as a user to have all the commands with exactly the same prefix and not having to remember which prefix is associated to each command (ie, it is a UX issue, not a policy violation issue). The only requirement for Debian is to avoid collisions with other packages.

OK, that is reasonable to me. Let's just go for it.

If I understand correctly you want them disabled by default. Is it right? If yes I can add a --enable-python flag to configure.

Yeah, that would be perfect.

Ok. This has to do with using libtool, which by default set rpath in binaries so that they can find the libraries.

Cool - again I was just commenting that it already works.

Ok, but it is already working. :-) Ah, OK! Somehow I thought there was still something not quite working there. Sounds good.

I am not sure that this can be changed at all (I never tried to compile programs in non-default location), but I can try.

Ok, forget it, if those are the standard let's just keep your changes.

However it is not correct that kaldi relies on internal details of the building process.

It isn't the fault of Kaldi, it is my fault. I'm responsible for setting the extras compile script to use the bin dir by default without modifying configure.

Sounds like that resolves all my questions. I'll merge after one more trip through the test setup. Thanks again!

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/AdolfVonKleist/Phonetisaurus/pull/27#issuecomment-325424321, or mute the thread https://github.com/notifications/unsubscribe-auth/AATBq1KLCLKms36R_8kzu0_Ehh7LZ64Zks5scvyKgaJpZM4O8Xym .

giuliopaci commented 7 years ago

Hi, I completed the changes that we discussed for this pull request (and rebased on master).

In particular:

Commit 6c48e02 should be compatible with current script in kaldi (although I did not test it).

Cheers, Giulio

AdolfVonKleist commented 7 years ago

This looks great. I see one remaining issue. If I try to compile with configure --prefix=`pwd`, then make install fails:

libtool: install: /usr/bin/install -c bin/phonetisaurus-align \
    /home/ubuntu/Phonetisaurus/bin/phonetisaurus-align
/usr/bin/install: 'bin/phonetisaurus-align' and \
   '/home/ubuntu/Phonetisaurus/bin/phonetisaurus-align' are the same file

I was able to fix this by modifying Makefile.am to change bin_PROGRAMS to src_bin_PROGRAMS, and updating all bin_phonetisaurus_X to src_bin_phonetisaurus_X, and adding src_bindir = $(prefix)/bin.

This then builds the binaries in the src/bin directory, where the python wrapper scripts are already sitting, and resolves the install problem with --prefix=`pwd`. I also prefer this from a consistency standpoint.

If those changes are acceptable from your configure/build side of things either I can just push them myself, or you can make one more commit. Then we will be done.

I noticed that travis seems to not like compiling with clang as well, but I think we can ignore that as a non-issue for now; OSX build works locally for me, and travis succeeds with gcc.

What do you think?

giuliopaci commented 7 years ago

Hi,

On 30/08/2017 10:10, "Josef Novak" notifications@github.com wrote:

This looks great. I see one remaining issue. If I try to compile with configure --prefix=pwd, then make install fails:

libtool: install: /usr/bin/install -c bin/phonetisaurus-align \ /home/ubuntu/Phonetisaurus/bin/phonetisaurus-align /usr/bin/install: 'bin/phonetisaurus-align' and \ '/home/ubuntu/Phonetisaurus/bin/phonetisaurus-align' are the same file

This is partially expected, due to the fact that using "configure --prefix=pwd" you are installing in the build directory, and as you are invoking configure from inside the source directory, you have three different directories that may conflict (i.e., source, build and installation directories). If you already have to change the building procedure, I suggest to change it drastically, but obtaining very similar results. My suggestion is to have three separate directories. Suppose that you still want binaries in Phonetisaurus/src/bin, I suggest something like:

mkdir -p Phonetisaurus
cd Phonetisaurus
git checkout into sources
INSTALL_PATH=" `pwd`"/src
mkdir -p build
cd build
../sources/configure --prefix="$INSTALL_PATH" && make && make install

I was able to fix this by modifying Makefile.am to change bin_PROGRAMS to src_bin_PROGRAMS, and updating all bin_phonetisaurus_X to src_bin_phonetisaurus_X, and adding src_bindir = $(prefix)/bin.

This then builds the binaries in the src/bin directory, where the python wrapper scripts are already sitting, and resolves the install problem with --prefix=pwd. I also prefer this from a consistency standpoint.

This is not solving the issue, just moving it around a little bit. From configure point of view this is not more consistent, because you are compiling src/bin/ that will be installed in , that by default is /bin. In my opinion the most sane approach is to remove both src/bin/ and bin/ prefixes (that have to be intended as

/src/bin and /bin). I added bin prefixes just to keep compatibility with the build script in kaldi, so that if you run configure from src (i.e., builddir=src), then "make" will build binaries in src/bin. But that is just a hack. > If those changes are acceptable from your configure/build side of things either I can just push them myself, or you can make one more commit. Then we will be done. In my opinion the safest approach is the one I described above. Alternatively we can move the scripts from bin to scripts, and remove the bin/ prefix in configure. In this way make install will work again. I will upload an update in a few minutes with these changes, so that you may have a look at them. > I noticed that travis seems to not like compiling with clang as well, but I think we can ignore that as a non-issue for now; OSX build works locally for me, and travis succeeds with gcc. It worked yesterday. I guess it is just a temporary network failure. Cheers, Giulio
AdolfVonKleist commented 7 years ago

Ok. All agreed, and as usual thanks for the extra explanation. One last question; since we already break the kaldi script, what else would you want to change? We don't have to have the configure copy in src/ I guess.

giuliopaci commented 7 years ago

I just removed src/configure.

Ideally the other changes that I would like to see are:

Of these, the only problematic things for .deb packaging are rnnlm issues. This is also something that may affect kaldi as well. If kaldi is using python bindings, then the changes in setup.py may also affect kaldi.

AdolfVonKleist commented 7 years ago

Ok. All points are good ones, and about half are on my list, but I think we can merge this PR, and address those later since they will not further break anything.

The bindings are not used in any of the kaldi scripts as far as I know, so no issue there.

The vanilla rnnlm trainer is basically just an old frozen version of the Mikolov tool. I'm not sure how much I want to invest into that ATM, or if the effort is better spent elsewhere.

The calculate error script needs to be refactored very heavily (rewritten), or just swapped for standard sclite. Undecided on that.

I'm working on refactoring the documentation, and setting up the test suite, but it will probably be a while yet.

OSX builds on travis seem super backed up, but I'll merge this after they finish up.

Feel free to update setup.py in another PR if you are so inclined.

The server bit I'd like to be expand on considerably; at the moment it is still just a stub; and I'm not totally convinced it really belongs here, as opposed to somewhere complementary.