Closed giuliopaci closed 7 years ago
Wow thanks for this @giuliopaci I will definitely accept it. There are a couple of small things, most of which you touched on already. I'll go through it in detail the next couple days. Main things are that the _apply
and _train
scripts probably need to stick with the present names, as they are related to the kaldi
tag; but you could probably even omit them from your .deb
. I think we can also just drop openmp
. I previously played around with it a bit, but it is not actively in use in any of the builds right now, and I don't think it is really needed.
If this also solves the OSX build, then it will also close #25 which would be neat.
If it is useful for Kaldi, than I want it in the package: one of my long term goal is to have all Kaldi requirements packaged in Debian and usable for Kaldi without having to recompile them.
Do you think it will be possible to plan a name change for those scripts? It should not be difficult to create a link to those scripts with the current names during the transition. I also have similar issues with rnnlm. Do you think we can rename it? Do you think it would be possible to support --help in it?
Regarding openmp, I tried to disable it, but the build of phonetisauru-g2prnn failed. Is this expected?
Finally, I do not think compilation will work out of the box on Mac OSX, but we can try it, as Travis supports Mac OSX workers (in the worst case we will just have to compile openfst from sources, if brew install openfst is not working).
Il 19/ago/2017 21:20, "Josef Novak" notifications@github.com ha scritto:
Wow thanks for this @giuliopaci https://github.com/giuliopaci I will definitely accept it. There are a couple of small things, most of which you touched on already. I'll go through it in detail the next couple days. Main things are that the _apply and _train scripts probably need to stick with the present names, as they are related to the kaldi tag; but you could probably even omit them from your .deb. I think we can also just drop openmp. I previously played around with it a bit, but it is not actively in use in any of the builds right now, and I don't think it is really needed.
If this also solves the OSX build, then it will also close #25 https://github.com/AdolfVonKleist/Phonetisaurus/issues/25 which would be neat.
— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/AdolfVonKleist/Phonetisaurus/pull/27#issuecomment-323542664, or mute the thread https://github.com/notifications/unsubscribe-auth/AATBq8NTbF3uCnD5stW4qHIE_PPfWGfSks5sZzWQgaJpZM4O8Xym .
Hi again, I updated the pull request with a few commits that fix compilation on Mac OSX (I also updated Travis configuration so that now it is compiled on Linux with gcc 4.9, Mac OSX with gcc and clang). Is there any reason why gcc 4.9 was used for testing and not latest available compiler?
I had to fix compilation when openmp is not enabled... But this means that now openmp is used when it is enabled. I think this is the correct behaviour, although, given what you said about openmp, probably it should be disabled. Or enabled on demand. Do you agree to disable it?
The workaround for Mac OSX flags parsing is making compilation "fail". This is because openfst flags parsing returns 1 when --help is passed to programs (0 should be the correct return value as the user explicitly requested for help). I temporarily added a workaround to Travis configuration, but it would be better to make flags parsing work on Mac OSX as well.
Cheers, Giulio
No we can move up to a later version of gcc. I do want to use at least 4.9 though, but Ubuntu 14.04 default to 4.8 I think. Ideally I'd like a wider range of supported versions, but I just don't have the time. No issue with the --help
message, and I happy to disable openmp
by default. It is a bit more complicated and poorly documented how to use the RNN stuff, and I don't really think anyone is presently using it anyway. I'll try to finish going through this all by end of night. Thanks again for the efforts.
I have gone through the commit(s) in a bit more detail now. Again it is much appreciated and introduces a large number of improvements.
Regarding the dash vs underscore naming, in principle I have no issue with making the changes you request to the scripts, but before I try to push or retag that, can you explain the reason? Is that a debian/ubuntu packaging policy requirement?
Regarding the configure script: PACKAGE_BUGREPORT='joe@spitch.ch'
, let's change that to PACKAGE_BUGREPORT='phonetisaurus@gmail.com'
. I completely forgot about that, but it is better to be consistent on that score I think, especially if it will go into a package.
Regarding the python bindings, not everyone uses these, and not everyone wants to install the python development headers. I'd prefer to keep them as a non-default option, the same way this works in the latest version of OpenFst. Right now it looks like they are required and trigger a configure error if not found.
Regarding OSX: let's forget about it / not pursue any further in this commit series. We are trying to shoehorn too many things into one commit.
Regarding kaldi integration: There is now a script for this in kaldi, install_phonetisaurus.sh. It had one remaining issue related to linking and libtool and the default kaldi OpenFst install (my approach still required the user to update LD_LIBRARY_PATH
with the OpenFst libs, but I manually verified that your improved configure scripts work with the existing setup, and handily resolve this issue. They also appear to work fine with the manual configure options and existing install script.
src/bin/
, but rather just in src/
. Unless there is a strong justification for this, it I'd prefer to keep these in src/bin
, mainly because it would mean there would be no need to make another pull request to kaldi (no breaking changes).$ git clone https://github.com/kaldi-asr/kaldi.git
$ cd kaldi/tools/
$ make openfst
$ cd extras/
$ ./install_phonetisaurus.sh
$ phonetisaurus-g2pfst --help
If we can agree on (or I can at least understand) the above, then I think we can merge it.
On 28/08/2017 16:14, "Josef Novak" notifications@github.com wrote:
Regarding the dash vs underscore naming, in principle I have no issue with making the changes you request to the scripts, but before I try to push or retag that, can you explain the reason? Is that a debian/ubuntu packaging policy requirement?
No, it is not a requirement for Debian, it is just my expectation as a user to have all the commands with exactly the same prefix and not having to remember which prefix is associated to each command (ie, it is a UX issue, not a policy violation issue). The only requirement for Debian is to avoid collisions with other packages.
Regarding the configure script: PACKAGE_BUGREPORT='joe@spitch.ch', let's change that to PACKAGE_BUGREPORT='phonetisaurus@gmail.com'. I completely forgot about that, but it is better to be consistent on that score I think, especially if it will go into a package.
Ok, I will update it.
Regarding the python bindings, not everyone uses these, and not everyone wants to install the python development headers. I'd prefer to keep them as a non-default option, the same way this works in the latest version of OpenFst. Right now it looks like they are required and trigger a configure error if not found.
If I understand correctly you want them disabled by default. Is it right? If yes I can add a --enable-python flag to configure.
Regarding OSX: let's forget about it / not pursue any further in this commit series. We are trying to shoehorn too many things into one commit.
Ok, but it is already working. :-)
Regarding kaldi integration: There is now a script for this in kaldi, install_phonetisaurus.sh. It had one remaining issue related to linking and libtool and the default kaldi OpenFst install (my approach still required the user to update LD_LIBRARY_PATH with the OpenFst libs, but I manually verified that your improved configure scripts work with the existing setup, and handily resolve this issue. They also appear to work fine with the manual configure options and existing install script.
Ok. This has to do with using libtool, which by default set rpath in binaries so that they can find the libraries.
There is one issue: the binaries are no longer build in src/bin/, but rather just in src/. Unless there is a strong justification for this, it I'd prefer to keep these in src/bin, mainly because it would mean there would be no need to make another pull request to kaldi (no breaking changes).
I am not sure that this can be changed at all (I never tried to compile programs in non-default location), but I can try. However it is not correct that kaldi relies on internal details of the building process. From kaldi, or any user, perspective, the installation interface should be ./configure && make && make install, ./configure is now able to use --prefix and other options, so that they can achieve any desired installation structure goals.
The basic sanity test for this integration under Ubuntu 16.04 is something like:
$ git clone https://github.com/kaldi-asr/kaldi.git $ cd kaldi/tools/ $ make openfst $ cd extras/ $ ./install_phonetisaurus.sh $ phonetisaurus-g2pfst --help
If we can agree on (or I can at least understand) the above, then I think we can merge it.
Ok.
Cheers, Giulio
No, it is not a requirement for Debian, it is just my expectation as a user to have all the commands with exactly the same prefix and not having to remember which prefix is associated to each command (ie, it is a UX issue, not a policy violation issue). The only requirement for Debian is to avoid collisions with other packages.
OK, that is reasonable to me. Let's just go for it.
If I understand correctly you want them disabled by default. Is it right? If yes I can add a --enable-python flag to configure.
Yeah, that would be perfect. I think it is better if it will still compile what is possible if the python-dev libs are absent.
Ok. This has to do with using libtool, which by default set rpath in binaries so that they can find the libraries.
Cool - again I was just commenting that it already works.
Ok, but it is already working. :-)
Ah, OK! Somehow I thought there was still something not quite working there. Sounds good.
I am not sure that this can be changed at all (I never tried to compile programs in non-default location), but I can try.
Ok, forget it, if those are the standard let's just keep your changes.
However it is not correct that kaldi relies on internal details of the building process.
It isn't the fault of Kaldi, it is my fault. I'm responsible for setting the extras compile script to use the bin dir by default without modifying configure.
Sounds like that resolves all my questions. I'll merge after the --enable-python
flag change and one more trip through the test setup. Thanks again!
I have started doing a rebase and implement most of the additional features.
I will force push my changes tomorrow or the day after tomorrow (I still want to do some minor cleanup).
Cheers, Giulio Il giorno 28/ago/2017 19:44, "Josef Novak" notifications@github.com ha scritto:
No, it is not a requirement for Debian, it is just my expectation as a user to have all the commands with exactly the same prefix and not having to remember which prefix is associated to each command (ie, it is a UX issue, not a policy violation issue). The only requirement for Debian is to avoid collisions with other packages.
OK, that is reasonable to me. Let's just go for it.
If I understand correctly you want them disabled by default. Is it right? If yes I can add a --enable-python flag to configure.
Yeah, that would be perfect.
Ok. This has to do with using libtool, which by default set rpath in binaries so that they can find the libraries.
Cool - again I was just commenting that it already works.
Ok, but it is already working. :-) Ah, OK! Somehow I thought there was still something not quite working there. Sounds good.
I am not sure that this can be changed at all (I never tried to compile programs in non-default location), but I can try.
Ok, forget it, if those are the standard let's just keep your changes.
However it is not correct that kaldi relies on internal details of the building process.
It isn't the fault of Kaldi, it is my fault. I'm responsible for setting the extras compile script to use the bin dir by default without modifying configure.
Sounds like that resolves all my questions. I'll merge after one more trip through the test setup. Thanks again!
— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/AdolfVonKleist/Phonetisaurus/pull/27#issuecomment-325424321, or mute the thread https://github.com/notifications/unsubscribe-auth/AATBq1KLCLKms36R_8kzu0_Ehh7LZ64Zks5scvyKgaJpZM4O8Xym .
Hi, I completed the changes that we discussed for this pull request (and rebased on master).
In particular:
make dist-gzip
creates a proper tarball;mkdir build; cd build; ../configure; make; make install
);Commit 6c48e02 should be compatible with current script in kaldi (although I did not test it).
Cheers, Giulio
This looks great. I see one remaining issue. If I try to compile with configure --prefix=`pwd`
, then make install
fails:
libtool: install: /usr/bin/install -c bin/phonetisaurus-align \
/home/ubuntu/Phonetisaurus/bin/phonetisaurus-align
/usr/bin/install: 'bin/phonetisaurus-align' and \
'/home/ubuntu/Phonetisaurus/bin/phonetisaurus-align' are the same file
I was able to fix this by modifying Makefile.am
to change bin_PROGRAMS
to src_bin_PROGRAMS
, and updating all bin_phonetisaurus_X
to src_bin_phonetisaurus_X
, and adding src_bindir = $(prefix)/bin
.
This then builds the binaries in the src/bin
directory, where the python wrapper scripts are already sitting, and resolves the install problem with --prefix=`pwd`
. I also prefer this from a consistency standpoint.
If those changes are acceptable from your configure/build side of things either I can just push them myself, or you can make one more commit. Then we will be done.
I noticed that travis seems to not like compiling with clang as well, but I think we can ignore that as a non-issue for now; OSX build works locally for me, and travis succeeds with gcc.
What do you think?
Hi,
On 30/08/2017 10:10, "Josef Novak" notifications@github.com wrote:
This looks great. I see one remaining issue. If I try to compile with configure --prefix=
pwd
, then make install fails:libtool: install: /usr/bin/install -c bin/phonetisaurus-align \ /home/ubuntu/Phonetisaurus/bin/phonetisaurus-align /usr/bin/install: 'bin/phonetisaurus-align' and \ '/home/ubuntu/Phonetisaurus/bin/phonetisaurus-align' are the same file
This is partially expected, due to the fact that using "configure
--prefix=pwd
" you are installing in the build directory, and as you are
invoking configure from inside the source directory, you have three
different directories that may conflict (i.e., source, build and
installation directories).
If you already have to change the building procedure, I suggest to change
it drastically, but obtaining very similar results.
My suggestion is to have three separate directories.
Suppose that you still want binaries in Phonetisaurus/src/bin, I suggest
something like:
mkdir -p Phonetisaurus
cd Phonetisaurus
git checkout into sources
INSTALL_PATH=" `pwd`"/src
mkdir -p build
cd build
../sources/configure --prefix="$INSTALL_PATH" && make && make install
I was able to fix this by modifying Makefile.am to change bin_PROGRAMS to src_bin_PROGRAMS, and updating all bin_phonetisaurus_X to src_bin_phonetisaurus_X, and adding src_bindir = $(prefix)/bin.
This then builds the binaries in the src/bin directory, where the python wrapper scripts are already sitting, and resolves the install problem with --prefix=
pwd
. I also prefer this from a consistency standpoint.
This is not solving the issue, just moving it around a little bit.
From configure point of view this is not more consistent, because you are
compiling src/bin/
Ok. All agreed, and as usual thanks for the extra explanation. One last question; since we already break the kaldi script, what else would you want to change? We don't have to have the configure copy in src/
I guess.
I just removed src/configure
.
Ideally the other changes that I would like to see are:
Of these, the only problematic things for .deb packaging are rnnlm issues. This is also something that may affect kaldi as well. If kaldi is using python bindings, then the changes in setup.py may also affect kaldi.
Ok. All points are good ones, and about half are on my list, but I think we can merge this PR, and address those later since they will not further break anything.
The bindings are not used in any of the kaldi scripts as far as I know, so no issue there.
The vanilla rnnlm
trainer is basically just an old frozen version of the Mikolov tool. I'm not sure how much I want to invest into that ATM, or if the effort is better spent elsewhere.
The calculate error script needs to be refactored very heavily (rewritten), or just swapped for standard sclite
. Undecided on that.
I'm working on refactoring the documentation, and setting up the test suite, but it will probably be a while yet.
OSX builds on travis seem super backed up, but I'll merge this after they finish up.
Feel free to update setup.py
in another PR if you are so inclined.
The server bit I'd like to be expand on considerably; at the moment it is still just a stub; and I'm not totally convinced it really belongs here, as opposed to somewhere complementary.
Hi Josef, yesterday I had a look at the current status of Phonetisaurus build system and I started working on a refactor. The refactor is not yet complete, but it is already usable (with minor changes).
Here I will briefly describe what I changed and why:
As a result of this work you can now compile everything with the following:
The only drawback of the above procedure is that you will end up with two copies of Phonetisaurus.so installed, as
make install
also install it in python modules directory (but not where your module expect it to be).The reasoning behind these changes, despite the fact that they seem better to me, is that I need --prefix option and DESTDIR variable support for packaging. Moreover I will also need to add custom variables to building (needed for QA), which is now supported. And I will need to replace some flags (which is not yet supported). It would be nice to have a minimal test set. All these are much easier when using automake.
Let me know if there is any chance that this pull request will be accepted and if I have to adapt it.
Best regards, Giulio