KarelVesely84 / kaldi-io-for-python

Python functions for reading kaldi data formats. Useful for rapid prototyping with python.
Apache License 2.0
376 stars 119 forks source link

Modifications to $PATH if $KALDI_ROOT is not set #39

Closed qmeeus closed 1 year ago

qmeeus commented 4 years ago

Hi,

Are the modifications to the PATH variable in lines 17-24 really necessary?

If yes, I would suggest replacing the modifications with an exception if $KALDI_ROOT is not set, and if they are not necessary for the script, I would suggest to remove them completely!

Thanks for all the work, Best, Quentin

KarelVesely84 commented 4 years ago

Hi Quentin, yes, getting kaldi tools to $PATH is necessary, so that they can be used in the 'pipeline filenames': kaldi_io.read_mat('sum-matrix ... | scale-matrix ... - |')

And, yes, i could eventually put it into a try block, and show an annoying warning if Kaldi is not found... So you don't use kaldi at all, you are just importing the its data formats in another project? Am i correct? K.

KarelVesely84 commented 4 years ago

In what way was it failing to you? Do you use Linux, or other OS? K.

KarelVesely84 commented 4 years ago

I modified it a bit, please let me know if it solves you problem: https://github.com/vesis84/kaldi-io-for-python/commit/d6f9d9f13baac5384b9f7438b8e38d1039fcbc0f

qmeeus commented 4 years ago

So you don't use kaldi at all, you are just importing the its data formats in another project? Am i correct?

I have already computed all the features I needed with Kaldi scripts, I am indeed just using your code for loading the generated files

In what way was it failing to you? Do you use Linux, or other OS?

This section in particular is problematic:

if not 'KALDI_ROOT' in os.environ:
    # Default! To change run python with 'export KALDI_ROOT=/some_dir python'
    os.environ['KALDI_ROOT']='/mnt/matylda5/iveselyk/Tools/kaldi-trunk'

I had some problems with the encodings and was trying to traceback what was going wrong. At some point, I echo'ed my environment and saw a lot of weird things had been added to my $PATH, including a folder that did not exist in my filesystem... Turns out the code above was the reason.

I think that instead of adding this to the environment variable, which will only work for you, you should raise an error. In fact, setting $KALDI_ROOT and having Kaldi bins in the $PATH is not always necessary to use most functions in your script, so it's probably best to check and raise when Kaldi is necessary and ignore otherwise.

KarelVesely84 commented 4 years ago

i don't understand, how extending $PATH can cause problems with encodings... even if there is a path added that does not exist, it should not cause any failures IMHO...

and having the $PATH extended only sometimes would not be user friendly... having kaldi binaries on $PATH is an important and very handy/powerful feature...

qmeeus commented 4 years ago

It does not cause problems with the encoding

erip commented 2 years ago

I'll add that it's a bit weird to include a hardcoded path that will almost certainly not exist for anyone else. If $KALDI_ROOT is really required but it's not set, it's better to throw an error than to silently set it to a path that won't necessarily exist.

KarelVesely84 commented 1 year ago

Well, it depends. For some use-cases running kaldi binaries in pipe commands is not necessary. In this case setting $KALDI_ROOT is not necessary. Only in the "advanced" cases with pipes it becomes necessary. So both "modes" make sense, depending on the circumstances... K.