flashlight / text

Text utilities, including beam search decoding, tokenizing, and more, built for use in Flashlight.
MIT License

get error when cmake and make #67

Closed HalFTeen closed 1 year ago

HalFTeen commented 1 year ago

Question

The README.md has some errors in [Building from Source]:

    git clone https://github.com/flashlight/text && cd flashlight

Error 1: the directory 'flashlight' does not exist after git clone; it is actually 'text'. This is easy to solve.

    cmake ..

Error 2: in CMakeFiles/CMakeError.log:

    Performing C SOURCE FILE Test CMAKE_HAVE_LIBC_PTHREAD failed with the following output:
    Change Dir: /mypath/text/build/CMakeFiles/CMakeScratch/TryCompile-UKrI7c

    Run Build Command(s):/usr/bin/gmake -f Makefile cmTC_7f222/fast && /usr/bin/gmake -f CMakeFiles/cmTC_7f222.dir/build.make CMakeFiles/cmTC_7f222.dir/build
    gmake[1]: Entering directory '/mypath/text/build/CMakeFiles/CMakeScratch/TryCompile-UKrI7c'
    Building C object CMakeFiles/cmTC_7f222.dir/src.c.o
    /data/gcc-7.3.0/tools/bin/gcc -DCMAKE_HAVE_LIBC_PTHREAD -o CMakeFiles/cmTC_7f222.dir/src.c.o -c /mypath/text/build/CMakeFiles/CMakeScratch/TryCompile-UKrI7c/src.c
    Linking C executable cmTC_7f222
    /home/miniconda3/lib/python3.9/site-packages/cmake/data/bin/cmake -E cmake_link_script CMakeFiles/cmTC_7f222.dir/link.txt --verbose=1
    /data/gcc-7.3.0/tools/bin/gcc -rdynamic CMakeFiles/cmTC_7f222.dir/src.c.o -o cmTC_7f222
    CMakeFiles/cmTC_7f222.dir/src.c.o: in function 'main':
    src.c:(.text+0x2d): undefined reference to 'pthread_create'
    src.c:(.text+0x39): undefined reference to 'pthread_detach'
    src.c:(.text+0x45): undefined reference to 'pthread_cancel'
    src.c:(.text+0x56): undefined reference to 'pthread_join'
    src.c:(.text+0x6a): undefined reference to 'pthread_atfork'
    collect2: error: ld returned 1 exit status
    gmake[1]: [cmTC_7f222] error 1
    gmake[1]: Leaving directory '/mypath/text/build/CMakeFiles/CMakeScratch/TryCompile-UKrI7c'
    gmake: [cmTC_7f222/fast] error 2

    Source file was:

    #include <pthread.h>

    static void* test_func(void* data)
    {
      return data;
    }

    int main(void)
    {
      pthread_t thread;
      pthread_create(&thread, NULL, test_func, NULL);
      pthread_detach(thread);
      pthread_cancel(thread);
      pthread_join(thread, NULL);
      pthread_atfork(NULL, NULL, NULL);
      pthread_exit(NULL);

      return 0;
    }

    Determining if the function pthread_create exists in the pthreads failed with the following output:
    Change Dir: /mypath/text/build/CMakeFiles/CMakeScratch/TryCompile-GQUvOu

    Run Build Command(s):/usr/bin/gmake -f Makefile cmTC_ff2b2/fast && /usr/bin/gmake -f CMakeFiles/cmTC_ff2b2.dir/build.make CMakeFiles/cmTC_ff2b2.dir/build
    gmake[1]: Entering directory '/mypath/text/build/CMakeFiles/CMakeScratch/TryCompile-GQUvOu'
    Building C object CMakeFiles/cmTC_ff2b2.dir/CheckFunctionExists.c.o
    /data/gcc-7.3.0/tools/bin/gcc -DCHECK_FUNCTION_EXISTS=pthread_create -o CMakeFiles/cmTC_ff2b2.dir/CheckFunctionExists.c.o -c /mypath/text/build/CMakeFiles/CMakeScratch/TryCompile-GQUvOu/CheckFunctionExists.c
    Linking C executable cmTC_ff2b2
    /home/miniconda3/lib/python3.9/site-packages/cmake/data/bin/cmake -E cmake_link_script CMakeFiles/cmTC_ff2b2.dir/link.txt --verbose=1
    /data/gcc-7.3.0/tools/bin/gcc -DCHECK_FUNCTION_EXISTS=pthread_create -rdynamic CMakeFiles/cmTC_ff2b2.dir/CheckFunctionExists.c.o -o cmTC_ff2b2 -lpthreads
    /usr/bin/ld: cannot find -lpthreads
    collect2: error: ld returned 1 exit status
    gmake[1]: [cmTC_ff2b2] error 1
    gmake[1]: Leaving directory '/mypath/text/build/CMakeFiles/CMakeScratch/TryCompile-GQUvOu'
    gmake: [cmTC_ff2b2/fast] error 2

However, the Makefile was generated, so I continued with:

    make -j

Error 3 occurred:

    [ 89%] Linking CXX executable Seq2SeqDecoderTest
    [ 90%] Linking CXX executable TokenizerTest
    [ 91%] Linking CXX executable StringTest
    [ 92%] Linking CXX executable DecoderTest
    CMakeFiles/TokenizerTest.dir/tokenizer/TokenizerTest.cpp.o: in function 'std::experimental::filesystem::v1::__cxx11::path::clear()':
    TokenizerTest.cpp:(.text._ZNSt12experimental10filesystem2v17__cxx114path5clearEv[_ZNSt12experimental10filesystem2v17__cxx114path5clearEv]+0x20): undefined reference to 'std::experimental::filesystem::v1::__cxx11::path::_M_split_cmpts()'
    CMakeFiles/TokenizerTest.dir/tokenizer/TokenizerTest.cpp.o: in function 'std::experimental::filesystem::v1::__cxx11::path::_M_append(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)':
    TokenizerTest.cpp:(.text._ZNSt12experimental10filesystem2v17__cxx114path9_M_appendERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE[_ZNSt12experimental10filesystem2v17__cxx114path9_M_appendERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE]+0xc0): undefined reference to 'std::experimental::filesystem::v1::__cxx11::path::_M_split_cmpts()'
    CMakeFiles/TokenizerTest.dir/tokenizer/TokenizerTest.cpp.o: in function 'std::experimental::filesystem::v1::__cxx11::path::path<char [9], std::experimental::filesystem::v1::__cxx11::path>(char const (&) [9])':
    TokenizerTest.cpp:(.text._ZNSt12experimental10filesystem2v17__cxx114pathC2IA9_cS3_EERKT_[_ZNSt12experimental10filesystem2v17__cxx114pathC5IA9_cS3_EERKT_]+0x64): undefined reference to 'std::experimental::filesystem::v1::__cxx11::path::_M_split_cmpts()'
    CMakeFiles/TokenizerTest.dir/tokenizer/TokenizerTest.cpp.o: in function 'std::experimental::filesystem::v1::__cxx11::path::path<char [66], std::experimental::filesystem::v1::__cxx11::path>(char const (&) [66])':
    TokenizerTest.cpp:(.text._ZNSt12experimental10filesystem2v17__cxx114pathC2IA66_cS3_EERKT_[_ZNSt12experimental10filesystem2v17__cxx114pathC5IA66_cS3_EERKT_]+0x64): undefined reference to 'std::experimental::filesystem::v1::__cxx11::path::_M_split_cmpts()'
    [ 92%] Built target Seq2SeqDecoderTest
    collect2: error: ld returned 1 exit status
    make[2]: *** [flashlight/lib/text/test/TokenizerTest] error 1
    make[1]: *** [flashlight/lib/text/test/CMakeFiles/TokenizerTest.dir/all] error 2

Please help me. Thanks very much.

Additional Context

GCC 7.3.0, CMake 3.25.2. KenLM and GoogleTest are not installed.

HalFTeen commented 1 year ago

I wrote my own Makefile in the directory text/flashlight/lib/text/test/decoder to compile DecoderTest.cpp alone, because I am interested in the CTC decoder with an LM. It works well. I measured the time taken by the call decoder.decode(emission.data(), T, N); and it takes 11.63 s, which is too high. Have you measured the real-time factor? I look forward to your reply. Thanks.

jacobkahn commented 1 year ago

@HalFTeen — the build issue you've pasted is a problem with CMake finding a threading library on your machine; it can't link to pthread. Can you share your OS version?

With respect to decoder performance, a few questions:

  1. Did you build with optimization flags at compile time with your Makefile?
  2. I'd try to use CMake if you can and fix the existing issue. If you pass CMAKE_BUILD_TYPE=RelWithDebInfo (or Release), performance should improve by orders of magnitude.
  3. The CTC decoder has been extensively tested and benchmarked, so I'd check your build configuration and/or the size of your inputs.

HalFTeen commented 1 year ago

Thanks for your reply. I fixed the existing issues by:

  1. Changing the Boost version from 1.75.0 to 1.81.0.
  2. Setting these compile options in CMakeLists.txt:
    link_libraries(stdc++fs)
    add_compile_options(-D_GLIBCXX_USE_CXX11_ABI=0)
    set(CMAKE_BUILD_TYPE RelWithDebInfo)

    Your suggestion (CMAKE_BUILD_TYPE=RelWithDebInfo) helped improve the performance from 11.63 s to 2 s.

  3. Adding the Boost path before running cmake: Boost_DIR=/path/to/your/Boost/ cmake ..

The file TN.bin stores T and N, where T is the number of frames and N is the number of syllables in letters.lst, and the file emission.bin stores the acoustic model (AM) scores. Is that right? I also wonder what the file transition.bin means. Is it similar to the transition scores in an HMM decoder?

jacobkahn commented 1 year ago

@HalFTeen -- transitions.bin stores token level transition scores and is optional. It could represent scores in a decoder for a model trained with the AutoSegmentation Criterion (ASG) or with an HMM-style decoder, either way; it's intended to be general.
