gmarcais / Jellyfish

A fast multi-threaded k-mer counter

Has anyone actually compiled jellyfish with bindings? #208

Open lskatz opened 1 month ago

lskatz commented 1 month ago

I cannot get the bindings to work on CentOS 7 for the life of me, but I would like to. After that, I would like to put a recipe together for bioconda if possible.

I have tried following the instructions in the README and the advice here, but nothing has worked. What I have done so far is:

cd ~/bin/build
wget https://github.com/gmarcais/Jellyfish/releases/download/v2.3.1/jellyfish-2.3.1.tar.gz
tar zxvf jellyfish-2.3.1.tar.gz
cd jellyfish-2.3.1
./configure --enable-ruby-binding --enable-python-binding --enable-perl-binding
make
make install
export PKG_CONFIG_PATH=$HOME/bin/jellyfish-2.3.1/lib/pkgconfig
cd swig/python/

# Python2 produces an error /usr/bin/ld: cannot find -ljellyfish-2.0
python setup.py build > python2.log 2>&1 &

# Python3 produces the same error /usr/bin/ld: cannot find -ljellyfish-2.0
python3 setup.py build > python3.log 2>&1 &

Next, I tried autoreconf

cd ../../
autoreconf -i --force
cd -
python3 setup.py build
# => same error

The final error is

/usr/bin/ld: cannot find -ljellyfish-2.0
collect2: error: ld returned 1 exit status
error: command '/usr/local/bin/g++' failed with exit code 1

I would love any insights. I want to eventually get this into conda. If it's already in conda with bindings, I could not find it.
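
The `cannot find -ljellyfish-2.0` message means the linker found no `libjellyfish-2.0` library in any directory it searched. The `./configure` call above has no `--prefix`, so `make install` would normally use the Autoconf default of `/usr/local`, while `PKG_CONFIG_PATH` points under `$HOME/bin/jellyfish-2.3.1`; whether that mismatch is the cause here is an assumption. A minimal Python sketch for checking which candidate directories actually contain the library (the helper name `find_jellyfish_libs` and the example directories are hypothetical):

```python
from pathlib import Path


def find_jellyfish_libs(lib_dirs, stem="libjellyfish-2.0"):
    """Return every file in lib_dirs whose name starts with stem."""
    hits = []
    for d in lib_dirs:
        d = Path(d).expanduser()
        if d.is_dir():  # silently skip directories that do not exist
            hits.extend(sorted(p for p in d.iterdir() if p.name.startswith(stem)))
    return hits


if __name__ == "__main__":
    # Hypothetical candidates: the default prefix and the one PKG_CONFIG_PATH implies.
    candidates = ["/usr/local/lib", "~/bin/jellyfish-2.3.1/lib"]
    for path in find_jellyfish_libs(candidates):
        print(path)
```

If the library shows up in only one of the directories, pointing `PKG_CONFIG_PATH` at that prefix's `lib/pkgconfig` before running `setup.py` would be the next thing to try.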

lskatz commented 1 month ago

For completeness, I also looked at conda:

$ mamba create -n jellyfish kmer-jellyfish

Looking for: ['kmer-jellyfish']

pkgs/main/linux-64 (check zst)                     Checked  0.1s
pkgs/main/noarch (check zst)                        Checked  0.0s
pkgs/r/linux-64 (check zst)                        Checked  0.0s
bioconda/linux-64                                             No change
bioconda/noarch                                               No change
pkgs/main/linux-64                                   6.3MB @  19.7MB/s  0.4s
pkgs/r/noarch                                                 No change
pkgs/main/noarch                                   714.9kB @   1.4MB/s  0.2s
pkgs/r/linux-64                                      1.6MB @   2.2MB/s  0.6s
conda-forge/noarch                                  15.6MB @  15.6MB/s  1.1s
conda-forge/linux-64                                36.3MB @  23.9MB/s  1.9s
Transaction

  Prefix: $HOME/bin/miniconda3/envs/jellyfish

  Updating specs:

   - kmer-jellyfish

  Package           Version  Build        Channel           Size
──────────────────────────────────────────────────────────────────
  Install:
──────────────────────────────────────────────────────────────────

  + _libgcc_mutex       0.1  conda_forge  conda-forge     Cached
  + libgomp          14.1.0  h77fa898_0   conda-forge     Cached
  + _openmp_mutex       4.5  2_gnu        conda-forge     Cached
  + libgcc-ng        14.1.0  h77fa898_0   conda-forge     Cached
  + libstdcxx-ng     14.1.0  hc0a3c3a_0   conda-forge     Cached
  + kmer-jellyfish    2.3.1  h4ac6f70_2   bioconda        Cached

  Summary:

  Install: 6 packages

  Total download: 0 B

──────────────────────────────────────────────────────────────────

Confirm changes: [Y/n] y

Downloading and Extracting Packages:

Preparing transaction: done
Verifying transaction: done
Executing transaction: done

To activate this environment, use

     $ mamba activate jellyfish

To deactivate an active environment, use

     $ mamba deactivate

And then the test

$ conda activate jellyfish
$ python -c "import jellyfish"
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "jellyfish.py", line 11, in <module>
    import dna_jellyfish as jellyfish
  File "dna_jellyfish.py", line 15, in <module>
    import _dna_jellyfish
ImportError: No module named _dna_jellyfish
$ python3 -c "import jellyfish"
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/scicomp/home-pure/gzu2/bin/build/jellyfish-2.3.1/swig/python/jellyfish.py", line 11, in <module>
    import dna_jellyfish as jellyfish
  File "/scicomp/home-pure/gzu2/bin/build/jellyfish-2.3.1/swig/python/dna_jellyfish.py", line 15, in <module>
    import _dna_jellyfish
ModuleNotFoundError: No module named '_dna_jellyfish'
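
In the second traceback, `jellyfish.py` is loaded from the `swig/python` source directory rather than from an installed location, and the compiled extension `_dna_jellyfish` is the piece that is missing. A small sketch, using only the standard library, for reporting where Python would load each module from (the helper name `module_origin` is hypothetical):

```python
import importlib.util


def module_origin(name):
    """Return the file a top-level module would be loaded from, or None if not found."""
    spec = importlib.util.find_spec(name)
    return None if spec is None else spec.origin


if __name__ == "__main__":
    for name in ("jellyfish", "dna_jellyfish", "_dna_jellyfish"):
        print(f"{name}: {module_origin(name)}")
```

Running this from a directory other than `swig/python` shows whether the modules are found because they are installed or only because the current directory is on the import path.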

gmarcais commented 1 month ago

Example of installation into a particular directory (both jellyfish and the Python binding are installed in $PREFIX). This should print OK:

PREFIX=/path/to/installation
wget https://github.com/gmarcais/Jellyfish/releases/download/v2.3.1/jellyfish-2.3.1.tar.gz
tar zxf jellyfish-2.3.1.tar.gz
cd jellyfish-2.3.1
mkdir build
cd build
../configure --prefix=$PREFIX --enable-python-binding
make -j 10
make install
tree $PREFIX # Should show a $PREFIX/lib/python3.10 directory (maybe a different python version)
PYTHONPATH=$PREFIX/lib/python3.10 python3.10 -c 'import dna_jellyfish; print("OK")'
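
The `python3.10` part of the `PYTHONPATH` line depends on which interpreter was picked up at configure time. A sketch for locating the directory under `$PREFIX` that actually holds the compiled extension, assuming it is installed as a file whose name starts with `_dna_jellyfish` (the module name in the tracebacks earlier in the thread); `find_binding_dirs` is a hypothetical helper:

```python
import os


def find_binding_dirs(prefix, stem="_dna_jellyfish"):
    """Return directories under prefix containing a file whose name starts with stem."""
    found = []
    for root, _dirs, files in os.walk(prefix):
        if any(f.startswith(stem) for f in files):
            found.append(root)
    return sorted(found)


if __name__ == "__main__":
    # "/path/to/installation" is the placeholder used in the post above.
    prefix = os.environ.get("PREFIX", "/path/to/installation")
    for d in find_binding_dirs(prefix):
        print(f"PYTHONPATH={d}")
```

An empty result would suggest that the binding was never installed under that prefix, which is a different problem from a wrong `PYTHONPATH`.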

With the following configure command, jellyfish will be installed in $PREFIX but the python binding will go in the user python directory (usually something like ~/.local/lib/python...), and should be found by default (no need for PYTHONPATH):

./configure --prefix=$PREFIX --enable-python-binding=user
make -j 10 && make install
python -c 'import dna_jellyfish; print("OK")'
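
With `--enable-python-binding=user`, the binding should land in the per-user site-packages directory. A sketch that reports that directory for the running interpreter and checks for `dna_jellyfish.py` there; the file name is taken from the tracebacks earlier in the thread, and `user_site_report` is a hypothetical helper:

```python
import site
from pathlib import Path


def user_site_report(module_file="dna_jellyfish.py"):
    """Return (user site dir, whether user site is enabled, whether module_file is there)."""
    user_dir = Path(site.getusersitepackages())
    present = (user_dir / module_file).is_file()
    return user_dir, bool(site.ENABLE_USER_SITE), present


if __name__ == "__main__":
    user_dir, enabled, present = user_site_report()
    print(f"user site: {user_dir} (enabled: {enabled})")
    print(f"dna_jellyfish.py present: {present}")
```

The interpreter that runs this check should be the same one that `configure` detected, since each Python version has its own user site directory.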

lskatz commented 1 month ago

I got up to this step:

$ make -j 10
make: *** No rule to make target `sub_commands/count_main_cmdline.hpp', needed by `all'.  Stop.

lskatz commented 1 month ago

I'm not sure what else to do. I can't find the Makefile target.

$ find . -iname 'count_main_cmdline*' # finds nothing
$ ls sub_commands/
bc_main.cc    count_main.cc  histo_main.cc  jellyfish.cc  merge_main.cc  stats_main.cc
cite_main.cc  dump_main.cc   info_main.cc   mem_main.cc   query_main.cc

lskatz commented 1 month ago

I tried it in the build directory and in the main directory, and I also tried with and without the --enable-swig option.
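
The `No rule to make target` message names a file that make needs but can neither find nor build. A sketch that extracts the target from such a message and lists files in a source tree sharing its base name, which can show whether an input for generating it exists. That `count_main_cmdline.hpp` would be generated from a `.yaggo` file is an assumption here, based on Jellyfish using yaggo for its command-line parsers; `missing_target` and `related_files` are hypothetical helpers:

```python
import re
from pathlib import Path

# Matches GNU make's message; older versions open the quote with a backtick.
_NO_RULE = re.compile(r"No rule to make target [`'](?P<target>[^']+)'")


def missing_target(make_output):
    """Return the first target make reports as unbuildable, or None."""
    m = _NO_RULE.search(make_output)
    return m.group("target") if m else None


def related_files(tree, target):
    """List files under tree whose name starts with the target's stem."""
    stem = Path(target).stem  # e.g. count_main_cmdline
    return sorted(p for p in Path(tree).rglob(f"{stem}*") if p.is_file())


if __name__ == "__main__":
    msg = ("make: *** No rule to make target `sub_commands/count_main_cmdline.hpp', "
           "needed by `all'.  Stop.")
    print(missing_target(msg))
```

Calling `related_files(".", target)` from the top of the unpacked source would then list any matching files; if nothing matches, neither the header nor an input for it is present in the tree.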

tseemann commented 1 month ago

The target error for the .hpp possibly means it is an implicit rule (or pattern rule) instead of an explicit one, but given that .hpp is a C++ header file, it is more likely that the file itself is missing or in a different folder?

Also, just use make instead of make -j10, as the Makefile may not be written to be safe for parallel builds and could fail in parallel mode.

lskatz commented 1 month ago

Good idea, but I get the same error:

$ make -j 1
make: *** No rule to make target `sub_commands/count_main_cmdline.hpp', needed by `all'.  Stop.
$ make
make: *** No rule to make target `sub_commands/count_main_cmdline.hpp', needed by `all'.  Stop.
$ cd ..
$ find . -name 'count_main*'
./sub_commands/count_main.cc
./build/sub_commands/.deps/count_main.Po

lskatz commented 1 month ago

Hi, I wanted to revisit this in case you had any ideas? Thank you for all your help!