MicrobeLab / DeepMicrobes

DeepMicrobes: taxonomic classification for metagenomics with deep learning
https://doi.org/10.1093/nargab/lqaa009
Apache License 2.0

Problem running the example tfrec conversion on Windows Subsystem for Linux (WSL) and likewise in Docker #14

Open ramtinz opened 2 years ago

ramtinz commented 2 years ago

Hi, I installed DeepMicrobes following the provided guide but did not succeed in running tfrec_predict_kmer.sh on the provided example files. Is there any solution to this? I tried it both on WSL and in a Docker container that I built for DeepMicrobes; both produced the same error. Have you made any Docker image for DeepMicrobes? Any suggestion is greatly appreciated.

(DeepMicrobes) XYZ@PCXYZ:/mnt/c/Users/XYZ/Desktop$ tfrec_predict_kmer.sh -f SRR5935743_clean_1.fastq -r SRR5935743_clean_2.fastq -t fastq -v tokens_merged_12mers.txt -o SRR5935743 -s 4000000 -k 12

parallel successfully detected...
seqtk successfully detected...
Starting converting SRR5935743_clean_1.fastq and SRR5935743_clean_2.fastq to TFRecord (mode=prediction), output will be saved in SRR5935743.tfrec
Parameters: kmer=12, vocab_file=tokens_merged_12mers.txt, split_size=4000000, sequence_type=fastq
======================================
1. Interleaving R1 and R2...
======================================
2. Splitting the merged file to 4000000 sequences per file...
======================================
3. Converting to TFRecord...
Can't use 'defined(@array)' (Maybe you should just omit the defined()?) at /mnt/c/Users/XYZ/Desktop/DeepMicrobes/bin/parallel line 119.
cat: 'subset.tfrec': No such file or directory
rm: cannot remove 'subset.tfrec': No such file or directory
Finished.

(DeepMicrobes) XYZ@PCXYZ:/mnt/c/Users/XYZ/Desktop$ conda info

     active environment : DeepMicrobes
    active env location : /home/XYZ/anaconda3/envs/DeepMicrobes
            shell level : 3
       user config file : /home/XYZ/.condarc
 populated config files :
          conda version : 4.10.1
    conda-build version : 3.21.4
         python version : 3.8.8.final.0
       virtual packages : linux=5.4.72=0
                          glibc=2.31=0
                          unix=0=0
                          archspec=1=x86_64
       base environment : /home/XYZ/anaconda3  (writable)
      conda av data dir : /home/XYZ/anaconda3/etc/conda
  conda av metadata url : https://repo.anaconda.com/pkgs/main
           channel URLs : https://repo.anaconda.com/pkgs/main/linux-64
                          https://repo.anaconda.com/pkgs/main/noarch
                          https://repo.anaconda.com/pkgs/r/linux-64
                          https://repo.anaconda.com/pkgs/r/noarch
          package cache : /home/XYZ/anaconda3/pkgs
                          /home/XYZ/.conda/pkgs
       envs directories : /home/XYZ/anaconda3/envs
                          /home/XYZ/.conda/envs
               platform : linux-64
             user-agent : conda/4.10.1 requests/2.25.1 CPython/3.8.8 Linux/5.4.72-microsoft-standard-WSL2 ubuntu/20.04.2 glibc/2.31
                UID:GID : 1000:1000
             netrc file : None
           offline mode : False

MicrobeLab commented 2 years ago

Hi, It was "parallel" that did not work properly. Could you please check whether parallel has been properly installed? If not, try install parallel by yourself (instead of using the executable file in the bin directory).

(wget -O - pi.dk/3 || curl pi.dk/3/) | bash
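
If helpful, here is a minimal check (a sketch assuming a standard Linux shell; install locations vary) to confirm that a working copy of parallel is found on PATH ahead of the bundled bin/parallel script:

# Verify which parallel the pipeline will actually call, and that it runs
which parallel
parallel --version
# Optionally silence the citation notice once, as suggested by parallel itself
parallel --citation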

Thanks

ramtinz commented 2 years ago

Thanks for the reply. Correct, it was related to parallel. I still get an error when running the same command, but now a different one that seems to be related to something else: it doesn't find the vocabulary file, even though it is in the working directory (and on the path)!

parallel successfully detected...
seqtk successfully detected...
Starting converting SRR5935743_clean_1.fastq and SRR5935743_clean_2.fastq to TFRecord (mode=prediction), output will be saved in SRR5935743.tfrec
Parameters: kmer=12, vocab_file=tokens_merged_12mers.txt, split_size=4000000, sequence_type=fastq
======================================
1. Interleaving R1 and R2...
======================================
2. Splitting the merged file to 4000000 sequences per file...
======================================
3. Converting to TFRecord...
Academic tradition requires you to cite works you base your article on.
If you use programs that use GNU Parallel to process data for an article in a scientific publication, please cite:

  Tange, O. (2021, September 22). GNU Parallel 20210922 ('Vindelev').
  Zenodo. https://doi.org/10.5281/zenodo.5523272

This helps funding further development; AND IT WON'T COST YOU A CENT.
If you pay 10000 EUR you should feel free to use GNU Parallel without citing.

More about funding GNU Parallel and the citation notice:
https://www.gnu.org/software/parallel/parallel_design.html#Citation-notice

To silence this citation notice: run 'parallel --citation' once.

Traceback (most recent call last):
  File "/mnt/c/Users/XYZ/Desktop/DeepMicrobes/scripts/seq2tfrec_kmer.py", line 243, in <module>
    main()
  File "/mnt/c/Users/XYZ/Desktop/DeepMicrobes/scripts/seq2tfrec_kmer.py", line 224, in main
    'Please provide the vocabulary file.')
AssertionError: Please provide the vocabulary file.
cat: 'subset.tfrec': No such file or directory
rm: cannot remove 'subset.tfrec': No such file or directory
Finished.

MicrobeLab commented 2 years ago

Hi, Did you specify the path to the vocabulary file?

ramtinz commented 2 years ago

Hi, it worked by specifying the absolute path to the vocab file. Thanks a lot for the quick reply. Do you have any plans to make a Docker image of DeepMicrobes? It would be really helpful for reproducibility of the results.
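
For reference, here is roughly the invocation that worked (the paths are placeholders from our setup, not canonical locations):

tfrec_predict_kmer.sh -f SRR5935743_clean_1.fastq -r SRR5935743_clean_2.fastq -t fastq \
    -v /mnt/c/Users/XYZ/Desktop/tokens_merged_12mers.txt \
    -o SRR5935743 -s 4000000 -k 12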

MicrobeLab commented 2 years ago

The server I am working on does not have Docker installed, so currently I do not have plans to make a Docker image. Sorry about that.

SebastianKrog commented 2 years ago

Hi, I'm working with ramtinz on getting this to run on our current server. We are having some issues getting install.yml to create a working environment with GPU support. We haven't located the error yet, and it may very well be something we are doing wrong. Would it be possible for someone to provide the output of conda list and pip freeze for an environment that works?

MicrobeLab commented 2 years ago

Hi, here is the output from conda list:

# Name                    Version       Build                                              Channel
_libgcc_mutex             0.1           main                                               conda-forge
_tflow_select             2.1.0         gpu                                                defaults
absl-py                   0.3.0         py_0                                               conda-forge
astor                     0.7.1         py_0                                               conda-forge
biopython                 1.70          py36_2                                             conda-forge
blas                      1.1           openblas                                           conda-forge
c-ares                    1.15.0        h516909a_1001                                      conda-forge
ca-certificates           2019.11.28    hecc5488_0                                         conda-forge
certifi                   2019.11.28    py36_0                                             conda-forge
cudatoolkit               9.0           h13b8566_0                                         defaults
cudnn                     7.1.2         cuda9.0_0                                          defaults
cupti                     9.0.176       0                                                  defaults
gast                      0.3.2         py_0                                               conda-forge
grpcio                    1.23.0        py36he9ae1f9_0                                     conda-forge
h5py                      2.7.1         py36_1                                             conda-forge
hdf5                      1.8.18        3                                                  conda-forge
libffi                    3.2.1         he1b5a44_1006                                      conda-forge
libgcc-ng                 9.2.0         hdf63c60_0                                         conda-forge
libgfortran               3.0.0         1                                                  conda-forge
libgfortran-ng            7.3.0         hdf63c60_2                                         conda-forge
libprotobuf               3.11.1        h8b12597_0                                         conda-forge
libstdcxx-ng              9.2.0         hdf63c60_0                                         conda-forge
markdown                  3.1.1         py_0                                               conda-forge
ncurses                   6.1           hf484d3e_1002                                      conda-forge
numpy                     1.13.3        py36_blas_openblash1522bff_1201 [blas_openblas]    conda-forge
openblas                  0.3.3         h9ac9557_1001                                      conda-forge
openssl                   1.1.1d        h516909a_0                                         conda-forge
pip                       19.3.1        py36_0                                             conda-forge
protobuf                  3.11.1        py36he1b5a44_0                                     conda-forge
python                    3.6.7         h357f687_1006                                      conda-forge
readline                  8.0           hf8c457e_0                                         conda-forge
seqtk                     1.3           hed695b0_2                                         bioconda
setuptools                42.0.2        py36_0                                             conda-forge
six                       1.13.0        py36_0                                             conda-forge
sqlite                    3.30.1        hcee41ef_0                                         conda-forge
tensorboard               1.9.0         py36_0                                             conda-forge
tensorflow                1.9.0         py36_0                                             conda-forge
tensorflow-base           1.9.0         gpu_py36h6ecc378_0                                 defaults
tensorflow-gpu            1.9.0         hf154084_0                                         defaults
termcolor                 1.1.0         py_2                                               conda-forge
tk                        8.6.10        hed695b0_0                                         conda-forge
werkzeug                  0.16.0        py_0                                               conda-forge
wheel                     0.33.6        py36_0                                             conda-forge
xz                        5.2.4         h14c3975_1001                                      conda-forge
zlib                      1.2.11        h516909a_1006                                      conda-forge
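
In case it helps, a minimal sketch of a conda command pinning the key packages from the list above (an approximation, not the official install.yml; channels and build strings may need adjusting for your system, and GNU Parallel is installed separately as discussed earlier in this thread):

conda create -n DeepMicrobes -c conda-forge -c bioconda -c defaults \
    python=3.6 tensorflow-gpu=1.9.0 cudatoolkit=9.0 cudnn=7.1.2 \
    numpy=1.13.3 biopython=1.70 seqtk=1.3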