bcgsc / NanoSim

Nanopore sequence read simulator

memory error #92

Closed arunvv90 closed 3 years ago

arunvv90 commented 3 years ago

I was trying to run NanoSim. First, I successfully built the models using read_analysis.py. In the second step, I tried to generate simulated reads:

simulator.py genome --ref_g ./referance_genome/atcc_ccv.fasta --model_prefix ./nanosim/atcc_nofiltermodel/ --output ./nanosim/atccccv_nofiltersimulated --basecaller guppy

running the code with following parameters:

ref_g ./referance_genome/atcc_ccv.fasta
model_prefix ./nanosim/atcc_nofiltermodel/
out ./nanosim/atccccv_nofiltersimulated
number 20000
perfect False
kmer_bias None
basecaller guppy
dna_type linear
strandness None
sd_len None
median_len None
max_len inf
min_len 50
num_threads 1

2020-07-19 16:44:58: /home/arun/anaconda3/envs/nanosim_test/bin/simulator.py genome --ref_g ./referance_genome/atcc_ccv.fasta --model_prefix ./nanosim/atcc_nofiltermodel/ --output ./nanosim/atccccv_nofiltersimulated --basecaller guppy
2020-07-19 16:44:58: Read in reference
2020-07-19 16:44:58: Read error profile
2020-07-19 16:44:58: Read KDF of unaligned reads
2020-07-19 16:44:58: Read KDF of aligned reads
/home/arun/anaconda3/envs/nanosim_test/lib/python3.6/site-packages/sklearn/externals/joblib/externals/cloudpickle/cloudpickle.py:47: DeprecationWarning: the imp module is deprecated in favour of importlib; see the module's documentation for alternative uses
  import imp
Traceback (most recent call last):
  File "/home/arun/anaconda3/envs/nanosim_test/bin/simulator.py", line 1513, in <module>
    main()
  File "/home/arun/anaconda3/envs/nanosim_test/bin/simulator.py", line 1422, in main
    read_profile(ref_g, None, number, model_prefix, perfect, args.mode, strandness, None, False, dna_type)
  File "/home/arun/anaconda3/envs/nanosim_test/bin/simulator.py", line 421, in read_profile
    kde_ht = joblib.load(model_prefix + "_ht_length.pkl")
  File "/home/arun/anaconda3/envs/nanosim_test/lib/python3.6/site-packages/joblib/numpy_pickle.py", line 598, in load
    obj = _unpickle(fobj, filename, mmap_mode)
  File "/home/arun/anaconda3/envs/nanosim_test/lib/python3.6/site-packages/joblib/numpy_pickle.py", line 526, in _unpickle
    obj = unpickler.load()
  File "/home/arun/anaconda3/envs/nanosim_test/lib/python3.6/pickle.py", line 1050, in load
    dispatch[key[0]](self)
  File "/home/arun/anaconda3/envs/nanosim_test/lib/python3.6/pickle.py", line 1220, in load_binbytes8
    self.append(self.read(len))
  File "/home/arun/anaconda3/envs/nanosim_test/lib/python3.6/pickle.py", line 238, in read
    return self.file_read(n)
MemoryError

I have 64 GB of RAM and a 6-core CPU. Here is the package list for my conda environment:

Name                Version       Build                   Channel
_libgcc_mutex       0.1           conda_forge             conda-forge
_openmp_mutex       4.5           1_llvm                  conda-forge
bedtools            2.29.2        hc088bd4_0              bioconda
blas                2.17          openblas                conda-forge
bzip2               1.0.8         h516909a_2              conda-forge
ca-certificates     2020.6.20     hecda079_0              conda-forge
certifi             2020.6.20     py36h9f0ad1d_0          conda-forge
cycler              0.10.0        py_2                    conda-forge
freetype            2.10.2        he06d7ca_0              conda-forge
htseq               0.9.1         py36h7eb728f_2          bioconda
icu                 67.1          he1b5a44_0              conda-forge
joblib              0.13.2        py36_0
jpeg                9d            h516909a_0              conda-forge
kiwisolver          1.2.0         py36hdb11119_0          conda-forge
krb5                1.17.1        hfafb76e_1              conda-forge
last                1060          h8b12597_0              bioconda
lcms2               2.11          hbd6801e_0              conda-forge
ld_impl_linux-64    2.34          h53a641e_7              conda-forge
libblas             3.8.0         17_openblas             conda-forge
libcblas            3.8.0         17_openblas             conda-forge
libcurl             7.71.1        hcdd3856_2              conda-forge
libdeflate          1.6           h516909a_0              conda-forge
libedit             3.1.20191231  h46ee950_1              conda-forge
libffi              3.2.1         he1b5a44_1007           conda-forge
libgcc              7.2.0         h69d50b8_2              conda-forge
libgcc-ng           9.2.0         h24d8f2e_2              conda-forge
libgfortran-ng      7.5.0         hdf63c60_6              conda-forge
liblapack           3.8.0         17_openblas             conda-forge
liblapacke          3.8.0         17_openblas             conda-forge
libopenblas         0.3.10        pthreads_hb3c22a3_2     conda-forge
libpng              1.6.37        hed695b0_1              conda-forge
libssh2             1.9.0         hab1572f_4              conda-forge
libstdcxx-ng        9.2.0         hdf63c60_2              conda-forge
libtiff             4.1.0         hc7e4089_6              conda-forge
libwebp-base        1.1.0         h516909a_3              conda-forge
llvm-openmp         10.0.0        hc9558a2_0              conda-forge
lz4-c               1.9.2         he1b5a44_1              conda-forge
matplotlib          3.2.2         1                       conda-forge
matplotlib-base     3.2.2         py36h5fdd944_1          conda-forge
minimap2            2.10          1                       bioconda
nanosim             2.5.1         py_0                    bioconda
ncurses             6.2           he1b5a44_1              conda-forge
numpy               1.19.0        py36h7314795_0          conda-forge
olefile             0.46          py_0                    conda-forge
openssl             1.1.1g        h516909a_0              conda-forge
pandas              1.0.5         py36h830a2c2_0          conda-forge
parallel            20200522      0                       conda-forge
perl                5.26.2        h516909a_1006           conda-forge
pillow              7.2.0         py36h8328e55_1          conda-forge
pip                 20.1.1        py_1                    conda-forge
pybedtools          0.8.1         py36h5202f60_2          bioconda
pyparsing           2.4.7         pyh9f0ad1d_0            conda-forge
pysam               0.16.0.1      py36h4c34d4e_1          bioconda
python              3.6.10        h8356626_1011_cpython   conda-forge
python-dateutil     2.8.1         py_0                    conda-forge
python_abi          3.6           1_cp36m                 conda-forge
pytz                2020.1        pyh9f0ad1d_0            conda-forge
readline            8.0           he28a2e2_2              conda-forge
scikit-learn        0.20.0        py36h22eb022_1
scipy               1.5.1         py36h2d22cac_0          conda-forge
setuptools          49.2.0        py36h9f0ad1d_0          conda-forge
six                 1.15.0        pyh9f0ad1d_0            conda-forge
sqlite              3.32.3        hcee41ef_1              conda-forge
tk                  8.6.10        hed695b0_0              conda-forge
tornado             6.0.4         py36h8c4c3a4_1          conda-forge
wheel               0.34.2        py_1                    conda-forge
xz                  5.2.5         h516909a_1              conda-forge
zlib                1.2.11        h516909a_1006           conda-forge
zstd                1.4.5         h6597ccf_1              conda-forge

Could you please help me? Thank you in advance.

SaberHQ commented 3 years ago

Hey @arunvv90, thanks for your interest in using NanoSim. Would you please double-check the size of the PKL files in the profiles directory? I just want to make sure their total size does not exceed your memory capacity (which I doubt, as they are usually small files).
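One way to run this size check is to sum the sizes of the `.pkl` files that share the model prefix. This is only a minimal sketch: the function name `profile_pickle_sizes` is illustrative and not part of NanoSim, and the prefix in the usage section is the one from the command in this thread.

```python
import glob
import os


def profile_pickle_sizes(model_prefix):
    """Return {path: size_in_bytes} for every file matching '<model_prefix>*.pkl'."""
    # glob.escape protects against wildcard characters inside the prefix itself
    pattern = glob.escape(model_prefix) + "*.pkl"
    return {path: os.path.getsize(path) for path in sorted(glob.glob(pattern))}


if __name__ == "__main__":
    # Prefix taken from the --model_prefix value reported in this issue;
    # replace it with your own prefix.
    sizes = profile_pickle_sizes("./nanosim/atcc_nofiltermodel/")
    for path, size in sizes.items():
        print(f"{path}\t{size / 1e6:.1f} MB")
    print(f"total\t{sum(sizes.values()) / 1e6:.1f} MB")
```

If the total is far below the available RAM, the size of the files on disk is unlikely to explain a `MemoryError` by itself.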

arunvv90 commented 3 years ago

Hi, the sizes of my pkl files are: _ht_ratio.pkl = 15 MB, _ht_length.pkl = 20.4 MB.

SaberHQ commented 3 years ago

Hey @arunvv90

I see, so I guess it is definitely not a memory issue. Did you try simulating some reads using the pre-trained profiles provided with the NanoSim package? We provide pre-trained models for DNA and RNA sequencing; they are several megabytes each. I just want to see whether you encounter the same problem with the pre-trained models. If yes, then I suspect the issue is related to the pickle module version. If not, then perhaps you should run the characterization stage one more time and see if that solves the problem.
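To gather evidence for or against the version hypothesis, it can help to record the library versions in the environment and to try loading the failing file in isolation, outside the simulator. The sketch below is an assumption-laden illustration, not NanoSim code: `report_versions` and `try_load` are hypothetical helper names, and the file path is the model prefix from this thread plus the `_ht_length.pkl` suffix seen in the traceback.

```python
import importlib
import sys


def report_versions(module_names=("joblib", "sklearn", "numpy", "scipy")):
    """Return {name: version} for the interpreter and each requested module."""
    versions = {"python": sys.version.split()[0]}
    for name in module_names:
        try:
            module = importlib.import_module(name)
        except Exception:
            # Missing or broken installation; either way it cannot be used
            versions[name] = "not importable"
        else:
            versions[name] = getattr(module, "__version__", "unknown")
    return versions


def try_load(path, loader):
    """Call loader(path); return (True, obj) on success or (False, exception)."""
    try:
        return True, loader(path)
    except Exception as exc:  # MemoryError is a subclass of Exception
        return False, exc


if __name__ == "__main__":
    for name, version in report_versions().items():
        print(f"{name}\t{version}")
    try:
        import joblib  # the traceback shows the simulator calling joblib.load
    except ImportError:
        print("joblib is not installed in this environment")
    else:
        ok, result = try_load("./nanosim/atcc_nofiltermodel/_ht_length.pkl", joblib.load)
        print("loaded" if ok else f"failed: {type(result).__name__}: {result}")
```

Running the same script in the environment used for the characterization stage and in the one used for simulation makes any version difference visible side by side.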

Try these and let me know how it goes. If it is fine with you, you can also share your trained profiles with us, and we can try simulating reads to see if it works.

arunvv90 commented 3 years ago

Hi @SaberHQ, I am going to try the pre-trained human genome model. The total size of my profile files is more than 100 MB. Please let me know your email address so that I can send them to you directly.

SaberHQ commented 3 years ago

Cool idea, yes please try the pre-trained model and let me know if it works. You may find my contact info in my profile. My work email is as follows: shafezqorani@bcgsc.ca