Closed arunvv90 closed 3 years ago
Hey @arunvv90, thanks for your interest in using NanoSim. Would you please double-check the size of the PKL files in the profiles directory? I just want to make sure their total size does not exceed your memory capacity (which I doubt, as they are usually small files).
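For anyone wanting to run this size check quickly, here is a minimal sketch. It is not part of NanoSim: the helper names `profile_sizes` and `total_megabytes` are hypothetical, and the sketch assumes the profiles are the `.pkl` files whose paths start with the value passed to `--model_prefix`.

```python
import glob
import os


def profile_sizes(model_prefix):
    """Return {file name: size in bytes} for every .pkl file matching the prefix."""
    # glob.escape protects against wildcard characters inside the prefix itself.
    pattern = glob.escape(model_prefix) + "*.pkl"
    return {
        os.path.basename(path): os.path.getsize(path)
        for path in sorted(glob.glob(pattern))
    }


def total_megabytes(sizes):
    """Sum the sizes and convert bytes to megabytes (1 MB = 10**6 bytes)."""
    return sum(sizes.values()) / 1e6
```

Calling `profile_sizes("./nanosim/atcc_nofiltermodel/")` and then `total_megabytes(...)` on the result gives a figure that can be compared against the machine's RAM.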
Hi, the sizes of my pkl files are: _ht_ratio.pkl = 15 MB, _ht_length.pkl = 20.4 MB
Hey @arunvv90
I see, so it is definitely not a memory issue, I guess. Did you try simulating some reads using the pre-trained profiles provided with the NanoSim package? We provide pre-trained models for DNA and RNA sequencing, and those are several megabytes each. I just want to see whether you encounter the same problem with the pre-trained models. If yes, then I suspect an issue related to the pickle module version. If not, then perhaps you should consider running the characterization stage one more time to see if that solves the problem.
Try these and let me know how it goes. If it is fine with you, you can also share your trained profiles with us, and we can try simulating reads to see if it works.
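To narrow down which profile file fails before rerunning the whole simulation, each file can be loaded on its own. The sketch below is only an illustration under assumptions: `try_load` is a hypothetical helper, and it defaults to the standard-library `pickle.load`. The traceback in this issue shows NanoSim calling `joblib.load`, so passing `joblib.load` as `loader` (it also accepts an open file object) would be closer to what the simulator does.

```python
import pickle


def try_load(path, loader=pickle.load):
    """Try to deserialize one profile file and return (ok, detail).

    On success, detail is the type name of the loaded object.
    On failure, detail is the exception type and message.
    """
    try:
        with open(path, "rb") as handle:
            obj = loader(handle)
    except Exception as exc:  # MemoryError and UnpicklingError both land here
        return False, type(exc).__name__ + ": " + str(exc)
    return True, type(obj).__name__
```

Running this on both the user-trained and the pre-trained profiles in the same environment shows whether the failure follows the files or the installation.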
Hi @SaberHQ, I am going to try with the human genome pre-trained data. The total file size is more than 100 MB. Please let me know your email address so that I can send it to you personally.
Cool idea, yes please try the pre-trained model and let me know if it works. You may find my contact info in my profile. My work email is as follows: shafezqorani@bcgsc.ca
I was trying to run NanoSim. First, I successfully built the models using read_analysis.py. In the second step, I tried to generate simulated reads with:
simulator.py genome --ref_g ./referance_genome/atcc_ccv.fasta --model_prefix ./nanosim/atcc_nofiltermodel/ --output ./nanosim/atccccv_nofiltersimulated --basecaller guppy
running the code with following parameters:
ref_g ./referance_genome/atcc_ccv.fasta
model_prefix ./nanosim/atcc_nofiltermodel/
out ./nanosim/atccccv_nofiltersimulated
number 20000
perfect False
kmer_bias None
basecaller guppy
dna_type linear
strandness None
sd_len None
median_len None
max_len inf
min_len 50
num_threads 1
2020-07-19 16:44:58: /home/arun/anaconda3/envs/nanosim_test/bin/simulator.py genome --ref_g ./referance_genome/atcc_ccv.fasta --model_prefix ./nanosim/atcc_nofiltermodel/ --output ./nanosim/atccccv_nofiltersimulated --basecaller guppy
2020-07-19 16:44:58: Read in reference
2020-07-19 16:44:58: Read error profile
2020-07-19 16:44:58: Read KDF of unaligned reads
2020-07-19 16:44:58: Read KDF of aligned reads
/home/arun/anaconda3/envs/nanosim_test/lib/python3.6/site-packages/sklearn/externals/joblib/externals/cloudpickle/cloudpickle.py:47: DeprecationWarning: the imp module is deprecated in favour of importlib; see the module's documentation for alternative uses
  import imp
Traceback (most recent call last):
  File "/home/arun/anaconda3/envs/nanosim_test/bin/simulator.py", line 1513, in <module>
    main()
  File "/home/arun/anaconda3/envs/nanosim_test/bin/simulator.py", line 1422, in main
    read_profile(ref_g, None, number, model_prefix, perfect, args.mode, strandness, None, False, dna_type)
  File "/home/arun/anaconda3/envs/nanosim_test/bin/simulator.py", line 421, in read_profile
    kde_ht = joblib.load(model_prefix + "_ht_length.pkl")
  File "/home/arun/anaconda3/envs/nanosim_test/lib/python3.6/site-packages/joblib/numpy_pickle.py", line 598, in load
    obj = _unpickle(fobj, filename, mmap_mode)
  File "/home/arun/anaconda3/envs/nanosim_test/lib/python3.6/site-packages/joblib/numpy_pickle.py", line 526, in _unpickle
    obj = unpickler.load()
  File "/home/arun/anaconda3/envs/nanosim_test/lib/python3.6/pickle.py", line 1050, in load
    dispatch[key[0]](self)
  File "/home/arun/anaconda3/envs/nanosim_test/lib/python3.6/pickle.py", line 1220, in load_binbytes8
    self.append(self.read(len))
  File "/home/arun/anaconda3/envs/nanosim_test/lib/python3.6/pickle.py", line 238, in read
    return self.file_read(n)
MemoryError
I have 64 GB of RAM and a 6-core CPU.
Here is the package list of my conda environment:
Name                Version        Build                   Channel
_libgcc_mutex       0.1            conda_forge             conda-forge
_openmp_mutex       4.5            1_llvm                  conda-forge
bedtools            2.29.2         hc088bd4_0              bioconda
blas                2.17           openblas                conda-forge
bzip2               1.0.8          h516909a_2              conda-forge
ca-certificates     2020.6.20      hecda079_0              conda-forge
certifi             2020.6.20      py36h9f0ad1d_0          conda-forge
cycler              0.10.0         py_2                    conda-forge
freetype            2.10.2         he06d7ca_0              conda-forge
htseq               0.9.1          py36h7eb728f_2          bioconda
icu                 67.1           he1b5a44_0              conda-forge
joblib              0.13.2         py36_0
jpeg                9d             h516909a_0              conda-forge
kiwisolver          1.2.0          py36hdb11119_0          conda-forge
krb5                1.17.1         hfafb76e_1              conda-forge
last                1060           h8b12597_0              bioconda
lcms2               2.11           hbd6801e_0              conda-forge
ld_impl_linux-64    2.34           h53a641e_7              conda-forge
libblas             3.8.0          17_openblas             conda-forge
libcblas            3.8.0          17_openblas             conda-forge
libcurl             7.71.1         hcdd3856_2              conda-forge
libdeflate          1.6            h516909a_0              conda-forge
libedit             3.1.20191231   h46ee950_1              conda-forge
libffi              3.2.1          he1b5a44_1007           conda-forge
libgcc              7.2.0          h69d50b8_2              conda-forge
libgcc-ng           9.2.0          h24d8f2e_2              conda-forge
libgfortran-ng      7.5.0          hdf63c60_6              conda-forge
liblapack           3.8.0          17_openblas             conda-forge
liblapacke          3.8.0          17_openblas             conda-forge
libopenblas         0.3.10         pthreads_hb3c22a3_2     conda-forge
libpng              1.6.37         hed695b0_1              conda-forge
libssh2             1.9.0          hab1572f_4              conda-forge
libstdcxx-ng        9.2.0          hdf63c60_2              conda-forge
libtiff             4.1.0          hc7e4089_6              conda-forge
libwebp-base        1.1.0          h516909a_3              conda-forge
llvm-openmp         10.0.0         hc9558a2_0              conda-forge
lz4-c               1.9.2          he1b5a44_1              conda-forge
matplotlib          3.2.2          1                       conda-forge
matplotlib-base     3.2.2          py36h5fdd944_1          conda-forge
minimap2            2.10           1                       bioconda
nanosim             2.5.1          py_0                    bioconda
ncurses             6.2            he1b5a44_1              conda-forge
numpy               1.19.0         py36h7314795_0          conda-forge
olefile             0.46           py_0                    conda-forge
openssl             1.1.1g         h516909a_0              conda-forge
pandas              1.0.5          py36h830a2c2_0          conda-forge
parallel            20200522       0                       conda-forge
perl                5.26.2         h516909a_1006           conda-forge
pillow              7.2.0          py36h8328e55_1          conda-forge
pip                 20.1.1         py_1                    conda-forge
pybedtools          0.8.1          py36h5202f60_2          bioconda
pyparsing           2.4.7          pyh9f0ad1d_0            conda-forge
pysam               0.16.0.1       py36h4c34d4e_1          bioconda
python              3.6.10         h8356626_1011_cpython   conda-forge
python-dateutil     2.8.1          py_0                    conda-forge
python_abi          3.6            1_cp36m                 conda-forge
pytz                2020.1         pyh9f0ad1d_0            conda-forge
readline            8.0            he28a2e2_2              conda-forge
scikit-learn        0.20.0         py36h22eb022_1
scipy               1.5.1          py36h2d22cac_0          conda-forge
setuptools          49.2.0         py36h9f0ad1d_0          conda-forge
six                 1.15.0         pyh9f0ad1d_0            conda-forge
sqlite              3.32.3         hcee41ef_1              conda-forge
tk                  8.6.10         hed695b0_0              conda-forge
tornado             6.0.4          py36h8c4c3a4_1          conda-forge
wheel               0.34.2         py_1                    conda-forge
xz                  5.2.5          h516909a_1              conda-forge
zlib                1.2.11         h516909a_1006           conda-forge
zstd                1.4.5          h6597ccf_1              conda-forge
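Since a serialization version mismatch is one suspected cause, it may help to compare only the handful of packages involved in writing and reading the profiles, in both the training and the simulation environment. The following is a rough sketch, not a NanoSim utility: `installed_version` and `report` are hypothetical helper names, and the default package selection is an assumption about what matters here. It falls back to `pkg_resources` on interpreters older than Python 3.8, such as the 3.6 environment listed above.

```python
import pickle
import platform


def installed_version(package):
    """Return the installed version string of a package, or None if it is absent."""
    try:
        from importlib.metadata import PackageNotFoundError, version  # Python 3.8+
    except ImportError:
        import pkg_resources  # fallback for older interpreters
        try:
            return pkg_resources.get_distribution(package).version
        except pkg_resources.DistributionNotFound:
            return None
    try:
        return version(package)
    except PackageNotFoundError:
        return None


def report(packages=("joblib", "scikit-learn", "numpy", "scipy")):
    """Collect interpreter, pickle protocol, and package versions in one dict."""
    info = {
        "python": platform.python_version(),
        "pickle_highest_protocol": pickle.HIGHEST_PROTOCOL,
    }
    for name in packages:
        info[name] = installed_version(name)
    return info
```

Printing `report()` in each environment and comparing the two dictionaries makes any difference in `joblib` or `scikit-learn` easy to spot.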
Could you please help me? Thank you in advance.