bcgsc / NanoSim

Nanopore sequence read simulator

Installation error #203

Closed mfbeuq closed 4 months ago

mfbeuq commented 4 months ago

Hi, thanks for the interesting tool first of all.

I did manage to install NanoSim once, using Option 1 on Ubuntu 20.04. After running the training set and a first few simulations, it suddenly stopped working. When I call the simulator, I get the following error:

2024-02-22 18:07:35: Read in reference
2024-02-22 18:07:35: Read error profile
2024-02-22 18:07:35: Read KDF of unaligned reads
Traceback (most recent call last):
  File "/home/master/NanoSim/src/simulator.py", line 2433, in <module>
    main()
  File "/home/master/NanoSim/src/simulator.py", line 2194, in main
    read_profile(ref_g, number, model_prefix, perfect, args.mode, strandness, dna_type=dna_type, chimeric=chimeric)
  File "/home/master/NanoSim/src/simulator.py", line 516, in read_profile
    kde_unaligned = joblib.load(model_prefix + "_unaligned_length.pkl")
  File "/home/master/miniforge3/envs/nanosim/lib/python3.7/site-packages/joblib/numpy_pickle.py", line 605, in load
    obj = _unpickle(fobj, filename, mmap_mode)
  File "/home/master/miniforge3/envs/nanosim/lib/python3.7/site-packages/joblib/numpy_pickle.py", line 529, in _unpickle
    obj = unpickler.load()
  File "/home/master/miniforge3/envs/nanosim/lib/python3.7/pickle.py", line 1088, in load
    dispatch[key[0]](self)
KeyError: 0
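The traceback fails while unpickling the file `model_prefix + "_unaligned_length.pkl"`. One quick check that may help narrow things down is which pickle protocol the model file was written with, since that hints at the Python version used during training. The sketch below uses only the standard library; the function name `peek_pickle_protocol` is hypothetical and not part of NanoSim or joblib, and the check does not by itself explain the `KeyError: 0`.

```python
import pickle


def peek_pickle_protocol(path):
    """Return the pickle protocol number if the file starts with a PROTO
    opcode, otherwise None.

    None is returned for protocol 0/1 pickles (which have no PROTO opcode)
    and for files that are not plain pickle streams, e.g. compressed dumps.
    """
    with open(path, "rb") as handle:
        header = handle.read(2)
    # Protocol 2 and later begin with the PROTO opcode (0x80) followed by
    # one byte holding the protocol number.
    if len(header) == 2 and header[0:1] == pickle.PROTO:
        return header[1]
    return None


if __name__ == "__main__":
    import sys

    # Usage (path is whatever model prefix was used for training):
    #   python peek_protocol.py <model_prefix>_unaligned_length.pkl
    for model_path in sys.argv[1:]:
        print(model_path, "protocol:", peek_pickle_protocol(model_path))
        print("highest protocol this interpreter supports:", pickle.HIGHEST_PROTOCOL)
```

If the reported protocol is higher than the one the active interpreter supports, the model was most likely trained under a newer Python than the one running the simulator.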

I tried to reinstall everything using all the different options, changing the dependencies and the Python version, and also tried installing on my Mac; nothing works. I keep getting this error when calling the simulator. I also tried a separate environment with the tested dependencies (mamba install scikit-learn=0.21.3 numpy=1.17.2 six samtools pysam pybedtools=0.8.2 minimap2 joblib=0.14.1 htseq=0.11.2 genometools-genometools) and with the versions given in requirements.txt; no luck.

Any help would be greatly appreciated! Cheers, Max

SaberHQ commented 4 months ago

Hi @mfbeuq Thanks for your interest in using our tool.

Since it used to work and then stopped working, I think the issue is related to the package requirements: a newly installed package in your conda environment may have changed the versions of other packages.

Looking at the error messages related to joblib and pickle packages you posted up here, I think the issue is related to what we all discussed here: https://github.com/bcgsc/NanoSim/issues/165

Sometimes, even if you create an env with a specific Python version, the error keeps pointing to an old Python inside the conda directory, as also mentioned in issue #165. Please follow the tips @kmnip and others provided there and see if they resolve your issue.

Also, take a look at the dependencies and installation sections for more information. I updated them with some tips for overcoming the usual issues.
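To see whether the active environment really uses the interpreter and package versions you expect, a small check along these lines can be run from inside the activated env. This is only a sketch: `report_environment` is a hypothetical helper, not part of NanoSim, and the package list is an assumption based on the dependencies named in this thread (note that scikit-learn is imported as `sklearn` and htseq as `HTSeq`).

```python
import importlib
import sys


def report_environment(packages):
    """Return the interpreter path, Python version, and package versions."""
    report = {
        "executable": sys.executable,
        "python": "{}.{}.{}".format(*sys.version_info[:3]),
    }
    for name in packages:
        try:
            module = importlib.import_module(name)
        except Exception:
            # Missing or broken in this interpreter.
            report[name] = None
        else:
            # Fall back to "unknown" if the module exposes no __version__.
            report[name] = getattr(module, "__version__", "unknown")
    return report


if __name__ == "__main__":
    # Import names, not conda package names.
    wanted = ["sklearn", "numpy", "joblib", "pysam", "pybedtools", "HTSeq"]
    for key, value in report_environment(wanted).items():
        print("{}: {}".format(key, value))
```

If `executable` does not point inside the env you activated, the simulator is being run by a different Python than the one the packages were installed into.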

mfbeuq commented 4 months ago

Hi @SaberHQ and thanks for the quick reply.

I had already seen #165 and the points raised there, as well as the dependencies and installation sections. I've tried removing all conda versions and reinstalling in every way mentioned under installation.

What helped me now was:

conda create -n nanosim_env python=3.7
conda activate nanosim_env
mamba install scikit-learn=0.22.1 six samtools pysam pybedtools minimap2 joblib htseq genometools-genometools

However, now it is stuck at generating reads:

2024-02-23 18:26:50: Start simulation of aligned reads
2024-02-23 18:26:50: Number of reads simulated >> 1

and it stays there. When I cancel the run, it says:

2024-02-23 18:26:50: Start simulation of aligned reads
^CProcess Process-1:
Number of reads simulated >> 1
Traceback (most recent call last):
  File "/home/master/NanoSim/src/simulator.py", line 2433, in <module>
Traceback (most recent call last):
  File "/home/master/miniforge3/envs/nanosim_env/lib/python3.7/multiprocessing/process.py", line 297, in _bootstrap
    self.run()
  File "/home/master/miniforge3/envs/nanosim_env/lib/python3.7/multiprocessing/process.py", line 99, in run
    self._target(*self._args, **self._kwargs)
  File "/home/master/NanoSim/src/simulator.py", line 1346, in simulation_aligned_genome
    new_seg_list[seg_idx], read_name_components[seg_idx] = extract_read(dna_type, seg_length_list[seg_idx])
  File "/home/master/NanoSim/src/simulator.py", line 1693, in extract_read
    new_read = seq_dict[key][ref_pos: ref_pos + length]
KeyboardInterrupt
    main()
  File "/home/master/NanoSim/src/simulator.py", line 2204, in main
    fastq, median_len, sd_len, chimeric=chimeric)
  File "/home/master/NanoSim/src/simulator.py", line 1551, in simulation
    p.join()
  File "/home/master/miniforge3/envs/nanosim_env/lib/python3.7/multiprocessing/process.py", line 140, in join
    res = self._popen.wait(timeout)
  File "/home/master/miniforge3/envs/nanosim_env/lib/python3.7/multiprocessing/popen_fork.py", line 48, in wait
    return self.poll(os.WNOHANG if timeout == 0.0 else 0)
  File "/home/master/miniforge3/envs/nanosim_env/lib/python3.7/multiprocessing/popen_fork.py", line 28, in poll
    pid, sts = os.waitpid(self.pid, flag)
KeyboardInterrupt

Any help would be greatly appreciated! Have a nice weekend :)

mfbeuq commented 4 months ago

Update: the error persists when I remove the -min 100 -max 600 -med 400 -sd 1.05 flags. It does generate some reads in the output folder (simulated_aligned_reads0.fastq and simulated_error_profile0), but it stops midway.

mfbeuq commented 4 months ago

Update 2: I had to rerun the simulation with the Python 3.7 environment, and now it works like a charm :)