bcgsc / NanoSim

Nanopore sequence read simulator

Error encountered in the KernelDensity function #48

Closed lfaller-zymergen closed 5 years ago

lfaller-zymergen commented 5 years ago

Dear authors,

The error encountered is ValueError: Found array with 0 sample(s) (shape=(0, 1)) while a minimum of 1 is required.

I used the datasets provided by the Loman lab and an E. coli reference genome as input.

Because of this error I cannot get as far as running the simulator.py script.

Any suggestions on how to resolve the issue would be appreciated! Alternatively, could you make the precomputed profiles available again?

Thank you!

root@549758d01510:/# INPUT=/data/R9_Ecoli_K12_MG1655_lambda_MinKNOW_0.51.1.62.all.fasta
root@549758d01510:/# REF=/data/13002263354.fna
root@549758d01510:/# PROFILE=/data/ecoli
root@549758d01510:/#
root@549758d01510:/# read_analysis.py -i $INPUT -r $REF -o $PROFILE
2019-02-21 18:10:06: Read pre-process and unaligned reads analysis
2019-02-21 18:10:23: Alignment with minimap2
[M::mm_idx_gen::0.187*0.85] collected minimizers
[M::mm_idx_gen::0.271*0.88] sorted minimizers
[M::main::0.272*0.88] loaded/built the index for 1 target sequence(s)
[M::mm_mapopt_update::0.288*0.87] mid_occ = 12
[M::mm_idx_stat] kmer size: 15; skip: 10; is_hpc: 0; #seq: 1
[M::mm_idx_stat::0.304*0.86] distinct minimizers: 838544 (98.18% are singletons); average occurrences: 1.034; average spacing: 5.352
Killed
2019-02-21 18:13:00: Aligned reads analysis
Traceback (most recent call last):
  File "/NanoSim/src/read_analysis.py", line 190, in <module>
    main(sys.argv[1:])
  File "/NanoSim/src/read_analysis.py", line 161, in main
    num_aligned = align.head_align_tail(prefix, file_extension)
  File "/NanoSim/src/head_align_tail_dist.py", line 175, in head_align_tail
    kde_aligned = KernelDensity(bandwidth=10).fit(aligned_2d)
  File "/root/.local/lib/python3.7/site-packages/sklearn/neighbors/kde.py", line 128, in fit
    X = check_array(X, order='C', dtype=DTYPE)
  File "/root/.local/lib/python3.7/site-packages/sklearn/utils/validation.py", line 582, in check_array
    context))
ValueError: Found array with 0 sample(s) (shape=(0, 1)) while a minimum of 1 is required.
root@549758d01510:/# ls -lah /data
total 1.8G
drwxr-xr-x 7 root root  224 Feb 21 18:12 .
drwxr-xr-x 1 root root 4.0K Feb 21 18:08 ..
-rw-rw-r-- 1 root root 4.6M Feb 20 21:55 13002263354.fna
-rw-r--r-- 1 root root 901M May 25  2016 R9_Ecoli_K12_MG1655_lambda_MinKNOW_0.51.1.62.all.fasta
-rw-r--r-- 1 root root    0 Feb 21 18:10 ecoli.sam
-rw-r--r-- 1 root root    0 Feb 21 18:12 ecoli_primary.sam
-rw-r--r-- 1 root root 901M Feb 21 18:10 ecoli_processed.fasta

cheny19 commented 5 years ago

Hi,

Sorry for the late reply. Based on the log above, it looks like the Minimap2 run was killed before it finished (note the "Killed" line right after the index statistics). The ls output confirms this: the ecoli.sam file is empty, so there are no aligned reads for the KernelDensity fit. Could you try running Minimap2 separately to make sure it works?

As for the pre-computed profiles, I'm sorry that we cannot provide them for now because our FTP site is down. We will publish some as soon as it is back or we find an alternative way to host them.
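A quick way to confirm this diagnosis before rerunning the analysis is to count the alignment records in the SAM file. The snippet below is only a minimal sketch and is not part of NanoSim; the helper name `count_sam_records` is made up for illustration. It relies only on the SAM convention that header lines start with "@".

```python
from pathlib import Path


def count_sam_records(sam_path):
    """Count alignment records (non-header lines) in a SAM file.

    Hypothetical helper for illustration; not a NanoSim function.
    """
    path = Path(sam_path)
    # A missing or zero-byte file (like ecoli.sam in the listing above)
    # contains no records at all.
    if not path.exists() or path.stat().st_size == 0:
        return 0
    count = 0
    with path.open() as handle:
        for line in handle:
            # SAM header lines start with '@'; every other non-blank line
            # is one alignment record.
            if line.strip() and not line.startswith("@"):
                count += 1
    return count
```

If `count_sam_records("/data/ecoli.sam")` returns 0, the aligned-read analysis has nothing to fit, which matches the `shape=(0, 1)` in the error message.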

Thanks, Chen

lfaller-zymergen commented 5 years ago

Thanks for the suggestions! I will try and debug the minimap2 run.
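For anyone debugging the same symptom, one way to run the aligner on its own and see how it exited is sketched below. This is an illustrative helper, not NanoSim code: `run_and_report` is a hypothetical name, and it relies on the POSIX behaviour of Python's `subprocess` module, where a negative return code means the child process was terminated by that signal.

```python
import signal
import subprocess


def run_and_report(cmd, stdout_path):
    """Run cmd, write its stdout to stdout_path, and describe how it exited.

    Hypothetical helper for illustration; not a NanoSim function.
    """
    with open(stdout_path, "w") as out:
        result = subprocess.run(cmd, stdout=out)
    code = result.returncode
    if code < 0:
        # On POSIX, returncode == -N means the child was killed by signal N.
        return f"terminated by signal {signal.Signals(-code).name}"
    return "ok" if code == 0 else f"exited with status {code}"
```

A call such as `run_and_report(["minimap2", "-ax", "map-ont", ref, reads], "ecoli.sam")` would run the alignment outside read_analysis.py; the exact options NanoSim passes to minimap2 are not shown in this thread, so that command line is an assumption. A result of "terminated by signal SIGKILL" would correspond to the "Killed" line in the log above.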