billzt / PrimerServer2

PrimerServer2: a high-throughput primer design and specificity-checking platform
GNU General Public License v3.0

deal with split BLAST databases #1

Closed shenweima closed 3 years ago

shenweima commented 5 years ago
primertool full 1.txt /data2/Fshare/FastaAndIndex/iwgsc_v2.0/CS_genome_v2.0 -o cSSR_full.json -t cSSR_full.tsv
/root/software/anaconda3/lib/python3.7/site-packages/ipykernel/displayhook.py:12: VisibleDeprecationWarning: zmq.eventloop.minitornado is deprecated in pyzmq 14.0 and will be removed.
    Install tornado itself to use zmq with the tornado IOLoop.

  from jupyter_client.session import extract_header, Session
Designning Primers: 10 Finished (100%)|#########################################################################################|Time:  0:00:00
Checking specificity: 305 Finished (100%)|######################################################################################|Time:  0:00:01
multiprocessing.pool.RemoteTraceback: 
"""
Traceback (most recent call last):
  File "/root/software/anaconda3/lib/python3.7/multiprocessing/pool.py", line 121, in worker
    result = (True, func(*args, **kwds))
  File "/root/software/anaconda3/lib/python3.7/site-packages/primerserver2/core/run_blast.py", line 36, in run_blast
    raise Exception(f'The database file is not complete: file {db}.nhr is not found')
Exception: The database file is not complete: file /data2/Fshare/FastaAndIndex/iwgsc_v2.0/CS_genome_v2.0.nhr is not found
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/root/software/anaconda3/bin/primertool", line 10, in <module>
    sys.exit(main())
  File "/root/software/anaconda3/lib/python3.7/site-packages/primerserver2/cmd/primertool.py", line 156, in main
    run(args)
  File "/root/software/anaconda3/lib/python3.7/site-packages/primerserver2/cmd/primertool.py", line 140, in run
    report_amplicon_seq=args.report_amplicon_seqs, Tm_diff=args.Tm_diff, use_3_end=args.use_3_end)
  File "/root/software/anaconda3/lib/python3.7/site-packages/primerserver2/core/run_blast.py", line 99, in run_blast_parallel
    result_data = result.get()  # db and amplicons
  File "/root/software/anaconda3/lib/python3.7/multiprocessing/pool.py", line 657, in get
    raise self._value
Exception: The database file is not complete: file /data2/Fshare/FastaAndIndex/iwgsc_v2.0/CS_genome_v2.0.nhr is not found

NCBI BLAST database files:

CS_genome_v2.0.00.nhr
CS_genome_v2.0.00.nin
CS_genome_v2.0.00.nog
CS_genome_v2.0.00.nsd
CS_genome_v2.0.00.nsi
CS_genome_v2.0.00.nsq
CS_genome_v2.0.01.nhr
CS_genome_v2.0.01.nin
CS_genome_v2.0.01.nog
CS_genome_v2.0.01.nsd
CS_genome_v2.0.01.nsi
CS_genome_v2.0.01.nsq
CS_genome_v2.0.02.nhr
CS_genome_v2.0.02.nin
CS_genome_v2.0.02.nog
CS_genome_v2.0.02.nsd
CS_genome_v2.0.02.nsi
CS_genome_v2.0.02.nsq
CS_genome_v2.0.03.nhr
CS_genome_v2.0.03.nin
CS_genome_v2.0.03.nog
CS_genome_v2.0.03.nsd
CS_genome_v2.0.03.nsi
CS_genome_v2.0.03.nsq
CS_genome_v2.0.nal
CS_genome_v2.0

CS_genome_v2.0 is the genome FASTA file.
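
For reference, a minimal sketch of the kind of check implied by the traceback (the path is the one from the command above; the logic is only an illustration, not the actual run_blast.py code):

import os

# BLAST database prefix passed to primertool above.
db = '/data2/Fshare/FastaAndIndex/iwgsc_v2.0/CS_genome_v2.0'

# A strict check that expects a single-volume nucleotide database (<db>.nhr).
# A split database keeps its headers in numbered volumes (CS_genome_v2.0.00.nhr,
# CS_genome_v2.0.01.nhr, ...) plus an alias file (CS_genome_v2.0.nal), so
# <db>.nhr never exists and the check raises even though the database is usable.
if not os.path.isfile(db + '.nhr'):
    raise Exception(f'The database file is not complete: file {db}.nhr is not found')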

billzt commented 5 years ago

I'm sorry, this app is currently in beta and still under development. I can see that your NCBI database is split into multiple sub-files, and I haven't handled this case yet. I'll fix it in the future. For now, you could try a BLAST database in non-split format, such as:

CS_genome_v2.0
CS_genome_v2.0.nhr
CS_genome_v2.0.nin
CS_genome_v2.0.nsq
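
For a small genome, a database in that layout is what a plain makeblastdb run would normally produce (a rough sketch, shown here through Python's subprocess; the file names are just the ones from this issue):

import subprocess

# Build a nucleotide BLAST database from the genome FASTA.
# Small inputs end up as a single volume (CS_genome_v2.0.nhr/.nin/.nsq);
# very large genomes are split into numbered volumes plus a .nal alias instead.
subprocess.run(
    ['makeblastdb', '-in', 'CS_genome_v2.0', '-dbtype', 'nucl', '-out', 'CS_genome_v2.0'],
    check=True,
)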
aariani commented 5 years ago

I would be interested in this update too. I think you can just modify the run_blast.py script. In a local copy I simply commented out line 39 of the script. Maybe the check could be made more flexible and also accept db.*.nhr?
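
Something like the following could work as a sketch of that idea (an illustration only, not a tested patch to run_blast.py):

import glob
import os

def check_db_complete(db):
    # Accept the single-volume layout: <db>.nhr next to the FASTA.
    if os.path.isfile(db + '.nhr'):
        return
    # Also accept the split layout: numbered volumes (<db>.00.nhr, <db>.01.nhr, ...)
    # together with the <db>.nal alias file that ties them together.
    if os.path.isfile(db + '.nal') and glob.glob(db + '.*.nhr'):
        return
    raise Exception(f'The database file is not complete: neither {db}.nhr nor {db}.*.nhr was found')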

billzt commented 4 years ago

@aariani Thank you for your suggestion. However, the main problem is that if we run BLAST (v2.9.0) like this: blastn -query query.fa -db CS_genome_v2.0, it automatically deletes CS_genome_v2.0.nal and re-builds CS_genome_v2.0.nhr, CS_genome_v2.0.nin, and CS_genome_v2.0.nsq instead, which seems rather confusing and time-wasting!

aariani commented 4 years ago

The thing is that the wheat genome is too big, and BLAST automatically creates a split database regardless; otherwise the search would not be efficient with a single BLAST database.

The wheat genome is huge: a single wheat chromosome is roughly the size of an entire typical plant genome.

billzt commented 4 years ago

Well, do you mean such split databases are created automatically by the makeblastdb command? I don't have the wheat genome at the moment, and I have only tested very small datasets. When I run blastn, it automatically deletes the .nal file and re-builds the normal database again. Strange.

billzt commented 4 years ago

@shenweima @aariani This issue has been fixed now.

billzt commented 3 years ago

The web UI still needs to be fixed.