Open JensPee opened 1 year ago
Hi,

From the log it looks like the BLAST output file is incomplete. This may be caused by a large number of results combined with too little RAM, although 15 GB should normally be enough. It might work if you reduce blastseqs to a value below 500: remove the primerblast directory, set blastseqs below 500 in the configuration, and re-run the pipeline.

Another option is to use the ref_prok_rep_genomes database, which contains far fewer redundant sequences. As for your current nt database, it has grown a lot in recent years, so the size you see seems legitimate.

Please let me know whether this works. I may need to change the BLAST output from .xml to .csv/.txt, because there I can select which columns are written to the output file, and this may reduce the required RAM.

Cheers
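To illustrate why a tabular format could lower memory use, here is a minimal sketch, not the pipeline's actual code. It assumes blastn is run with a tabular `-outfmt` value and a hypothetical column selection; the function names `outfmt_argument` and `iter_hits` are made up for this example. Rows are read one at a time instead of building a full list of records.

```python
import csv
from typing import Dict, Iterator, Sequence

# Hypothetical column selection. These are standard blastn format specifiers,
# but which columns the pipeline would really need is an assumption.
COLUMNS = ("qseqid", "sseqid", "pident", "length", "mismatch", "evalue")


def outfmt_argument(columns: Sequence[str] = COLUMNS) -> str:
    """Build the value for blastn's -outfmt option (format 6 = tab-separated)."""
    return "6 " + " ".join(columns)


def iter_hits(path: str, columns: Sequence[str] = COLUMNS) -> Iterator[Dict[str, str]]:
    """Yield one hit per row so the whole result file is never held in memory."""
    with open(path, newline="") as handle:
        for row in csv.reader(handle, delimiter="\t"):
            if len(row) != len(columns):
                # Skip blank or incomplete rows, e.g. a truncated last line.
                continue
            yield dict(zip(columns, row))
```

With this layout, a truncated file would lose at most its last row rather than make the whole file unparseable, which is the main difference from a single large XML document.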
Hi,
When I run or rerun speciesprimer (in a Docker container with 15.51 GB RAM allocated to it), a single results file is not created successfully. I cannot determine from the logs what the problem is. Any help would be appreciated. (I noticed that the BLAST DB is 250 GB, not 60 GB, and I don't know why. Could this be part of the problem?)

Settings are as follows:
{'blastseqs': 500, 'skip_tree': False, 'minsize': 75, 'path': '/primerdesign', 'mfethreshold': 90, 'nolist': False, 'ignore_qc': False, 'maxsize': 150, 'probe': False, 'offline': False, 'nontargetlist': [...], 'assemblylevel': ['complete'], 'skip_download': False, 'target': 'Azotobacter_chroococcum', 'intermediate': False, 'qc_gene': ['rRNA'], 'exception': [], 'mpprimer': -3.5, 'blastdbv5': False, 'customdb': None, 'mfold': -3.0}
The following problem shows up in the logs (speciesprimer_2023_06_25.log):

Run: run_blast - Start BLAST
27 Jun 2023 05:18:42: Run blastn -task blastn-short -num_threads 4 -query primer.part-0 -evalue 500 -out primer_0_results.xml -outfmt 5 -db nt
27 Jun 2023 14:41:00: Run blastn -task blastn-short -num_threads 4 -query primer.part-1 -evalue 500 -out primer_1_results.xml -outfmt 5 -db nt
27 Jun 2023 23:47:50: Run blastn -task blastn-short -num_threads 4 -query primer.part-2 -evalue 500 -out primer_2_results.xml -outfmt 5 -db nt
28 Jun 2023 09:20:30: Run blastn -task blastn-short -num_threads 4 -query primer.part-3 -evalue 500 -out primer_3_results.xml -outfmt 5 -db nt
28 Jun 2023 18:47:13: Run blastn -task blastn-short -num_threads 4 -query primer.part-4 -evalue 500 -out primer_4_results.xml -outfmt 5 -db nt
29 Jun 2023 03:47:32: Run blastn -task blastn-short -num_threads 4 -query primer.part-5 -evalue 500 -out primer_5_results.xml -outfmt 5 -db nt
29 Jun 2023 13:16:20: Run blastn -task blastn-short -num_threads 4 -query primer.part-6 -evalue 500 -out primer_6_results.xml -outfmt 5 -db nt
29 Jun 2023 22:27:04: Run blastn -task blastn-short -num_threads 4 -query primer.part-7 -evalue 500 -out primer_7_results.xml -outfmt 5 -db nt
30 Jun 2023 07:32:29: > Blast duration: 3 days, 2:13:47
30 Jun 2023 07:32:29: Run: run_blastparser(Azotobacter_chroococcum), primer
30 Jun 2023 07:32:29: Run: blast_parser
30 Jun 2023 07:32:29: Run: blastresults_files(Azotobacter_chroococcum)
30 Jun 2023 07:32:46: > A problem with the BLAST results file /primerdesign/Azotobacter_chroococcum/Pangenome/results/primer/primerblast/primer_4_results.xml was detected. Please check if the file was removed and start the run again
30 Jun 2023 07:32:46: ['fatal error while working on', 'Azotobacter_chroococcum', 'check logfile', '/primerdesign/speciesprimer_2023_06_25.log']
fatal error while working on Azotobacter_chroococcum
Traceback (most recent call last):
  File "/pipeline/speciesprimer.py", line 4168, in main
    run_pipeline_for_target(target, config)
  File "/pipeline/speciesprimer.py", line 4082, in run_pipeline_for_target
    config, primer_dict).run_primer_qc()
  File "/pipeline/speciesprimer.py", line 3537, in run_primer_qc
    self.call_blastparser.run_blastparser("primer")
  File "/pipeline/speciesprimer.py", line 2588, in run_blastparser
    align_dict = self.blast_parser(self.primerblast_dir)
  File "/pipeline/speciesprimer.py", line 2518, in blast_parser
    align_dict = self.bp_parse_xml_files(blast_dir)
  File "/pipeline/speciesprimer.py", line 2485, in bp_parse_xml_files
    blastrecords = self.parse_BLASTfile(filename)
  File "/pipeline/speciesprimer.py", line 2155, in parse_BLASTfile
    record_list = list(blast_records)
  File "/usr/local/lib/python3.5/dist-packages/Bio/Blast/NCBIXML.py", line 824, in parse
    expat_parser.Parse(NULL, True) # End of XML record
xml.parsers.expat.ExpatError: no element found: line 3874641, column 0
30 Jun 2023 07:32:46: > Error report:
30 Jun 2023 07:32:46: > for target Azotobacter_chroococcum
30 Jun 2023 07:32:46: > Error 1:
30 Jun 2023 07:32:46: > A problem with the BLAST results file /primerdesign/Azotobacter_chroococcum/Pangenome/results/primer/primerblast/primer_4_results.xml was detected. Please check if the file was removed and start the run again
30 Jun 2023 07:32:46: > for target Azotobacter_chroococcum
30 Jun 2023 07:32:46: > Error 2:
30 Jun 2023 07:32:46: > fatal error while working on Azotobacter_chroococcum check logfile /primerdesign/speciesprimer_2023_06_25.log
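The `ExpatError: no element found` at the end of the traceback is what expat reports when a document ends before its root element is closed, which fits a truncated results file. The sketch below shows one way to check every results file in the primerblast directory before re-running. It is an illustration only: `is_complete_xml` and `find_broken_results` are hypothetical helper names, not part of speciesprimer, and the `*_results.xml` pattern is assumed from the file names in the log.

```python
import glob
import os
import xml.parsers.expat


def is_complete_xml(path):
    """Return True if the file parses as one well-formed XML document.

    The file is streamed through expat, so memory use stays small even
    for results files with millions of lines.
    """
    parser = xml.parsers.expat.ParserCreate()
    try:
        with open(path, "rb") as handle:
            parser.ParseFile(handle)
    except xml.parsers.expat.ExpatError:
        # Raised for truncated or otherwise malformed files.
        return False
    return True


def find_broken_results(blast_dir, pattern="*_results.xml"):
    """List results files in blast_dir that are not complete XML documents."""
    paths = glob.glob(os.path.join(blast_dir, pattern))
    return sorted(p for p in paths if not is_complete_xml(p))
```

Calling `find_broken_results` on the primerblast directory shown in the log would list the files that need to be regenerated. Whether the pipeline then re-runs BLAST only for the missing parts is not confirmed here.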
I attached the broken file (4) and a working file (3) for comparison, renamed to .txt so GitHub will let me upload them: primer.part-4.txt primer.part-3.txt