DRL / blobtools

Modular command-line solution for visualisation, quality control and taxonomic partitioning of genome datasets
GNU General Public License v3.0

no .json file produced by blobtools create #97

Closed peterthorpe5 closed 4 years ago

peterthorpe5 commented 4 years ago

Hi Dom,

I have been using blobtools for a number of years now. I am assembling 4 fish genomes, Illumina only, so hundreds of thousands of contigs and ~1 Gbp genomes.

blobtools v1.0

blast cmd

blastn -task megablast -query scaffolds.fasta -db nt -outfmt '6 qseqid staxids bitscore std scomnames sscinames sblastnames sskingdoms stitle' -evalue 1e-20 -out n.clc.allfinal.out -num_threads 16

create cmd

blobtools create -i scaffolds.fasta -s xr_scaff.sam -t n.clc.allfinal.out -o xr_V1.blobplots

This runs for ~1.5 hours, then just stops:

blobtools create -i scaffolds.fasta -s xc.sam -t n.clc.allfinal.out -o test
[+] Parsing FASTA - scaffolds.fasta
[+] names.dmp/nodes.dmp not specified. Retrieving nodesDB from /conda/envs/python27/opt/blobtools-1.0.1/data/nodesDB.txt
[%] 100%
[+] Parsing tax0 - /storage/fish_genomes/xc/n.clc.allfinal.out

Then nothing. The node this is running on has 500 GB RAM; it isn't running out of RAM.

head of files:

head n.clc.allfinal.out (Seq, taxid, bit score .. the rest... )
NODE_1_length_118370_cov_12.234714 32473 1284 NODE_1_length_118370_cov_12.234714 XM_028031624.1 79.617 1933 282
NODE_1_length_118370_cov_12.234714 8083 1282

head *.sam
@SQ SN:NODE_1_length_210002_cov_18.361000 LN:210002
@SQ SN:NODE_2_length_168846_cov_18.635259 LN:168846
@SQ SN:NODE_3_length_144837_cov_19.065304 LN:144837

Can you please share some wisdom on how to solve this?

peterthorpe5 commented 4 years ago

It was a resource limitation problem. I split it into 100 FASTA files and it worked.

cheers
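For anyone hitting the same stall, here is a minimal sketch of one way to split an assembly into a fixed number of smaller FASTA files. The thread does not say how the split was actually done or how the parts were then run, so this is only an illustration: `read_fasta`, `split_fasta`, the `part_NNN.fasta` naming, and the round-robin distribution are all hypothetical choices, not part of blobtools.

```python
from contextlib import ExitStack
from pathlib import Path


def read_fasta(path):
    """Yield (header, sequence) tuples from a FASTA file, one record at a time."""
    header, chunks = None, []
    with open(path) as handle:
        for line in handle:
            line = line.rstrip("\n")
            if line.startswith(">"):
                if header is not None:
                    yield header, "".join(chunks)
                header, chunks = line[1:], []
            elif line:
                chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def split_fasta(path, n_parts, out_dir):
    """Write records round-robin into n_parts files and return their paths.

    Hypothetical helper: file names and the round-robin scheme are
    assumptions made for this sketch, not anything blobtools requires.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = [out_dir / f"part_{i:03d}.fasta" for i in range(n_parts)]
    with ExitStack() as stack:
        # Keep all output files open so the input is read only once.
        handles = [stack.enter_context(open(p, "w")) for p in paths]
        for index, (header, sequence) in enumerate(read_fasta(path)):
            handles[index % n_parts].write(f">{header}\n{sequence}\n")
    return paths
```

With the file names from this issue, a call would look like `split_fasta("scaffolds.fasta", 100, "parts")`. Headers are written back unchanged, so sequence IDs still match those in the BLAST hits and SAM files. If there are fewer records than parts, the surplus files are left empty.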