DRL / blobtools

Modular command-line solution for visualisation, quality control and taxonomic partitioning of genome datasets
GNU General Public License v3.0

IndexError: list index out of range when parsing tax0. Blobtools normally works, not sure how to troubleshoot this error #129

Closed margaretc-ho closed 11 months ago

margaretc-ho commented 11 months ago

Hi @DRL

Thanks for making this software; it's been very helpful to us in assembling our protist genome and isolating it from bacteria in its environment. My blobtools installation normally works very well. However, I am trying to run blobtools on a new metagenomic assembly and I am getting the following error:

[+] Parsing tax0 - /gpfs/gsfs8/users/homc/Sep252023_Tcas_gIllumina_unmapped_spades_blobtools_filtered/Tcas_gIllumina_unmapped_spades_Sep2023.vs.nt.cul5.np.dcmegablast.PBunmappedspades_parts2023-09-14_allcat.txt
Traceback (most recent call last):
  File "/data/homc/conda/envs/blobtools/blobtools", line 7, in <module>
    main()
  File "/gpfs/gsfs8/users/homc/conda/envs/blobtools/lib/interface.py", line 60, in main
    create.main()
  File "/gpfs/gsfs8/users/homc/conda/envs/blobtools/lib/create.py", line 108, in main
    blobDb.parseHits(hit_libs)
  File "/gpfs/gsfs8/users/homc/conda/envs/blobtools/lib/BtCore.py", line 420, in parseHits
    for hitDict in BtIO.readTax(hitLib.f, set(self.dict_of_blobs)):
  File "/gpfs/gsfs8/users/homc/conda/envs/blobtools/lib/BtIO.py", line 527, in readTax
    'name' : col[0],
IndexError: list index out of range

The assembly had many contigs (127241 contigs), and I ran megablast on the FASTA by splitting it into 100 sections and concatenating the outputs to feed into blobtools. The megablast output (222327 lines) looks normal and as expected (I checked throughout, including the head and tail).

The first few lines look like this:

NODE_1_length_384100_cov_11.257694 1384484 11890 NODE_1_length_384100_cov_11.257694 AP013105.1 78.158 14614 3044 46 6865 21456 463974 478461 0.0 Adlercreutzia equolifaciens DSM 19450 Bacteria Adlercreutzia equolifaciens DSM 19450 DNA, complete genome
NODE_1_length_384100_cov_11.257694 1384484 3076 NODE_1_length_384100_cov_11.257694 AP013105.1 78.462 3719 748 15 191905 195608 1567168 1563488 0.0 Adlercreutzia equolifaciens DSM 19450 Bacteria Adlercreutzia equolifaciens DSM 19450 DNA, complete genome
NODE_1_length_384100_cov_11.257694 1384484 2551 NODE_1_length_384100_cov_11.257694 AP013105.1 69.916 6309 1593 121 305420 311549 1339715 1345897 0.0 Adlercreutzia equolifaciens DSM 19450 Bacteria Adlercreutzia equolifaciens DSM 19450 DNA, complete genome
NODE_1_length_384100_cov_11.257694 1384484 2454 NODE_1_length_384100_cov_11.257694 AP013105.1 75.725 3621 800 41 207156 210724 1694305 1690712 0.0 Adlercreutzia equolifaciens DSM 19450 Bacteria Adlercreutzia equolifaciens DSM 19450 DNA, complete genome
NODE_1_length_384100_cov_11.257694 1384484 2287 NODE_1_length_384100_cov_11.257694 AP013105.1 78.062 2858 613 12 365496 368343 1416110 1418963 0.0 Adlercreutzia equolifaciens DSM 19450 Bacteria Adlercreutzia equolifaciens DSM 19450 DNA, complete genome
NODE_1_length_384100_cov_11.257694 1384484 2273 NODE_1_length_384100_cov_11.257694 AP013105.1 74.387 3631 822 40 293965 297544 1327934 1331507 0.0 Adlercreutzia equolifaciens DSM 19450 Bacteria Adlercreutzia equolifaciens DSM 19450 DNA, complete genome

Do you have any idea why I am getting the above "IndexError: list index out of range" from blobtools? Any information you can give would be very helpful. Thank you very much.

margaretc-ho commented 11 months ago

Never mind, we solved it! The first line of the megablast output was blank, and removing it allowed blobtools to run on the tax0/megablast output without issues. Posting it here in case it helps others.
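
For anyone hitting the same error, here is a minimal Python sketch for finding and removing blank lines in a concatenated hits file before passing it to blobtools. It is not part of blobtools: the function names `find_short_lines` and `drop_blank_lines` are hypothetical, and the default of three required columns is an assumption based on the query ID, taxid and bitscore that lead each line in the output above.

```python
def find_short_lines(path, min_columns=3):
    """Return (line_number, text) pairs for lines with too few fields.

    min_columns=3 is an assumption (query ID, taxid, bitscore); adjust it
    to match the format of your own hits file.
    """
    bad = []
    with open(path, encoding="utf-8") as handle:
        for number, line in enumerate(handle, start=1):
            # str.split() with no argument returns [] for a blank line,
            # so blank lines always count as "too short".
            if len(line.split()) < min_columns:
                bad.append((number, line.rstrip("\n")))
    return bad


def drop_blank_lines(src, dst):
    """Copy src to dst, skipping empty or whitespace-only lines.

    Returns the number of lines that were skipped.
    """
    removed = 0
    with open(src, encoding="utf-8") as fin, \
            open(dst, "w", encoding="utf-8") as fout:
        for line in fin:
            if line.strip():
                fout.write(line)
            else:
                removed += 1
    return removed
```

Running `find_short_lines` on the concatenated file first reports where the problem lines are (in this case it would have reported line 1), which is useful when the head and tail of a large file look fine on a quick scan. `drop_blank_lines` writes a cleaned copy rather than editing in place, so the original megablast output is left untouched.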