bacpop / PopPUNK

PopPUNK 👨‍🎤 (POPulation Partitioning Using Nucleotide Kmers)
https://www.bacpop.org/poppunk
Apache License 2.0

Row 130 has more columns than stated size....Could not run command rapidnj #336

Closed drhoads closed 3 weeks ago

drhoads commented 3 weeks ago

Versions

poppunk 2.6.0
pp-sketchlib 2.1.0

Command used and output returned

Running poppunk in a conda environment in WSL2-Ubuntu on 1206 genomes from one bacterial species (I had already run it on 1188 of these genomes but then added some reference genomes).

bash file:

poppunk --create-db --output Poppunk --r-files r-list.txt --external-clustering external_clusters.csv --overwrite --threads 8
poppunk --fit-model lineage --ref-db Poppunk --output PopPunk --graph-weights --overwrite --threads 8
poppunk_visualise --ref-db Poppunk --model-dir PopPunk --output Poppunk --overwrite --microreact --threads 8

Run output

PopPUNK (POPulation Partitioning Using Nucleotide Kmers)
(with backend: sketchlib v2.1.0
sketchlib: /home/drhoads/miniconda3/envs/Poppunk/lib/python3.10/site-packages/pp_sketchlib.cpython-310-x86_64-linux-gnu.so)

Graph-tools OpenMP parallelisation enabled: with 8 threads
Mode: Building new database from input sequences
Overwriting db: Poppunk/Poppunk.h5
Sketching 1206 genomes using 8 thread(s)
Progress (CPU): 1206 / 1206
Writing sketches to file
Calculating random match chances using Monte Carlo
Calculating distances using 8 thread(s)
Progress (CPU): 100.0%

Done

PopPUNK (POPulation Partitioning Using Nucleotide Kmers)
(with backend: sketchlib v2.1.0
sketchlib: /home/drhoads/miniconda3/envs/Poppunk/lib/python3.10/site-packages/pp_sketchlib.cpython-310-x86_64-linux-gnu.so)

Graph-tools OpenMP parallelisation enabled: with 8 threads
Mode: Fitting lineage model to reference database

Network for rank 1 has 257 lineages
Network for rank 2 has 100 lineages
Network for rank 3 has 55 lineages
Parsed data, now writing to CSV

Done

Graph-tools OpenMP parallelisation enabled: with 8 threads
PopPUNK: visualise
Loading lineage cluster model
Completed model loading
Building phylogeny
Row 130 has more columns than the stated size of 1206
Could not run command rapidnj Poppunk/Poppunk_core_distances.phylip -n -i pd -o t -x Poppunk/Poppunk_core_NJ.nwk.raw -c 8; returned code: 1

Describe the bug

I tried several times and made certain there were no empty lines in the r-list.txt input file.

Attachment: r-list.txt

johnlees commented 3 weeks ago

This might be rapidnj parsing the names poorly. Is the temp file Poppunk/Poppunk_core_distances.phylip still there? Can you attach it here? (may have to change the extension to .txt)
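
One way to test the name-parsing idea without attaching the file is to count the whitespace-separated fields in each row of the distance file. The sketch below is not part of PopPUNK or rapidnj. The function name `find_bad_phylip_rows` is hypothetical, and it assumes the file is a square PHYLIP distance matrix with the sample count on the first line and one sample per line (a name followed by one distance per sample). Under that assumption, a name containing whitespace shows up as a row with too many fields.

```python
def find_bad_phylip_rows(text):
    """Return (row_number, field_count, first_field) for rows whose field
    count is not n + 1, where n is the sample count from the header line.

    Assumes a square PHYLIP distance matrix with one sample per line.
    Row numbers are 1-based and count data rows only (header excluded).
    """
    lines = [ln for ln in text.splitlines() if ln.strip()]
    n = int(lines[0].split()[0])
    bad = []
    for row_number, line in enumerate(lines[1:], start=1):
        fields = line.split()  # split on any run of whitespace
        if len(fields) != n + 1:
            bad.append((row_number, len(fields), fields[0]))
    return bad


# Tiny made-up matrix: the second sample name ("B C") contains a space.
example = "3\nA 0 1 2\nB C 1 0 3\nD 2 3 0\n"
for row_number, field_count, first_field in find_bad_phylip_rows(example):
    print(f"row {row_number}: {field_count} fields, starts with {first_field!r}")
```

To run it on the real file, pass in the contents of Poppunk/Poppunk_core_distances.phylip, if it is still on disk. The row numbering here may not match the numbering rapidnj uses in its error message.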

drhoads commented 3 weeks ago

I think I found the probable error when I started working on using ParSNP on the same dataset. I made the usual mistake of not checking the strain names, and one of the newly added genomes was named ATCC 4200, with a space in the name. I had "cleaned" all the strain names in the original set of 1188 genomes, but I was adding some of the known ST strains. Sorry for the mistake.


(Sent by email in reply to John Lees's message of Monday, November 4, 2024, 4:59 AM.)


johnlees commented 3 weeks ago

No problem, that was an easy fix then!
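
For anyone who hits the same error, the name check described in this thread can be run on the input list before building the database. This is a minimal sketch, not a PopPUNK feature. The function name `suspect_names` is hypothetical, and it assumes the --r-files list is tab-separated with the sample name in the first column. The underscore replacement is only a suggestion for a cleaned name.

```python
import re


def suspect_names(rfile_text):
    """Return (line_number, name, suggested_name) for sample names that
    contain whitespace.

    Assumes a tab-separated list with the sample name in the first column.
    """
    flagged = []
    for line_number, line in enumerate(rfile_text.splitlines(), start=1):
        if not line.strip():
            continue  # skip empty lines
        name = line.split("\t")[0]
        if re.search(r"\s", name):
            # Suggest collapsing each run of whitespace to an underscore.
            suggestion = re.sub(r"\s+", "_", name.strip())
            flagged.append((line_number, name, suggestion))
    return flagged


# Made-up two-line list; the paths are placeholders, not from the thread.
example = "strainA\t/path/to/a.fa\nATCC 4200\t/path/to/b.fa\n"
for line_number, name, suggestion in suspect_names(example):
    print(f"line {line_number}: {name!r} -> {suggestion!r}")
```

To check a real list, pass in the contents of r-list.txt. Any renamed sample should also be updated wherever the name is used, for example in external_clusters.csv.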