bacpop / PopPUNK

PopPUNK 👨‍🎤 (POPulation Partitioning Using Nucleotide Kmers)
https://www.bacpop.org/poppunk
Apache License 2.0

Input reference list is misformatted #310

Open SeqBioUCC opened 2 months ago

SeqBioUCC commented 2 months ago

Versions
poppunk 2.6.5

Command used and output returned

poppunk_assign --db /home/quadram/Klebsiella_pneumoniae_v3_refs --query qfile.txt --output poppunk_clusters --threads 8
PopPUNK: assign
    (with backend: sketchlib v2.1.4
     sketchlib: /home/quadram/Apps/mamba/lib/python3.10/site-packages/pp_sketchlib.cpython-310-x86_64-linux-gnu.so)
Mode: Assigning clusters of query sequences

Graph-tools OpenMP parallelisation enabled: with 8 threads
Input reference list is misformatted
Must contain sample name and file, tab separated

Describe the bug

johnlees commented 2 months ago

Please can you post the contents (or at least the first few lines) of your qfile.txt

SeqBioUCC commented 2 months ago

H123 /home/quadram/EDITED_ASSEMBLIES/H123.fasta
H13 /home/quadram/EDITED_ASSEMBLIES/H13.fasta
H130 /home/quadram/EDITED_ASSEMBLIES/H130.fasta
H134 /home/quadram/EDITED_ASSEMBLIES/H134.fasta
H135 /home/quadram/EDITED_ASSEMBLIES/H135.fasta
H137 /home/quadram/EDITED_ASSEMBLIES/H137.fasta
H148 /home/quadram/EDITED_ASSEMBLIES/H148.fasta
H159 /home/quadram/EDITED_ASSEMBLIES/H159.fasta
H161 /home/quadram/EDITED_ASSEMBLIES/H161.fasta
H166 /home/quadram/EDITED_ASSEMBLIES/H166.fasta
H174 /home/quadram/EDITED_ASSEMBLIES/H174.fasta
H178 /home/quadram/EDITED_ASSEMBLIES/H178.fasta


johnlees commented 2 months ago

Looks like you have spaces between name and file? They should be tabs:

Must contain sample name and file, tab separated
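To tell tabs from spaces without trusting how a text editor renders whitespace, the file can be checked programmatically. The following is a minimal sketch, not PopPUNK's own parser: it assumes only what the error message states (sample name, then file, tab separated) and tolerates extra tab-separated columns, on the assumption that these may hold additional read files. The function name `check_query_list` is hypothetical.

```python
def check_query_list(text: str) -> list[str]:
    """Report lines that do not look like 'name<TAB>file' (hypothetical helper)."""
    problems = []
    # splitlines() also drops \r\n endings, so this check is only about separators
    for lineno, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # blank lines are skipped in this sketch
        fields = line.split("\t")
        if len(fields) < 2:
            # No tab at all: most likely spaces were used between name and path
            hint = "looks space-separated" if " " in line else "has no separator"
            problems.append(f"line {lineno}: no tab found ({hint})")
        elif not all(f.strip() for f in fields):
            # A doubled or trailing tab produces an empty column
            problems.append(f"line {lineno}: empty field (doubled or trailing tab?)")
    return problems
```

For example, `check_query_list(open("qfile.txt").read())` returns an empty list when every line has a tab between the sample name and the path.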

SeqBioUCC commented 2 months ago

Actually, they are tab-separated in the text editor.


johnlees commented 2 months ago

Might be the line endings then – are they Windows carriage returns?
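Windows line endings (`\r\n`) are easy to miss because most editors hide them. A minimal sketch of detecting and stripping them follows; it works on raw bytes so Python's own newline translation cannot mask the `\r`. Whether CRLF endings are what triggers the "misformatted" message here is only the suspicion raised above, not something confirmed in this thread, and the helper names are hypothetical.

```python
def has_crlf(data: bytes) -> bool:
    # True if any Windows-style line ending is present
    return b"\r\n" in data


def to_unix_line_endings(data: bytes) -> bytes:
    # Replace CRLF with LF, leaving tabs and everything else untouched
    return data.replace(b"\r\n", b"\n")


def fix_file_in_place(path: str) -> bool:
    """Rewrite path with LF endings; return True if the file was changed."""
    with open(path, "rb") as fh:
        data = fh.read()
    if not has_crlf(data):
        return False
    with open(path, "wb") as fh:
        fh.write(to_unix_line_endings(data))
    return True
```

From a shell, `dos2unix qfile.txt` does the same job if it is installed, and GNU `cat -A qfile.txt` shows tabs as `^I` and CRLF endings as `^M$`, which also settles the tabs-versus-spaces question.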

Other things to try:

SeqBioUCC commented 2 months ago

This is the output when I run it on a single file:

poppunk_assign --db /home/quadram/Klebsiella_pneumoniae_v3_refs --query rfile.txt --output poppunk_clusters --threads 5
PopPUNK: assign
    (with backend: sketchlib v2.1.4
     sketchlib: /home/quadram/Apps/mamba/lib/python3.10/site-packages/pp_sketchlib.cpython-310-x86_64-linux-gnu.so)
Mode: Assigning clusters of query sequences

Graph-tools OpenMP parallelisation enabled: with 5 threads
Looking for existing sketches in poppunk_clusters/poppunk_clusters.h5
Loading previously refined model
Completed model loading
WARNING: versions of input databases sketches are different, results may not be compatible
Calculating distances using 5 thread(s)
Progress (CPU): 0.0%
Segmentation fault (core dumped)


johnlees commented 2 months ago

How have you installed poppunk and sketchlib here? Sketchlib's most recent release is v2.1.3, so I think the install might have a problem, as you're getting a segfault.

Can you also try with two samples?

SeqBioUCC commented 2 months ago

poppunk_assign --db /home/quadram/Klebsiella_pneumoniae_v3_refs --query rfile.txt --output poppunk_clusters --threads 5
PopPUNK: assign
    (with backend: sketchlib v2.1.4
     sketchlib: /home/quadram/Apps/mamba/lib/python3.10/site-packages/pp_sketchlib.cpython-310-x86_64-linux-gnu.so)
Mode: Assigning clusters of query sequences

Graph-tools OpenMP parallelisation enabled: with 5 threads
Looking for existing sketches in poppunk_clusters/poppunk_clusters.h5
Missing sketch: Unable to open the group "/sketches/H13": (Symbol table) Object not found
Sketching 2 genomes using 2 thread(s)
Progress (CPU): 2 / 2
Writing sketches to file
Loading previously refined model
Completed model loading
WARNING: versions of input databases sketches are different, results may not be compatible
Calculating distances using 5 thread(s)
Progress (CPU): 0.0%
No non-zero Jaccard distances
Fitting k-mer gradient failed, for:H123vs.EuSCAPE_HR094
Segmentation fault (core dumped)


SeqBioUCC commented 2 months ago

I followed the installation instructions on the PopPUNK GitHub.


johnlees commented 2 months ago

i followed the installation instruction on poppunk github

Sorry, I need some more specifics here: was this via conda, mamba, or from source? We have a few different installation methods: https://poppunk.bacpop.org/installation.html

I would recommend conda.

In the test you are running you are getting:

No non-zero Jaccard distances
Fitting k-mer gradient failed, for:H123vs.EuSCAPE_HR094

Are your input files (e.g. H123) definitely of the correct species?
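Why a wrong-species input breaks the fit: PopPUNK estimates core and accessory distances by relating the proportion of shared k-mers to the k-mer length, which (roughly, following the model described for PopPUNK) is a straight-line fit of log match proportion against k. If two genomes share no k-mers at any k, every Jaccard index is zero, the logarithm is undefined and there is nothing to fit. The toy sketch below illustrates only that failure mode: it uses exact k-mer sets rather than sketchlib's MinHash sketches, the k values are arbitrary, and all names are hypothetical rather than sketchlib's code.

```python
import math


def kmers(seq: str, k: int) -> set[str]:
    # All substrings of length k (exact sets, not MinHash sketches)
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard(a: str, b: str, k: int) -> float:
    ka, kb = kmers(a, k), kmers(b, k)
    union = ka | kb
    return len(ka & kb) / len(union) if union else 0.0


def log_jaccard_slope(a: str, b: str, ks: list[int]) -> float:
    """Least-squares slope of log(Jaccard) against k, using only non-zero points."""
    points = [(k, jaccard(a, b, k)) for k in ks]
    points = [(k, math.log(j)) for k, j in points if j > 0]
    if len(points) < 2:
        # Mirrors the situation in the log: nothing shared, so no gradient to fit
        raise ValueError("too few non-zero Jaccard values to fit a k-mer gradient")
    mean_k = sum(k for k, _ in points) / len(points)
    mean_y = sum(y for _, y in points) / len(points)
    num = sum((k - mean_k) * (y - mean_y) for k, y in points)
    den = sum((k - mean_k) ** 2 for k, _ in points)
    return num / den
```

Identical sequences give a slope of 0 (every Jaccard index is 1), related sequences give a negative slope, and completely unrelated sequences raise the error, which is why checking that H123 really is *K. pneumoniae* is a sensible first step.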