bacpop / PopPUNK

PopPUNK 👨‍🎤 (POPulation Partitioning Using Nucleotide Kmers)
https://www.bacpop.org/poppunk
Apache License 2.0

Distances failed quality control (change QC options to run anyway) #105

Closed furqan915 closed 3 years ago

furqan915 commented 3 years ago

Hi, I am using PopPUNK to study 27 genomes of a bacterium downloaded from NCBI. But whenever I run this command,

root@hon-pc:/home/fuan/monas/fna_combine# poppunk --easy-run --r-files reference_list.txt --output lm_example --threads 4 --plot-fit 5 --min-k 17 --k-step 2 --max-a-dist 0.8 --full-db

I have already tried the default --min-k 13 and --k-step 4.

I always get this error.

`
PopPUNK (POPulation Partitioning Using Nucleotide Kmers)
    (with backend: sketchlib v1.5.1
     sketchlib: /root/miniconda3/lib/python3.8/site-packages/pp_sketchlib.cpython-38-x86_64-linux-gnu.so)
Mode: Creating clusters from assemblies (create_db & fit_model)
Worst random match probability at 17-mers: 0.00
Looking for existing sketches in lm_example/lm_example.h5
Calculating distances using 0 thread(s)
WARNING: Accessory outlier at a=0.99409056 1:WCX23-2 2:23-C-23
WARNING: Accessory outlier at a=0.9979968 1:ML09-119 2:AL09-71
WARNING: Accessory outlier at a=0.9989984 1:pc104A 2:AL09-71
WARNING: Accessory outlier at a=0.8863181 1:GYK1 2:D4
WARNING: Accessory outlier at a=0.9170673 1:J-1 2:D4
WARNING: Accessory outlier at a=0.9593349 1:JBN2301 2:D4
WARNING: Accessory outlier at a=0.96744794 1:LHW39 2:D4
WARNING: Accessory outlier at a=0.9156651 1:NJ-35 2:D4
WARNING: Accessory outlier at a=0.94290864 1:ZYAH72 2:D4
WARNING: Accessory outlier at a=1.0 1:WP8-S18-ESBL-02 2:GSH8-2
WARNING: Accessory outlier at a=0.95022035 1:J-1 2:GYK1
WARNING: Accessory outlier at a=0.9073518 1:JBN2301 2:GYK1
WARNING: Accessory outlier at a=0.9150641 1:LHW39 2:GYK1
WARNING: Accessory outlier at a=0.91546476 1:NJ-35 2:GYK1
WARNING: Accessory outlier at a=0.9167668 1:ZYAH72 2:GYK1
WARNING: Accessory outlier at a=0.9387019 1:JBN2301 2:J-1
WARNING: Accessory outlier at a=0.9464143 1:LHW39 2:J-1
WARNING: Accessory outlier at a=0.9474159 1:NJ-35 2:J-1
WARNING: Accessory outlier at a=0.94471157 1:ZYAH72 2:J-1
WARNING: Accessory outlier at a=0.98577726 1:LHW39 2:JBN2301
WARNING: Accessory outlier at a=0.9077524 1:NJ-35 2:JBN2301
WARNING: Accessory outlier at a=0.97065306 1:ZYAH72 2:JBN2301
WARNING: Accessory outlier at a=0.91466343 1:NJ-35 2:LHW39
WARNING: Accessory outlier at a=0.9740585 1:ZYAH72 2:LHW39
WARNING: Accessory outlier at a=0.9979968 1:pc104A 2:ML09-119
WARNING: Accessory outlier at a=0.91276044 1:ZYAH72 2:NJ-35
WARNING: Accessory outlier at a=0.99409056 1:WCX23-2 2:23-C-23
WARNING: Accessory outlier at a=0.9979968 1:ML09-119 2:AL09-71
WARNING: Accessory outlier at a=0.9989984 1:pc104A 2:AL09-71
WARNING: Accessory outlier at a=0.8863181 1:GYK1 2:D4
WARNING: Accessory outlier at a=0.9170673 1:J-1 2:D4
WARNING: Accessory outlier at a=0.9593349 1:JBN2301 2:D4
WARNING: Accessory outlier at a=0.96744794 1:LHW39 2:D4
WARNING: Accessory outlier at a=0.9156651 1:NJ-35 2:D4
WARNING: Accessory outlier at a=0.94290864 1:ZYAH72 2:D4
WARNING: Accessory outlier at a=1.0 1:WP8-S18-ESBL-02 2:GSH8-2
WARNING: Accessory outlier at a=0.95022035 1:J-1 2:GYK1
WARNING: Accessory outlier at a=0.9073518 1:JBN2301 2:GYK1
WARNING: Accessory outlier at a=0.9150641 1:LHW39 2:GYK1
WARNING: Accessory outlier at a=0.91546476 1:NJ-35 2:GYK1
WARNING: Accessory outlier at a=0.9167668 1:ZYAH72 2:GYK1
WARNING: Accessory outlier at a=0.9387019 1:JBN2301 2:J-1
WARNING: Accessory outlier at a=0.9464143 1:LHW39 2:J-1
WARNING: Accessory outlier at a=0.9474159 1:NJ-35 2:J-1
WARNING: Accessory outlier at a=0.94471157 1:ZYAH72 2:J-1
WARNING: Accessory outlier at a=0.98577726 1:LHW39 2:JBN2301
WARNING: Accessory outlier at a=0.9077524 1:NJ-35 2:JBN2301
WARNING: Accessory outlier at a=0.97065306 1:ZYAH72 2:JBN2301
WARNING: Accessory outlier at a=0.91466343 1:NJ-35 2:LHW39
WARNING: Accessory outlier at a=0.9740585 1:ZYAH72 2:LHW39
WARNING: Accessory outlier at a=0.9979968 1:pc104A 2:ML09-119
WARNING: Accessory outlier at a=0.91276044 1:ZYAH72 2:NJ-35
Distances failed quality control (change QC options to run anyway)
`

Please help me with this issue.

SilasK commented 3 years ago

As I understand the output, you have outliers with a >= 0.9. I set --max-a-dist 1 and it worked.

johnlees commented 3 years ago

Are these bacteria from different species? That may be why you are getting such high accessory distances, which can make later parts of the clustering process fail.

If you want to proceed anyway, you can do as @SilasK says and change --max-a-dist to 1, or use --qc-filter continue, which ignores all errors.
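Before relaxing the threshold, it can help to see which samples drive the outlier warnings. The following is a minimal sketch, not part of PopPUNK: it assumes the log has been saved as text and that the warning lines look exactly like those posted above. The function name `summarise_outliers` and the sample input are hypothetical.

```python
import re
from collections import Counter

# Matches warning lines in the format seen in the log above, e.g.
# WARNING: Accessory outlier at a=0.8863181 1:GYK1 2:D4
OUTLIER_RE = re.compile(r"WARNING: Accessory outlier at a=(\S+) 1:(\S+) 2:(\S+)")


def summarise_outliers(log_text):
    """Return (unique_pairs, per_sample_counts) from outlier warning lines.

    unique_pairs maps (sample1, sample2) to the reported accessory distance.
    per_sample_counts counts how many unique outlier pairs each sample is in.
    """
    pairs = {}
    for line in log_text.splitlines():
        match = OUTLIER_RE.search(line)
        if match:
            a = float(match.group(1))
            # Repeated warnings for the same pair collapse to one entry.
            pairs[(match.group(2), match.group(3))] = a

    counts = Counter()
    for sample1, sample2 in pairs:
        counts[sample1] += 1
        counts[sample2] += 1
    return pairs, counts


# A few lines copied from the log above; the first pair is repeated on purpose.
example = """\
WARNING: Accessory outlier at a=0.8863181 1:GYK1 2:D4
WARNING: Accessory outlier at a=0.9170673 1:J-1 2:D4
WARNING: Accessory outlier at a=0.95022035 1:J-1 2:GYK1
WARNING: Accessory outlier at a=0.8863181 1:GYK1 2:D4
"""

pairs, counts = summarise_outliers(example)
print(len(pairs), "unique outlier pairs; smallest a =", min(pairs.values()))
for sample, n in counts.most_common():
    print(sample, n)
```

If a handful of samples appear in most of the outlier pairs, those are the ones worth checking for contamination or species misassignment before raising --max-a-dist.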

SilasK commented 3 years ago

Just to check my understanding: does a=0.9 mean that 90% of the genome is classified as accessory and the core represents only 10%?

SilasK commented 3 years ago

I tried with only --qc-filter continue and it raises the same error. This is now with v2.2.

johnlees commented 3 years ago

My mistake: --qc-filter only looks at the individual genomes and their sketches, not the distances.

a = 0.9 means that 90% of the accessory sequence (changes larger than the smallest k-mer size) is different, but it doesn't tell you about the proportion of core to accessory. Decreasing the lowest k-mer size can therefore give extra resolution on the accessory distance (we still need to update the docs on k-mer length choice, sorry this isn't all out there).
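To illustrate why the smallest k-mer size matters for the accessory distance, here is a simplified sketch. It assumes the model described in the PopPUNK paper, as I understand it, in which the probability of a k-mer match between two genomes is (1 - a)(1 - c)^k, where c is the core distance and a is the accessory distance. Taking logs gives a straight line in k, so a comes from the intercept extrapolated back to k = 0 and c from the slope. This is an unconstrained least-squares fit on synthetic data, not the fitting code PopPUNK or sketchlib actually uses; the function name and all values are made up for illustration.

```python
import math


def fit_core_accessory(kmer_lengths, match_probs):
    """Least-squares fit of log(p) = log(1 - a) + k * log(1 - c).

    Returns (core, accessory). Plain unconstrained regression, for
    illustration only.
    """
    xs = list(kmer_lengths)
    ys = [math.log(p) for p in match_probs]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    core = 1.0 - math.exp(slope)           # slope = log(1 - c)
    accessory = 1.0 - math.exp(intercept)  # intercept = log(1 - a)
    return core, accessory


# Synthetic example: k-mer lengths 17 to 29 in steps of 2 (illustrative).
true_core, true_acc = 0.01, 0.9
ks = [17, 19, 21, 23, 25, 27, 29]
ps = [(1 - true_acc) * (1 - true_core) ** k for k in ks]

core, acc = fit_core_accessory(ks, ps)
print("core distance:", round(core, 6))
print("accessory distance:", round(acc, 6))
```

Because a is read off at k = 0, the further the smallest k-mer length is from zero, the longer the extrapolation and the less well determined the accessory estimate becomes on real, noisy data.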

furqan915 commented 3 years ago

Hi, thanks for such a quick response. No, these genomes are strains of the same species. I downloaded these complete genomes from NCBI to compare the various parameters.

I have tried --max-a-dist 1, but it did not resolve the issue. Is there any other solution you can suggest?

johnlees commented 3 years ago

What output did you get when using --max-a-dist 1? Can you post the full command you used too?

johnlees commented 3 years ago

Closing, as --max-a-dist 1 should fix this, but please reopen (with the error message) if problems still arise.