caseywdunn / sharkmer

Getting two "product 0" records #12

Open caseywdunn opened 4 months ago

caseywdunn commented 4 months ago

Get the following:


>Physalia-physalis-Pacific cnidaria_co1 product 0 length 688 kmer count stats mean 53.14 median 54 min 37 max 65
TCATAAAGATATAGGAACATTATACCTAGTCTTTGGTTTATTTTCAGGTATGGTAGGAACTGCTCTTAGTATGTTAATCAGGTTAGAGTTATCAGGACCCGGTACTATGTTTGGAGATGATCACCTTTATAATGTTATAGTAACTGCTCATGCCTTTGTCATGATCTTTTTCCTTGTAATGCCAGT
CCTAATCGGAGGCTTCGGTAACTGGTTCGTACCCCTATTTATAGGTGCTCCGGATATGGCCTTCCCTAGGTTGAACAACCTAAGTTTTTGATTATTACCCCCTGCTTTATTACTACTATTAGGGTCATCCTTGATAGAACAAGGTGCAGGAACTGGTTGAACTGTTTACCCTCCTTTGTCTGGCCC
CCAAACTCATTCTGGGGGATCAGTTGATATGGCTATTTTCAGTTTACACTGTGCGGGTGCCTCCTCAATTATGGGTGCTATTAACTTCATAACCACTATATTTAATATGAGGGCCCCTGGTATGACTATGGATAAGTTACCATTATTTGTCTGATCAGTTTTGATAACTGCCTTCCTCTTATTACT
ATCATTACCCGTGTTGGCCGGAGCTATAACTATGTTACTTACTGATAGAAATTTTAATACTACTTTCTTCGACCCTGCGGGAGGTGGTGATCCAGTTCTCTATCAACATTTGTTCTGGTTTTTTGGTCAC
>Physalia-physalis-Pacific cnidaria_co1 product 0 length 1915 kmer count stats mean 17.18 median 18 min 3 max 70
TCAGAAGAATATTGGATATATTATCAATCAGAGTTGCTCTTTGGTTTATATAAAGAGGGCATGCAAATAAGAAGTGTGTTGTATCTTCAATACCATCGTTGCAAATGCAAACAGGAGAGGGGGTATCAATAAAACCATGACGATTTTTATGACATCTTAATGGACTTAAACCAACTCTCAAAAAAA
GAGATTACGAACTCCAATGGGATTAAAAATATTAAAAATACTTTTTCTCTCTGGACGAAAGAGTGATAACAAATGCTTTTTTAGGATTTTAATAGGTGGCATGTTTTTAAAATCAGAAATGGTAATATTCCATAATTTGACAGCATTTGGGAAAAAGCTATTCTTGTACCTATCAGAATTACATCG
CGTTTCTTGGAAGGTATTGTCATTGTTATATCTATATAGTGCATGGCGAGGAGGGGGAAGTTTATTATTGAGATAAGCTGGCGTCTTATTACTAACAATTTTATGAAATTGCATAATTCTTCTAGAGAGACGGCGATCAGTTAAGGACTCCCAACCTAGTTCTTCGTATAATTTGGAACGACTTGA
GCCACGCCATGTACCAGTAACAGCAAGCGCTGCTAAATATTGGACCTGTTCAAGCTTTTCCATTAAGTAATTAAGTACTTCACCAAGTTGGTCTTGTTTTGACGGGATATGATAAATTACATCGCAGTAATCAAAATGAGGTCGAACAACTGATTTATAAATTAGATTAAGAGTCTTAAGGGGTAA
AAATTTTGATAGATGTTTAATTATACCAAGATGTTGATTAGCCTTTTTGATTTTTTCACTAAGATGTTTATTAAATGATAAGTTTGATTCAAGAATTAGACCTAAATGTTTATGTCCCTTGACTTTGGTGACAGCAGTTCCATTGAAAAATATTTTTGGATGAATAACAGTGGTCTTTTTACAAGA
AAATATAATTTCAGTGGCTTGTTTAAGAGGATCAGGATTGAATTCAAGTTTCCATTGGTTAGCCCATTTATTAATGGTATCTAAATCATGATTTAGATACCATTAATAAATGGGCTAACCAATGGAAACTCGAATTCAACCCTGATCCTCACAAACAAGCCACTGAAGTCTTATTTTCATGTAAAC
AAAAAAGTCCCAATCATCCTCAACTCTTTTTCAATGGAACTGTTGTGGCTAAAGTGGATGAGCATAAACATTTAGGTCTTGTTCTTGAATCAGATTTGTCTTTTAAGAAACATATTCATGAAAAAATTAAGAAGGCTAAAAAGAACATTGGTATAATAAAGTACCTTTCTAAATTTTTACCCCTTA
AGACTCTTGATCTAATTTATAAATCAGTTGTTCGACCTCATTTTGATTACTGCGATGTAATTTATCATATCCCGTCAAAACAAGACCAACTTGGTGAAGTACTTAATTACTTAATGGAAAAGCTTGAACAGGTCCAATATTTAGCAGCGCTTGCTGTTACTGGTGCATGGCAAGGTTCTAGCTGTT
CAAAATTATACGAAGAATTGGGCTGGGAATCCCTTTCAGATCGTCGTTGGTGTAGGCGAATTCTTCAAATTCATAAGATTACGAATAAAAATACACCTAATTATCTCTACAACAAGCTTCCATGTCGCCGTAGGCCTTTATACAGACTAACCAACTATAATATATTTCACGAAATACGATGCCAGA
CTGATCGGTACAAAAATAGTTTTTTCCCAGATGCAATTAAGGGTTGGAATATTGTTATTCAAATTTTTCCTAATATCCCATCAATAAATGTTCTTAAAAATCATATTTTATCTCTCACTCGTCCAGAGAAAAAAACCTTTTTCAACATACACGACCCTGTGGGACTTCGCTACCTTTTCTTTTTGA
GATTGGGCTTTAGTCCTCTAAGGAGTCACAAAAACAGACATGGTTTTTTAGATAC
>Physalia-physalis-Pacific cnidaria_co1 product 1 length 1915 kmer count stats mean 17.18 median 18 min 3 max 70
TCAGAAGAATATTGGATATATTATCAATCAGAGTTGCTCTTTGGTTTATATAAAGAGGGCATGCAAATAAGAAGTGTGTTGTATCTTCAATACCATCGTTGCAAATGCAAACAGGAGAGGGGGTATCAATAAAACCATGACGATTTTTATGACATCTTAATGGACTTAAACCAACTCTCAAAAAAA
GAGATTACGAACTCCAATGGGATTAAAAATATTAAAAATACTTTTTCTCTCTGGACGAAAGAGTGATAACAAATGCTTTTTTAGGATTTTAATAGGTGGCATGTTTTTAAAATCAGAAATGGTAATATTCCATAATTTGACAGCATTTGGGAAAAAGCTATTCTTGTACCTATCAGAATTACATCG
CGTTTCTTGGAAGGTATTGTCATTGTTATATCTATATAGTGCATGGCGAGGAGGGGGAAGTTTATTATTGAGATAAGCTGGCGTCTTATTACTAACAATTTTATGAAATTGCATAATTCTTCTAGAGAGACGGCGATCAGTTAAGGACTCCCAACCTAGTTCTTCGTATAATTTGGAACGACTTGA
GCCACGCCATGTACCAGTAACAGCAAGCGCTGCTAAATATTGGACCTGTTCAAGCTTTTCCATTAAGTAATTAAGTACTTCACCAAGTTGGTCTTGTTTTGACGGGATATGATAAATTACATCGCAGTAATCAAAATGAGGTCGAACAACTGATTTATAAATTAGATTAAGAGTCTTAAGGGGTAA
AAATTTTGATAGATGTTTAATTATACCAAGATGTTGATTAGCCTTTTTGATTTTTTCACTAAGATGTTTATTAAATGATAAGTTTGATTCAAGAATTAGACCTAAATGTTTATGTCCCTTGACTTTGGTGACAGCAGTTCCATTGAAAAATATTTTTGGATGAATAACAGTGGTCTTTTTACAAGA
AAATATAATTTCAGTGGCTTGTTTAAGAGGATCAGGATTGAATTCAAGTTTCCATTGGTTAGCCCATTTATTAATGGTATCTAAATCATGATTTAGATACCATTAATAAATGGGCTAACCAATGGAAACTCGAATTCAACCCTGATCCTCACAAACAAGCCACTGAAGTCTTATTTTCATGTAAAC
AAAAAAGTCCCAATCATCCTCAACTCTTTTTCAATGGAACTGTTGTGGCTAAAGTGGATGAGCATAAACATTTAGGTCTTGTTCTTGAATCAGATTTGTCTTTTAAGAAACATATTCATGAAAAAATTAAGAAGGCTAAAAAGAACATTGGTATAATAAAGTACCTTTCTAAATTTTTACCCCTTA
AGACTCTTGATCTAATTTATAAATCAGTTGTTCGACCTCATTTTGATTACTGCGATGTAATTTATCATATCCCGTCAAAACAAGACCAACTTGGTGAAGTACTTAATTACTTAATGGAAAAGCTTGAACAGGTCCAATATTTAGCAGCGCTTGCTGTTACTGGTGCATGGCAAGGTTCTAGCTGTT
CAAAATTATACGAAGAATTGGGCTGGGAATCCCTTTCAGATCGTCGTTGGTGTAGGCGAATTCTTCAAATTCATAAGATTACGAATAAAAATACACCTAATTATCTCTACAACAAGCTTCCATGTCGCCGTAGGCCTTTATACAGACTAACCAACTATAATATATTTCACGAAATACGATGCCAGA
CTGATCGGTACAAAAATAGTTTTTTCCCAGATGCAATTAAGGGTTGGAATATTGTTATTCAAATTTTTCCTAATATCCCATCAATAAATGTTCTTAAAAATCATATTTTATCTCTCACTCGTCCAGAGAAAAAAACCTTTTTCAACATACACGACCCTGTGGGACTCCGCTACCTTTTCTTTTTGA
GATTGGGCTTTAGTCCTCTAAGGAGTCACAAAAACAGACATGGTTTTTTAGATAC

with: zcat /gpfs/ycga/work/dunn/sequences/illumina/Novaseq_siphgenomes/Sample_NA22_176_017/*.fastq.gz | sharkmer -k 21 -t 1 --max-reads 2000000 -o output -s Physalia-physalis-Pacific --verbosity 0 --pcr cnidaria > logs/sharkmer.Physalia-physalis-Pacific.log 2>&1

caseywdunn commented 4 months ago

Same for this one -

>Nanomia-bijuga-Atlantic cnidaria_co1 product 0 length 688 kmer count stats mean 52.71 median 53 min 40 max 70
CCATAAAGATATAGGAACCTTATATTTAATCTTTGGATTATTTTCTGCAATGGTGGGAACTGCTTTAAGTATGATAATTAGACTTGAGCTTGCAGGACCCGGAACAATGTTAGGAGATGATCACATTTACAATGTGGTGGTAACAGCTCATGCCTTTGTAATGATTTTTTTCTTGGTTATGCCCGTATTAATAGGAGGATTTGGTAATTGATTTGTACCTTTATTTATAGGTGCTCCTGATATGGCATTTCCAAGGTTAAATAATTTAAGCTTTTGGTTATTACCCCCAGCATTGATACTTTTACTTGGGTCTTCGTTAGTAGAGCAAGGAGCTGGTACAGGATGAACAGTTTATCCCCCCTTATCAGGCCCTCAAACCCATTCTGGTGGATCAGTAGATCTGGCCATTTTTAGTTTACACACAGCAGGAGCTTCCTCTATTATGGGTGCTATAAACTTTATAACAACTATATTTAATATGAGAGCTCCTGGTATGTCTTTTGATAAATTACCATTGTTTGTATGGTCGGTATTAATTACCGCTTTCTTATTATTACTTTCCCTACCTGTATTGGCAGGAGCCATAACCATGCTTTTAACCGATAGGAATTTTAATACTAGCTTTTTTGACCCTGCAGGTGGTGGTGATCCAGTTTTATATCAACACTTATTTTGGTTCTTTGGACAC
>Nanomia-bijuga-Atlantic cnidaria_co1 product 0 length 86 kmer count stats mean 46.21 median 49 min 32 max 57
TCCTAAAAATATAGGTTACATAAATAGGATAGAAAATCATTAATTTTTTACTTTTTGGTTTTCTGTAATAAAGGTTTTTTGGTCAG
>Nanomia-bijuga-Atlantic cnidaria_co1 product 0 length 341 kmer count stats mean 101.96 median 109 min 23 max 204
TCTTAAGGATGTAGGATAGTTAAGATATTTTTATGAGTAAAAAGGCTAAGTAATGCATTTTCCATACATTACATACGAAACGCATGATGATTGACTCACTTAATTACTTTTAAACAAAAAGTATATTGAATATCAAACAATTCGTCAAATACTTGTGAGATCTGTACTCTTTATTAGTTTAACGGGGGACATGACAAAAGCAATAACTGTTATCATGATAGATTGAACCATAGGATTTCGCCATTTTTTTCTCCAATTGTTTGTTTATCATATTAGTGTGTACAAAGAAAAATCGAAGTTGATAGCGCCTTATACTAGTTCCAATTTGATTTTTTTGTCAT
>Nanomia-bijuga-Atlantic cnidaria_co1 product 1 length 341 kmer count stats mean 101.52 median 107 min 23 max 204
TCTTAAGGATGTAGGATAGTTAAGATATTTTTATGAGTAAAAAGGCTAAGTAATGCATTTTCCATACATTACATACGAAACGCATGATGATTGACTCACTTAATTACTTTTAAACAAAAAGTATATTGAATATCAAACAATTCGTCAAATACTTGTGAGATCTGTACTCTTTATTAGTTTAACGGGGGACATGACAAAAGCAATAACTGTTATCATGATAGATTGAACCATAGGATTTCGCCATTTTTTTCTCCAATTGTTTGTTTATCATATTAGTGTGTACAAAGAAAAGTCGAAGTTGGTAGCGCCTTATACTAGTTCCAATTTGATTTTTTTGTCAT
>Nanomia-bijuga-Atlantic cnidaria_co1 product 2 length 341 kmer count stats mean 103.79 median 109 min 23 max 204
TCTTAAGGATGTAGGATAGTTAAGATATTTTTATGAGTAAAAAGGCTAAGTAATGCATTTTCCATACATTACATACGAAACGCATGATGATTGACTCACTTAATTACTTTTAAACAAAAAGTATATTGAATATCAAACAATTCGTCAAATACTTGTGAGATCTGTACTCTTTATTAGTTTAACGGGGGACATGACAAAAGCAATAACTGTTATCATGATAGATTGAACCATAGGATTTCGCCATTTTTTTCTCCAATTGTTTGTTTATCATATTAGTGTGTACAAAGAAAAGTCGAAGTTGATAGCGCCTTATACTAGTTCCAATTTGATTTTTTTGTCAT
>Nanomia-bijuga-Atlantic cnidaria_co1 product 3 length 341 kmer count stats mean 107.41 median 112 min 23 max 204
TCTTAAGGATGTAGGATAGTTAAGATATTTTTATGAGTAAAAAGGCTAAGTAATGCATTTTCCATACATTACATACGAAACGCATGATAATTGACTTACTTAATTACTTTTAAACAAAAAGTATATTGAATATCAAACAATTCGTCAAATACTTGTGAGATCTGTACTCTTTATTAGTTTAACGGGGGACATGACAAAAGCAATAACTGTTATCATGATAGATTGAACCATAGGATTTCGCCATTTTTTTCTCCAATTGTTTGTTTATCATATTAGTGTGTACAAAGAAAAATCGAAGTTGATAGCGCCTTATACTAGTTCCAATTTGATTTTTTTGTCAT
>Nanomia-bijuga-Atlantic cnidaria_co1 product 4 length 341 kmer count stats mean 106.98 median 110 min 23 max 204
TCTTAAGGATGTAGGATAGTTAAGATATTTTTATGAGTAAAAAGGCTAAGTAATGCATTTTCCATACATTACATACGAAACGCATGATAATTGACTTACTTAATTACTTTTAAACAAAAAGTATATTGAATATCAAACAATTCGTCAAATACTTGTGAGATCTGTACTCTTTATTAGTTTAACGGGGGACATGACAAAAGCAATAACTGTTATCATGATAGATTGAACCATAGGATTTCGCCATTTTTTTCTCCAATTGTTTGTTTATCATATTAGTGTGTACAAAGAAAAGTCGAAGTTGGTAGCGCCTTATACTAGTTCCAATTTGATTTTTTTGTCAT
>Nanomia-bijuga-Atlantic cnidaria_co1 product 5 length 341 kmer count stats mean 109.25 median 112 min 23 max 204
TCTTAAGGATGTAGGATAGTTAAGATATTTTTATGAGTAAAAAGGCTAAGTAATGCATTTTCCATACATTACATACGAAACGCATGATAATTGACTTACTTAATTACTTTTAAACAAAAAGTATATTGAATATCAAACAATTCGTCAAATACTTGTGAGATCTGTACTCTTTATTAGTTTAACGGGGGACATGACAAAAGCAATAACTGTTATCATGATAGATTGAACCATAGGATTTCGCCATTTTTTTCTCCAATTGTTTGTTTATCATATTAGTGTGTACAAAGAAAAGTCGAAGTTGATAGCGCCTTATACTAGTTCCAATTTGATTTTTTTGTCAT
>Nanomia-bijuga-Atlantic cnidaria_co1 product 6 length 341 kmer count stats mean 101.00 median 109 min 23 max 204
TCTTAAGGATGTAGGATAGTTAAGATATTTTTATGAGTAAAAAGGCTAAGCAATGCATTTTCCATACATTACATACGAAACGCATGATGATTGACTCACTTAATTACTTTTAAACAAAAAGTATATTGAATATCAAACAATTCGTCAAATACTTGTGAGATCTGTACTCTTTATTAGTTTAACGGGGGACATGACAAAAGCAATAACTGTTATCATGATAGATTGAACCATAGGATTTCGCCATTTTTTTCTCCAATTGTTTGTTTATCATATTAGTGTGTACAAAGAAAAATCGAAGTTGATAGCGCCTTATACTAGTTCCAATTTGATTTTTTTGTCAT
>Nanomia-bijuga-Atlantic cnidaria_co1 product 7 length 341 kmer count stats mean 100.57 median 107 min 23 max 204
TCTTAAGGATGTAGGATAGTTAAGATATTTTTATGAGTAAAAAGGCTAAGCAATGCATTTTCCATACATTACATACGAAACGCATGATGATTGACTCACTTAATTACTTTTAAACAAAAAGTATATTGAATATCAAACAATTCGTCAAATACTTGTGAGATCTGTACTCTTTATTAGTTTAACGGGGGACATGACAAAAGCAATAACTGTTATCATGATAGATTGAACCATAGGATTTCGCCATTTTTTTCTCCAATTGTTTGTTTATCATATTAGTGTGTACAAAGAAAAGTCGAAGTTGGTAGCGCCTTATACTAGTTCCAATTTGATTTTTTTGTCAT
>Nanomia-bijuga-Atlantic cnidaria_co1 product 8 length 341 kmer count stats mean 102.84 median 109 min 23 max 204
TCTTAAGGATGTAGGATAGTTAAGATATTTTTATGAGTAAAAAGGCTAAGCAATGCATTTTCCATACATTACATACGAAACGCATGATGATTGACTCACTTAATTACTTTTAAACAAAAAGTATATTGAATATCAAACAATTCGTCAAATACTTGTGAGATCTGTACTCTTTATTAGTTTAACGGGGGACATGACAAAAGCAATAACTGTTATCATGATAGATTGAACCATAGGATTTCGCCATTTTTTTCTCCAATTGTTTGTTTATCATATTAGTGTGTACAAAGAAAAGTCGAAGTTGATAGCGCCTTATACTAGTTCCAATTTGATTTTTTTGTCAT
>Nanomia-bijuga-Atlantic cnidaria_co1 product 9 length 341 kmer count stats mean 106.46 median 112 min 23 max 204
TCTTAAGGATGTAGGATAGTTAAGATATTTTTATGAGTAAAAAGGCTAAGCAATGCATTTTCCATACATTACATACGAAACGCATGATAATTGACTTACTTAATTACTTTTAAACAAAAAGTATATTGAATATCAAACAATTCGTCAAATACTTGTGAGATCTGTACTCTTTATTAGTTTAACGGGGGACATGACAAAAGCAATAACTGTTATCATGATAGATTGAACCATAGGATTTCGCCATTTTTTTCTCCAATTGTTTGTTTATCATATTAGTGTGTACAAAGAAAAATCGAAGTTGATAGCGCCTTATACTAGTTCCAATTTGATTTTTTTGTCAT
>Nanomia-bijuga-Atlantic cnidaria_co1 product 10 length 341 kmer count stats mean 106.02 median 110 min 23 max 204
TCTTAAGGATGTAGGATAGTTAAGATATTTTTATGAGTAAAAAGGCTAAGCAATGCATTTTCCATACATTACATACGAAACGCATGATAATTGACTTACTTAATTACTTTTAAACAAAAAGTATATTGAATATCAAACAATTCGTCAAATACTTGTGAGATCTGTACTCTTTATTAGTTTAACGGGGGACATGACAAAAGCAATAACTGTTATCATGATAGATTGAACCATAGGATTTCGCCATTTTTTTCTCCAATTGTTTGTTTATCATATTAGTGTGTACAAAGAAAAGTCGAAGTTGGTAGCGCCTTATACTAGTTCCAATTTGATTTTTTTGTCAT
>Nanomia-bijuga-Atlantic cnidaria_co1 product 11 length 341 kmer count stats mean 108.30 median 112 min 23 max 204
TCTTAAGGATGTAGGATAGTTAAGATATTTTTATGAGTAAAAAGGCTAAGCAATGCATTTTCCATACATTACATACGAAACGCATGATAATTGACTTACTTAATTACTTTTAAACAAAAAGTATATTGAATATCAAACAATTCGTCAAATACTTGTGAGATCTGTACTCTTTATTAGTTTAACGGGGGACATGACAAAAGCAATAACTGTTATCATGATAGATTGAACCATAGGATTTCGCCATTTTTTTCTCCAATTGTTTGTTTATCATATTAGTGTGTACAAAGAAAAGTCGAAGTTGATAGCGCCTTATACTAGTTCCAATTTGATTTTTTTGTCAT

called with zcat /gpfs/ycga/work/dunn/sequences/illumina/Novaseq_siphgenomes/Sample_CWD16_296_281/*.fastq.gz | sharkmer -k 21 -t 1 --max-reads 2000000 -o output -s Nanomia-bijuga-Atlantic --verbosity 0 --pcr cnidaria > logs/sharkmer.Nanomia-bijuga-Atlantic.log 2>&1

Though in this case the first sequence is correct; maybe the others are cruft.

caseywdunn commented 4 months ago

In addition, should cap the number of sequences returned, since when there are a large number they are likely not actual sequences but unphased combinations of variation. Should also revisit the sorting criteria.

caseywdunn commented 4 months ago

Each time through this loop - https://github.com/caseywdunn/sharkmer/blob/f486fd3248187ee4a2ac244c811234ed0ea72018/sharkmer/src/pcr/mod.rs#L1366

i is reinitialized https://github.com/caseywdunn/sharkmer/blob/f486fd3248187ee4a2ac244c811234ed0ea72018/sharkmer/src/pcr/mod.rs#L1537

Then it is ordered by min_count https://github.com/caseywdunn/sharkmer/blob/f486fd3248187ee4a2ac244c811234ed0ea72018/sharkmer/src/pcr/mod.rs#L1621

So we need an amplicon_index that is not reset for different primers.
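A minimal sketch of that idea, not the actual code in mod.rs: the counter is declared once, outside the loop over primers, so product numbers keep increasing instead of restarting at 0. The function name `label_products` and the nested-vector input are hypothetical, and the header only mimics the start of the FASTA headers above.

```rust
/// Builds FASTA-style headers for all products of one gene.
/// `per_primer_pair` holds the sequences recovered for each primer pair
/// (a hypothetical layout, assumed only for this sketch).
fn label_products(sample: &str, gene: &str, per_primer_pair: &[Vec<String>]) -> Vec<String> {
    // Declared once, outside the primer loop, so it is never reset.
    let mut amplicon_index: usize = 0;
    let mut headers = Vec::new();
    for products in per_primer_pair {
        for seq in products {
            headers.push(format!(
                ">{} {} product {} length {}",
                sample,
                gene,
                amplicon_index,
                seq.len()
            ));
            amplicon_index += 1;
        }
    }
    headers
}
```

With two primer pairs yielding one and two products, the headers would be numbered 0, 1, 2 rather than 0, 0, 1.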

caseywdunn commented 4 months ago

In the Nanomia sequences above, only the top one has a blast hit. It has the highest min coverage, but a lower median, mean, and max. So ranking by min coverage seems like a reasonable strategy here.
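A rough sketch of ranking by min coverage, combined with the cap on returned sequences suggested earlier in this thread. The `Candidate` struct, its field names, and `rank_and_cap` are all hypothetical and not taken from sharkmer.

```rust
/// Hypothetical summary of one candidate product; field names are illustrative.
#[derive(Debug, Clone, PartialEq)]
struct Candidate {
    name: String,
    min_count: u32,
    median_count: u32,
}

/// Sorts candidates by minimum k-mer count, highest first,
/// then keeps at most `max_products` of them.
fn rank_and_cap(mut candidates: Vec<Candidate>, max_products: usize) -> Vec<Candidate> {
    // sort_by is stable, so candidates with equal min_count keep their input order.
    candidates.sort_by(|a, b| b.min_count.cmp(&a.min_count));
    // truncate is a no-op when there are already fewer than max_products entries.
    candidates.truncate(max_products);
    candidates
}
```

Using the Nanomia numbers above (min 40 for the 688 bp product, 32 for the 86 bp one, 23 for the 341 bp variants), the 688 bp product sorts first even though the 341 bp variants have a higher median.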

Also add a min_length option to struct PCRParams that defaults to 0.
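One way the min_length option could look, as a sketch only. `PcrParamsSketch` is a pared-down, hypothetical stand-in for PCRParams (the real struct has other fields that are not shown in this thread), and `filter_by_min_length` is an illustrative helper, not an existing function.

```rust
/// Hypothetical stand-in for PCRParams with only the new option.
/// Deriving Default gives min_length = 0, i.e. no length filtering.
#[derive(Debug, Clone, Default, PartialEq)]
struct PcrParamsSketch {
    min_length: usize,
}

/// Drops products shorter than params.min_length (length in bases).
fn filter_by_min_length(products: Vec<String>, params: &PcrParamsSketch) -> Vec<String> {
    products
        .into_iter()
        .filter(|p| p.len() >= params.min_length)
        .collect()
}
```

With the default of 0 every product is kept; a value such as 100 would drop the 86 bp Nanomia product above while keeping the 341 bp and 688 bp ones.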