caseywdunn opened 5 months ago
Same for this one -
>Nanomia-bijuga-Atlantic cnidaria_co1 product 0 length 688 kmer count stats mean 52.71 median 53 min 40 max 70
CCATAAAGATATAGGAACCTTATATTTAATCTTTGGATTATTTTCTGCAATGGTGGGAACTGCTTTAAGTATGATAATTAGACTTGAGCTTGCAGGACCCGGAACAATGTTAGGAGATGATCACATTTACAATGTGGTGGTAACAGCTCATGCCTTTGTAATGATTTTTTTCTTGGTTATGCCCGTATTAATAGGAGGATTTGGTAATTGATTTGTACCTTTATTTATAGGTGCTCCTGATATGGCATTTCCAAGGTTAAATAATTTAAGCTTTTGGTTATTACCCCCAGCATTGATACTTTTACTTGGGTCTTCGTTAGTAGAGCAAGGAGCTGGTACAGGATGAACAGTTTATCCCCCCTTATCAGGCCCTCAAACCCATTCTGGTGGATCAGTAGATCTGGCCATTTTTAGTTTACACACAGCAGGAGCTTCCTCTATTATGGGTGCTATAAACTTTATAACAACTATATTTAATATGAGAGCTCCTGGTATGTCTTTTGATAAATTACCATTGTTTGTATGGTCGGTATTAATTACCGCTTTCTTATTATTACTTTCCCTACCTGTATTGGCAGGAGCCATAACCATGCTTTTAACCGATAGGAATTTTAATACTAGCTTTTTTGACCCTGCAGGTGGTGGTGATCCAGTTTTATATCAACACTTATTTTGGTTCTTTGGACAC
>Nanomia-bijuga-Atlantic cnidaria_co1 product 0 length 86 kmer count stats mean 46.21 median 49 min 32 max 57
TCCTAAAAATATAGGTTACATAAATAGGATAGAAAATCATTAATTTTTTACTTTTTGGTTTTCTGTAATAAAGGTTTTTTGGTCAG
>Nanomia-bijuga-Atlantic cnidaria_co1 product 0 length 341 kmer count stats mean 101.96 median 109 min 23 max 204
TCTTAAGGATGTAGGATAGTTAAGATATTTTTATGAGTAAAAAGGCTAAGTAATGCATTTTCCATACATTACATACGAAACGCATGATGATTGACTCACTTAATTACTTTTAAACAAAAAGTATATTGAATATCAAACAATTCGTCAAATACTTGTGAGATCTGTACTCTTTATTAGTTTAACGGGGGACATGACAAAAGCAATAACTGTTATCATGATAGATTGAACCATAGGATTTCGCCATTTTTTTCTCCAATTGTTTGTTTATCATATTAGTGTGTACAAAGAAAAATCGAAGTTGATAGCGCCTTATACTAGTTCCAATTTGATTTTTTTGTCAT
>Nanomia-bijuga-Atlantic cnidaria_co1 product 1 length 341 kmer count stats mean 101.52 median 107 min 23 max 204
TCTTAAGGATGTAGGATAGTTAAGATATTTTTATGAGTAAAAAGGCTAAGTAATGCATTTTCCATACATTACATACGAAACGCATGATGATTGACTCACTTAATTACTTTTAAACAAAAAGTATATTGAATATCAAACAATTCGTCAAATACTTGTGAGATCTGTACTCTTTATTAGTTTAACGGGGGACATGACAAAAGCAATAACTGTTATCATGATAGATTGAACCATAGGATTTCGCCATTTTTTTCTCCAATTGTTTGTTTATCATATTAGTGTGTACAAAGAAAAGTCGAAGTTGGTAGCGCCTTATACTAGTTCCAATTTGATTTTTTTGTCAT
>Nanomia-bijuga-Atlantic cnidaria_co1 product 2 length 341 kmer count stats mean 103.79 median 109 min 23 max 204
TCTTAAGGATGTAGGATAGTTAAGATATTTTTATGAGTAAAAAGGCTAAGTAATGCATTTTCCATACATTACATACGAAACGCATGATGATTGACTCACTTAATTACTTTTAAACAAAAAGTATATTGAATATCAAACAATTCGTCAAATACTTGTGAGATCTGTACTCTTTATTAGTTTAACGGGGGACATGACAAAAGCAATAACTGTTATCATGATAGATTGAACCATAGGATTTCGCCATTTTTTTCTCCAATTGTTTGTTTATCATATTAGTGTGTACAAAGAAAAGTCGAAGTTGATAGCGCCTTATACTAGTTCCAATTTGATTTTTTTGTCAT
>Nanomia-bijuga-Atlantic cnidaria_co1 product 3 length 341 kmer count stats mean 107.41 median 112 min 23 max 204
TCTTAAGGATGTAGGATAGTTAAGATATTTTTATGAGTAAAAAGGCTAAGTAATGCATTTTCCATACATTACATACGAAACGCATGATAATTGACTTACTTAATTACTTTTAAACAAAAAGTATATTGAATATCAAACAATTCGTCAAATACTTGTGAGATCTGTACTCTTTATTAGTTTAACGGGGGACATGACAAAAGCAATAACTGTTATCATGATAGATTGAACCATAGGATTTCGCCATTTTTTTCTCCAATTGTTTGTTTATCATATTAGTGTGTACAAAGAAAAATCGAAGTTGATAGCGCCTTATACTAGTTCCAATTTGATTTTTTTGTCAT
>Nanomia-bijuga-Atlantic cnidaria_co1 product 4 length 341 kmer count stats mean 106.98 median 110 min 23 max 204
TCTTAAGGATGTAGGATAGTTAAGATATTTTTATGAGTAAAAAGGCTAAGTAATGCATTTTCCATACATTACATACGAAACGCATGATAATTGACTTACTTAATTACTTTTAAACAAAAAGTATATTGAATATCAAACAATTCGTCAAATACTTGTGAGATCTGTACTCTTTATTAGTTTAACGGGGGACATGACAAAAGCAATAACTGTTATCATGATAGATTGAACCATAGGATTTCGCCATTTTTTTCTCCAATTGTTTGTTTATCATATTAGTGTGTACAAAGAAAAGTCGAAGTTGGTAGCGCCTTATACTAGTTCCAATTTGATTTTTTTGTCAT
>Nanomia-bijuga-Atlantic cnidaria_co1 product 5 length 341 kmer count stats mean 109.25 median 112 min 23 max 204
TCTTAAGGATGTAGGATAGTTAAGATATTTTTATGAGTAAAAAGGCTAAGTAATGCATTTTCCATACATTACATACGAAACGCATGATAATTGACTTACTTAATTACTTTTAAACAAAAAGTATATTGAATATCAAACAATTCGTCAAATACTTGTGAGATCTGTACTCTTTATTAGTTTAACGGGGGACATGACAAAAGCAATAACTGTTATCATGATAGATTGAACCATAGGATTTCGCCATTTTTTTCTCCAATTGTTTGTTTATCATATTAGTGTGTACAAAGAAAAGTCGAAGTTGATAGCGCCTTATACTAGTTCCAATTTGATTTTTTTGTCAT
>Nanomia-bijuga-Atlantic cnidaria_co1 product 6 length 341 kmer count stats mean 101.00 median 109 min 23 max 204
TCTTAAGGATGTAGGATAGTTAAGATATTTTTATGAGTAAAAAGGCTAAGCAATGCATTTTCCATACATTACATACGAAACGCATGATGATTGACTCACTTAATTACTTTTAAACAAAAAGTATATTGAATATCAAACAATTCGTCAAATACTTGTGAGATCTGTACTCTTTATTAGTTTAACGGGGGACATGACAAAAGCAATAACTGTTATCATGATAGATTGAACCATAGGATTTCGCCATTTTTTTCTCCAATTGTTTGTTTATCATATTAGTGTGTACAAAGAAAAATCGAAGTTGATAGCGCCTTATACTAGTTCCAATTTGATTTTTTTGTCAT
>Nanomia-bijuga-Atlantic cnidaria_co1 product 7 length 341 kmer count stats mean 100.57 median 107 min 23 max 204
TCTTAAGGATGTAGGATAGTTAAGATATTTTTATGAGTAAAAAGGCTAAGCAATGCATTTTCCATACATTACATACGAAACGCATGATGATTGACTCACTTAATTACTTTTAAACAAAAAGTATATTGAATATCAAACAATTCGTCAAATACTTGTGAGATCTGTACTCTTTATTAGTTTAACGGGGGACATGACAAAAGCAATAACTGTTATCATGATAGATTGAACCATAGGATTTCGCCATTTTTTTCTCCAATTGTTTGTTTATCATATTAGTGTGTACAAAGAAAAGTCGAAGTTGGTAGCGCCTTATACTAGTTCCAATTTGATTTTTTTGTCAT
>Nanomia-bijuga-Atlantic cnidaria_co1 product 8 length 341 kmer count stats mean 102.84 median 109 min 23 max 204
TCTTAAGGATGTAGGATAGTTAAGATATTTTTATGAGTAAAAAGGCTAAGCAATGCATTTTCCATACATTACATACGAAACGCATGATGATTGACTCACTTAATTACTTTTAAACAAAAAGTATATTGAATATCAAACAATTCGTCAAATACTTGTGAGATCTGTACTCTTTATTAGTTTAACGGGGGACATGACAAAAGCAATAACTGTTATCATGATAGATTGAACCATAGGATTTCGCCATTTTTTTCTCCAATTGTTTGTTTATCATATTAGTGTGTACAAAGAAAAGTCGAAGTTGATAGCGCCTTATACTAGTTCCAATTTGATTTTTTTGTCAT
>Nanomia-bijuga-Atlantic cnidaria_co1 product 9 length 341 kmer count stats mean 106.46 median 112 min 23 max 204
TCTTAAGGATGTAGGATAGTTAAGATATTTTTATGAGTAAAAAGGCTAAGCAATGCATTTTCCATACATTACATACGAAACGCATGATAATTGACTTACTTAATTACTTTTAAACAAAAAGTATATTGAATATCAAACAATTCGTCAAATACTTGTGAGATCTGTACTCTTTATTAGTTTAACGGGGGACATGACAAAAGCAATAACTGTTATCATGATAGATTGAACCATAGGATTTCGCCATTTTTTTCTCCAATTGTTTGTTTATCATATTAGTGTGTACAAAGAAAAATCGAAGTTGATAGCGCCTTATACTAGTTCCAATTTGATTTTTTTGTCAT
>Nanomia-bijuga-Atlantic cnidaria_co1 product 10 length 341 kmer count stats mean 106.02 median 110 min 23 max 204
TCTTAAGGATGTAGGATAGTTAAGATATTTTTATGAGTAAAAAGGCTAAGCAATGCATTTTCCATACATTACATACGAAACGCATGATAATTGACTTACTTAATTACTTTTAAACAAAAAGTATATTGAATATCAAACAATTCGTCAAATACTTGTGAGATCTGTACTCTTTATTAGTTTAACGGGGGACATGACAAAAGCAATAACTGTTATCATGATAGATTGAACCATAGGATTTCGCCATTTTTTTCTCCAATTGTTTGTTTATCATATTAGTGTGTACAAAGAAAAGTCGAAGTTGGTAGCGCCTTATACTAGTTCCAATTTGATTTTTTTGTCAT
>Nanomia-bijuga-Atlantic cnidaria_co1 product 11 length 341 kmer count stats mean 108.30 median 112 min 23 max 204
TCTTAAGGATGTAGGATAGTTAAGATATTTTTATGAGTAAAAAGGCTAAGCAATGCATTTTCCATACATTACATACGAAACGCATGATAATTGACTTACTTAATTACTTTTAAACAAAAAGTATATTGAATATCAAACAATTCGTCAAATACTTGTGAGATCTGTACTCTTTATTAGTTTAACGGGGGACATGACAAAAGCAATAACTGTTATCATGATAGATTGAACCATAGGATTTCGCCATTTTTTTCTCCAATTGTTTGTTTATCATATTAGTGTGTACAAAGAAAAGTCGAAGTTGATAGCGCCTTATACTAGTTCCAATTTGATTTTTTTGTCAT
Called with:
zcat /gpfs/ycga/work/dunn/sequences/illumina/Novaseq_siphgenomes/Sample_CWD16_296_281/*.fastq.gz | sharkmer -k 21 -t 1 --max-reads 2000000 -o output -s Nanomia-bijuga-Atlantic --verbosity 0 --pcr cnidaria > logs/sharkmer.Nanomia-bijuga-Atlantic.log 2>&1
In this case the first sequence is correct, but the others may be cruft.
In addition, we should cap the number of sequences returned: when there are a large number of them, they are likely not actual sequences but unphased combinations of variation. We should also revisit the sorting criteria.
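A minimal sketch of capping the output, assuming candidates are ranked by their minimum k-mer count (the criterion discussed further down) before being truncated. The `Amplicon` struct, the `rank_and_cap` function, the `max_amplicons` limit, and the length tie-break are all hypothetical and not sharkmer's actual code:

```rust
/// Hypothetical summary of one candidate amplicon; sharkmer's real type may differ.
#[derive(Debug, Clone)]
pub struct Amplicon {
    pub sequence: String,
    pub min_count: u32,
}

/// Sort candidates so the best-supported come first, then keep at most
/// `max_amplicons` of them (hypothetical parameter).
pub fn rank_and_cap(mut amplicons: Vec<Amplicon>, max_amplicons: usize) -> Vec<Amplicon> {
    // Highest min_count first; ties broken by longer sequence (an assumption).
    amplicons.sort_by(|a, b| {
        b.min_count
            .cmp(&a.min_count)
            .then_with(|| b.sequence.len().cmp(&a.sequence.len()))
    });
    // Anything beyond the cap is most likely an unphased combination of variants.
    amplicons.truncate(max_amplicons);
    amplicons
}
```

Sorting before truncating means the cap always discards the least-supported candidates rather than whichever happened to be found last.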
Each time through this loop (https://github.com/caseywdunn/sharkmer/blob/f486fd3248187ee4a2ac244c811234ed0ea72018/sharkmer/src/pcr/mod.rs#L1366), i is reinitialized (https://github.com/caseywdunn/sharkmer/blob/f486fd3248187ee4a2ac244c811234ed0ea72018/sharkmer/src/pcr/mod.rs#L1537). Then the products are ordered by min_count (https://github.com/caseywdunn/sharkmer/blob/f486fd3248187ee4a2ac244c811234ed0ea72018/sharkmer/src/pcr/mod.rs#L1621). So we need an amplicon_index that is not reset for different primers.
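This would explain why several records above are all labeled "product 0". A rough sketch of the fix, assuming a single counter declared once outside the per-primer loop; `PrimerPair`, `label_products`, and the `products_for` closure are hypothetical stand-ins for the real loop in `pcr/mod.rs`:

```rust
/// Hypothetical primer pair; only the name matters for this sketch.
pub struct PrimerPair {
    pub name: String,
}

/// Label products from all primer pairs with one running index.
/// `products_for` stands in for whatever yields amplicon sequences per pair.
pub fn label_products<F>(pairs: &[PrimerPair], mut products_for: F) -> Vec<(usize, String, String)>
where
    F: FnMut(&PrimerPair) -> Vec<String>,
{
    // Declared once, outside the loop, so it is not reset for each primer pair.
    let mut amplicon_index = 0;
    let mut labeled = Vec::new();
    for pair in pairs {
        for seq in products_for(pair) {
            labeled.push((amplicon_index, pair.name.clone(), seq));
            amplicon_index += 1;
        }
    }
    labeled
}
```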
In the Nanomia sequences above, only the top one has a BLAST hit. It has the highest min coverage, but a lower median, mean, and max. So min coverage seems like a reasonable sorting strategy here.
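To make the comparison concrete, the sketch below ranks the first three Nanomia records using the length, mean, and min values copied from their headers. The `Stats` struct and `order_by` helper are hypothetical. Ranking by min puts the 688 bp sequence (the one with the BLAST hit) first, while ranking by mean puts a 341 bp sequence ahead of it:

```rust
/// Header statistics for one record (values taken from the FASTA headers above).
#[derive(Debug)]
pub struct Stats {
    pub length: usize,
    pub mean: f64,
    pub min: u32,
}

/// Return record lengths in descending order of `key`.
pub fn order_by<K: PartialOrd>(records: &[Stats], key: impl Fn(&Stats) -> K) -> Vec<usize> {
    let mut idx: Vec<usize> = (0..records.len()).collect();
    // partial_cmp because f64 is only PartialOrd; unwrap would panic on NaN.
    idx.sort_by(|&a, &b| key(&records[b]).partial_cmp(&key(&records[a])).unwrap());
    idx.into_iter().map(|i| records[i].length).collect()
}
```

With the three header values, `order_by(&recs, |s| s.min)` yields `[688, 86, 341]` and `order_by(&recs, |s| s.mean)` yields `[341, 688, 86]`.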
Also add a min_length option to struct PCRParams that defaults to 0.
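A minimal sketch of that option, assuming a trimmed-down `PCRParams` that holds only `min_length` (the real struct has other fields not shown here) and a hypothetical `filter_by_length` helper. A default of 0 keeps every amplicon, so existing behavior is unchanged unless the option is set:

```rust
/// Hypothetical stand-in for sharkmer's PCRParams; only `min_length` comes
/// from this issue. The derived Default sets min_length to 0.
#[derive(Debug, Clone, Default)]
pub struct PCRParams {
    pub min_length: usize,
}

/// Keep only amplicons at least `params.min_length` bases long.
pub fn filter_by_length(seqs: Vec<String>, params: &PCRParams) -> Vec<String> {
    seqs.into_iter()
        .filter(|s| s.len() >= params.min_length)
        .collect()
}
```

For example, a min_length of 100 would drop the 86 bp Nanomia product above while keeping the 341 bp and 688 bp ones.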
Get the following:
with:
zcat /gpfs/ycga/work/dunn/sequences/illumina/Novaseq_siphgenomes/Sample_NA22_176_017/*.fastq.gz | sharkmer -k 21 -t 1 --max-reads 2000000 -o output -s Physalia-physalis-Pacific --verbosity 0 --pcr cnidaria > logs/sharkmer.Physalia-physalis-Pacific.log 2>&1