dnbaker / dashing2

Dashing 2 is a fast toolkit for k-mer and minimizer encoding, sketching, comparison, and indexing.
MIT License

Multi-fasta functionality #79

Closed matnguyen closed 1 year ago

matnguyen commented 1 year ago

Are there plans to add multi-FASTA support, so that each sequence does not need its own separate FASTA file?

dnbaker commented 1 year ago

You should already be good! All you have to do is put multiple filenames on the same line for the -F and -Q options. You can also do it for paired-end reads, for instance.

Would you let me know how that goes?

Thanks,

Daniel


matnguyen commented 1 year ago

Hi Daniel, thanks for your reply.

What I mean by a multi-fasta is a single fasta file that has multiple sequences, a format like this:

>sequenceID-001 description
AAGTAGGAATAATATCTTATCATTATAGATAAAAACCTTCTGAATTTGCTTAGTGTGTAT
ACGACTAGACATATATCAGCTCGCCGATTATTTGGATTATTCCCTG
>sequenceID-002 description
CAGTAAAGAGTGGATGTAAGAACCGTCCGATCTACCAGATGTGATAGAGGTTGCCAGTAC
AAAAATTGCATAATAATTGATTAATCCTTTAATATTGTTTAGAATATATCCGTCAGATAA
TCCTAAAAATAACGATATGATGGCGGAAATCGTC
>sequenceID-003 description
CTTCAATTACCCTGCTGACGCGAGATACCTTATGCATCGAAGGTAAAGCGATGAATTTAT
CCAAGGTTTTAATTTG

dnbaker commented 1 year ago

Oh great! Just use --parse-by-seq to treat each entry as its own entity.
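
To make "each entry as its own entity" concrete, here is a minimal Python sketch of splitting a multi-FASTA file into one record per `>` header. It is only an illustration of the file format discussed above, not Dashing 2's actual parser; the function name `iter_fasta_records` and the `MULTI_FASTA` constant are hypothetical names introduced for this example.

```python
import io
from typing import Iterator, TextIO, Tuple

# Hypothetical sample: the three-entry multi-FASTA example from this thread.
MULTI_FASTA = """>sequenceID-001 description
AAGTAGGAATAATATCTTATCATTATAGATAAAAACCTTCTGAATTTGCTTAGTGTGTAT
ACGACTAGACATATATCAGCTCGCCGATTATTTGGATTATTCCCTG
>sequenceID-002 description
CAGTAAAGAGTGGATGTAAGAACCGTCCGATCTACCAGATGTGATAGAGGTTGCCAGTAC
AAAAATTGCATAATAATTGATTAATCCTTTAATATTGTTTAGAATATATCCGTCAGATAA
TCCTAAAAATAACGATATGATGGCGGAAATCGTC
>sequenceID-003 description
CTTCAATTACCCTGCTGACGCGAGATACCTTATGCATCGAAGGTAAAGCGATGAATTTAT
CCAAGGTTTTAATTTG
"""


def iter_fasta_records(handle: TextIO) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) for every '>' entry in a FASTA stream.

    Illustrative only: this is not how Dashing 2 parses its input.
    """
    header = None
    chunks = []
    for raw in handle:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                yield header, "".join(chunks)
            header = line[1:]
            chunks = []
        elif header is not None:
            # Sequence lines wrapped across several rows are joined together.
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


if __name__ == "__main__":
    for name, seq in iter_fasta_records(io.StringIO(MULTI_FASTA)):
        print(name, len(seq))
```

Each yielded pair corresponds to one sequence that would be handled separately under per-sequence parsing, as opposed to the whole file being treated as a single entity.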
