bacpop / PopPUNK

PopPUNK 👨‍🎤 (POPulation Partitioning Using Nucleotide Kmers)
https://www.bacpop.org/poppunk
Apache License 2.0

Retrieve source sequences from a database #264

Closed: phisanti closed 1 year ago

phisanti commented 1 year ago

Versions: poppunk 2.3.0, pp-sketchlib v2.1.1

Command used and output returned

from PopPUNK import sketchlib

# List the names of the samples stored in the sketch database
sketchlib.getSeqsInDb('path/to/mydb.h5')

['11657_5#1',
 '11657_5#13',
 '11657_5#16',
 '11657_5#17',
 '11657_5#2',
 '11657_5#21',
 '11657_5#25',
 '11657_5#31',
 '11657_5#32',
 '11657_5#33',
 '11657_5#35',
 '11657_5#4',
 '11657_5#42',
 '11657_5#60',
 '11657_5#66',
 '11657_5#68',
 '11657_5#70',
 '11657_5#76',
 '11657_5#77',
 '11657_5#79',
 '11657_5#85',
 '11657_5#9',
 '11657_5#90',
 '11657_5#93',
 '11657_6#12',
...
 'esc_la5773aa_as',
 'esc_la5776aa_as',
 'esc_la5777aa_as',
 'esc_la5784aa_as',
 ...]

Describe the bug: This is more of a question than a bug. I would like to retrieve the sequences used to build a database. The command above returns a list of sequence names, so I was wondering whether there is a command to retrieve the full DNA sequences.

johnlees commented 1 year ago

Databases only store sequence sketches, not the full DNA sequences, so unfortunately this is not possible.
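
To illustrate why a sketch cannot be turned back into a sequence, here is a minimal toy example. It is not PopPUNK's or pp-sketchlib's actual implementation: the `toy_sketch` function, the SHA-1 hash, the k-mer length and the sketch size are all assumptions made for illustration, and the real library uses its own hashing scheme and storage format. The general idea is the same, though: only a small, fixed number of k-mer hashes is kept, and the input sequence is discarded.

```python
import hashlib


def kmers(seq, k):
    """Yield every overlapping k-mer of seq."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def toy_sketch(seq, k=5, sketch_size=4):
    """Toy bottom-s MinHash (hypothetical helper, not the pp-sketchlib API).

    Hash every k-mer, then keep only the `sketch_size` smallest hash values.
    """
    hashes = {
        int.from_bytes(hashlib.sha1(kmer.encode()).digest()[:8], "big")
        for kmer in kmers(seq, k)
    }
    return sorted(hashes)[:sketch_size]


short_seq = "ACGTACGGTCAGTTGACCA"
long_seq = short_seq + "TTGACCGATAGGCTTAACGGATCCA" * 10

short_sketch = toy_sketch(short_seq)
long_sketch = toy_sketch(long_seq)

# The sketch has the same size no matter how long the input sequence is,
# and it contains only integers (hashes), not nucleotides.
print(len(short_seq), len(short_sketch))
print(len(long_seq), len(long_sketch))
```

Because the sketch keeps a handful of one-way hashes rather than the k-mers themselves, most of the sequence is simply not represented in it. To get the full DNA sequences you would need the original assembly files that were used as input when the database was created.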