ccbaumler / 2023-dietary-plants

Bioinformatically barcode the diet from stool metagenome samples
GNU General Public License v3.0

Extract the individual kmers from the results #9

Open ccbaumler opened 1 year ago

ccbaumler commented 1 year ago

The final results are written to a CSV of all kmers. If we can create an individual signature for each kmer, we can then use sourmash compare and sourmash sig kmers + BLAST for further analysis.

Here is a code chunk provided by Mo for this purpose:

import os

import sourmash
from sourmash import sourmash_args

ksize = 51
scaled = 10_000
important_hashes = [140317412050, 141813711179]
out_dir = "working_directory"
os.makedirs(out_dir, exist_ok=True)  # the output directory must exist before writing

for kmer_hash in important_hashes:
    new_sig_outfile = os.path.join(out_dir, f"{kmer_hash}.sig")
    # Scaled MinHash (n=0) that holds only this one hash
    final_mh = sourmash.MinHash(n=0, ksize=ksize, scaled=scaled)
    final_mh.add_hash(kmer_hash)
    final_sig = sourmash.SourmashSignature(final_mh, name=str(kmer_hash), filename=new_sig_outfile)
    print(final_mh.hashes)
    print(f"Saving to {new_sig_outfile}...")
    with sourmash_args.FileOutput(new_sig_outfile, "wt") as fp:
        sourmash.save_signatures([final_sig], fp=fp)
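
The chunk above hard-codes important_hashes. Since the results already live in a CSV, the list could probably be read from that file instead. Below is a minimal sketch using only the standard library. The column name `hashval`, the `abundance` column, and the helper names `read_hashes` and `sig_path` are assumptions for illustration, not the actual layout of the results file; the inline CSV is a made-up stand-in that reuses the two hashes from above.

```python
import csv
import io
import os


def read_hashes(csv_file, column="hashval"):
    """Return the unique integer hashes in `column`, in first-seen order.

    `column` is a hypothetical name; adjust it to match the real results CSV.
    """
    seen = {}
    for row in csv.DictReader(csv_file):
        value = row[column].strip()
        if value:  # skip rows where the hash cell is empty
            seen.setdefault(int(value), None)
    return list(seen)


def sig_path(kmer_hash, out_dir="working_directory"):
    """Build the per-kmer signature path used in the loop above."""
    return os.path.join(out_dir, f"{kmer_hash}.sig")


# Made-up stand-in for the results CSV (assumed layout, not real data).
example_csv = io.StringIO(
    "hashval,abundance\n"
    "140317412050,3\n"
    "141813711179,1\n"
    "140317412050,2\n"
)

important_hashes = read_hashes(example_csv)
print(important_hashes)
print([sig_path(h) for h in important_hashes])
```

With a real file, `open("results.csv", newline="")` (path hypothetical) would replace the `io.StringIO` object, and the resulting list could be passed straight to the signature-writing loop.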