dib-lab / kProcessor

kProcessor: kmers processing framework.
https://kprocessor.readthedocs.io
BSD 3-Clause "New" or "Revised" License

Easy way to retrieve the groupNames associated with kmer's color? #92

Open mr-eyes opened 2 years ago

mr-eyes commented 2 years ago

While working on the Python wrapper, I tried to retrieve the group names associated with a kmer's color, but I couldn't find a direct way.

Here is what I understand so far; please correct me if I'm wrong.

After indexing, we have a kDataFrame that maps each key (hashVal) to a value (kmerOrder). We can then get the color associated with a kmer from its hash through the getKmerColumn function: getKmerColumn("color", hashVal)

https://github.com/dib-lab/kProcessor/blob/6fa68570bd226a91406d74aa3185cda4bd049824/include/kProcessor/kDataFrame.hpp#L414

Or by kmer order, as here: https://github.com/dib-lab/kProcessor/blob/6fa68570bd226a91406d74aa3185cda4bd049824/include/kProcessor/kDataFrame.hpp#L421

Now I have the color. Is there an easy way to get from the color to its group IDs (color -> group_IDs) through the kDataFrame?


Here's the corresponding Python code for this.

import kProcessor as kp

kf_map = kp.kDataFramePHMAP(21)

fasta_file = "seq.fa"
names_file = "seq.fa.names"

kp.index(kf_map, {"kSize": 21}, fasta_file, 1, names_file)

print(f"total size: {kf_map.size()}")
print(f"Column names: {kf_map.getColumnNames()}")

hash_to_color = dict()

# Walk every kmer in the kDataFrame and record its color.
it = kf_map.begin()
while it != kf_map.end():
    kmer_hash = it.getHashedKmer()
    kmer_color = kf_map.getKmerColumnValue_int("color", kmer_hash)
    hash_to_color[kmer_hash] = kmer_color
    it.next()

print("kmer to colors")
for _hash, color in hash_to_color.items():
    print(f"hash({_hash}) : color({color})")
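
To make the request concrete, here is a minimal sketch of the lookup I am after. It uses plain Python dictionaries only, with no kProcessor calls. The two tables color_to_group_ids and group_id_to_name, the helper color_to_group_names, and the sample hashes and names are all hypothetical placeholders for whatever the kDataFrame stores internally.

```python
from collections import defaultdict

# Hypothetical sample data standing in for the hash_to_color dict built above.
hash_to_color = {1001: 1, 1002: 2, 1003: 3, 1004: 3}

# Assumed lookup tables (NOT existing kProcessor calls):
# color -> group IDs, and group ID -> group name.
color_to_group_ids = {1: [0], 2: [1], 3: [0, 1]}
group_id_to_name = {0: "seq1", 1: "seq2"}


def color_to_group_names(color):
    """Resolve a color to the sorted list of group names it represents."""
    return sorted(group_id_to_name[g] for g in color_to_group_ids.get(color, []))


# Group kmer hashes by color so that each color is resolved only once.
color_to_hashes = defaultdict(list)
for kmer_hash, color in hash_to_color.items():
    color_to_hashes[color].append(kmer_hash)

for color, hashes in sorted(color_to_hashes.items()):
    names = color_to_group_names(color)
    print(f"color({color}) -> groups {names} : {len(hashes)} kmer(s)")
```

Ideally the kDataFrame would expose something equivalent to color_to_group_names directly, so the wrapper would not need to rebuild these tables.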

cc @drtamermansour @shokrof