QData / FastSK

Bioinformatics 2020: FastSK: Fast and Accurate Sequence Classification by making gkm-svm faster and scalable. https://fastsk.readthedocs.io/en/master/
Apache License 2.0

Obtain k-mer sequences and weights #30

Open cflorian900 opened 2 years ago

cflorian900 commented 2 years ago

Is there a way to retrieve a list of the most contributing features (i.e., k-mer sequences) and their associated weights? I tried to pull them using the sklearn.feature_extraction module, to no avail. However, I did notice (with my incredibly limited familiarity with C++) that tokenized feature array objects appear to be passed to fastsk_kernel.cpp. Is this a potential source for extraction? Any help would be greatly appreciated!
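
For context, here is a minimal sketch of how per-k-mer weights can in principle be recovered from a kernel SVM. It does not use FastSK's own API and is not the authors' method. It assumes a plain ungapped k-mer (spectrum) feature map rather than FastSK's gapped k-mers, and it assumes the support sequences and their signed dual coefficients are available (in scikit-learn, an `SVC` fit on a precomputed kernel exposes these as `support_` and `dual_coef_`). The function names and the toy inputs below are hypothetical.

```python
from collections import Counter, defaultdict


def kmer_counts(seq, k):
    """Count every contiguous k-mer in seq (ungapped spectrum features)."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def kmer_weights(support_seqs, dual_coefs, k):
    """Primal weights w = sum_i (alpha_i * y_i) * phi(x_i).

    phi(x) is the k-mer count vector of x, so each k-mer's weight is the
    coefficient-weighted sum of its counts across the support sequences.
    """
    weights = defaultdict(float)
    for seq, coef in zip(support_seqs, dual_coefs):
        for kmer, count in kmer_counts(seq, k).items():
            weights[kmer] += coef * count
    return dict(weights)


# Hypothetical support sequences and signed dual coefficients (alpha_i * y_i).
support_seqs = ["ACGTAC", "TTGACG", "ACGACG"]
dual_coefs = [0.5, -1.0, 0.25]

weights = kmer_weights(support_seqs, dual_coefs, k=3)

# Rank k-mers by absolute weight to see the most contributing features.
top = sorted(weights.items(), key=lambda kv: abs(kv[1]), reverse=True)[:5]
for kmer, w in top:
    print(f"{kmer}\t{w:+.2f}")
```

The same idea would carry over to gapped k-mers only if the feature map used to build the kernel can be enumerated explicitly, which this sketch does not attempt.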

hoangmgh commented 1 year ago

I am interested in this as well! Having the weights would be useful for reproducing the results of, for example, deltaSVM.
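
To illustrate why the weights matter here, below is a rough sketch of a deltaSVM-style score: the summed k-mer weights of the alternate allele sequence minus those of the reference. It assumes a dictionary of k-mer weights is already available. The `weights` values, the sequences, and the `delta_svm` helper are all hypothetical and are not produced by FastSK.

```python
def delta_svm(ref_seq, alt_seq, weights, k):
    """Return sum of alt k-mer weights minus sum of ref k-mer weights.

    k-mers missing from `weights` contribute 0. k-mers that do not overlap
    the variant appear in both sequences and cancel out.
    """
    if len(ref_seq) != len(alt_seq):
        raise ValueError("ref and alt must have equal length (substitutions only)")

    def score(seq):
        return sum(weights.get(seq[i:i + k], 0.0) for i in range(len(seq) - k + 1))

    return score(alt_seq) - score(ref_seq)


# Hypothetical k-mer weights (k = 3), for illustration only.
weights = {"ACG": 1.0, "CGT": -0.5, "ATG": 0.25, "TGT": -0.25}

ref = "ACGTA"
alt = "ATGTA"  # C -> T substitution at index 1

delta = delta_svm(ref, alt, weights, k=3)
print(f"deltaSVM-style score: {delta:+.2f}")
```

A negative score here means the substitution removes more positively weighted k-mers than it creates, under the toy weights above.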