dib-lab / kProcessor

kProcessor: kmers processing framework.
https://kprocessor.readthedocs.io
BSD 3-Clause "New" or "Revised" License

Pandas Interface for KDataframes #69

Open shokrof opened 4 years ago

shokrof commented 4 years ago

I have created a new class for columns; see the file called defaultColumn.

I also added new functions to the kDataframe base class:

void addColumn(string columnName, Column);
template<typename T, typename Container> T getKmerColumnValue(string columnName, string kmer);
template<typename T, typename Container> void setKmerColumnValue(string columnName, string kmer, T value);
void changeDefaultColumnType(Column);
template<typename T, typename Container> T getKmerDefaultColumnValue(string kmer);
template<typename T, typename Container> void setKmerDefaultColumnValue(string kmer, T value);

mr-eyes commented 3 years ago

Just a start for the concept

import kProcessor as kp
import numpy as np
import pandas as pd

genesFrame = kp.kDataFramePHMAP(21)
genes_file = "human_pc.part_002.fa"
chunkSize = 10000
cKF = kp.index(genesFrame, {"kSize": genesFrame.ksize()}, genes_file, chunkSize, f"{genes_file}.names")

# Create a zero-filled 2D NumPy array of unsigned long long with one column per kDataFrame entry
np_array = np.zeros(shape=(2,genesFrame.size()), dtype=np.ulonglong)

# Load the kDataFrame into the numpy array
genesFrame.to_numpy(np_array)

df = pd.DataFrame(data=np_array.T)
df.describe()
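
For anyone who wants to try the pandas side without a kProcessor build, here is a minimal sketch of the last two steps using a hand-made array in place of the one `to_numpy` would fill. The row layout (row 0 as k-mer hashes, row 1 as the associated values), the sample numbers, and the column names `kmer_hash` and `value` are all assumptions for illustration; nothing above confirms what `to_numpy` actually writes into each row.

```python
import numpy as np
import pandas as pd

# Stand-in for the array that genesFrame.to_numpy(np_array) would fill.
# ASSUMPTION: row 0 holds k-mer hashes and row 1 holds the per-k-mer values.
np_array = np.array(
    [
        [11, 22, 33, 44],  # hypothetical k-mer hashes
        [1, 3, 1, 2],      # hypothetical per-k-mer values
    ],
    dtype=np.ulonglong,
)

# Transpose so each k-mer becomes one row, then give the columns names
# (the names are illustrative, not part of kProcessor).
df = pd.DataFrame(data=np_array.T, columns=["kmer_hash", "value"])

# The unsigned 64-bit dtype is carried over from the NumPy array.
print(df.dtypes)

# Summary statistics are only meaningful for the value column,
# since hashes are identifiers rather than measurements.
print(df["value"].describe())
```

Naming the columns up front keeps `describe()` from summarising the hash column as if it were numeric data, which is what happens with the unnamed 0/1 columns in the snippet above.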