deeplearningplus / tGPT

Generative Pretraining from Transcriptomes
GNU General Public License v3.0

Human data #2

Open apal6 opened 1 year ago

apal6 commented 1 year ago

Hi,

I am new to this, but I was able to replicate your Google Colab notebook successfully.

I wonder how I could use human gene symbols instead of mouse ones. The notebook currently loads the mouse data with:

text_file = "./data/Muris_gene_rankings.txt.gz" ## Gene symbols ranked by expression

Thank you & regards, Aastha

HelloWorldLTY commented 10 months ago

Hi, I have a possible solution. Since the model's input is just the ranking of genes by expression in each cell, we only need to convert single-cell data stored as an AnnData object into that form. Here is a function that does this:

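import numpy as np
import scipy.sparse as sp

def get_gene_token(adata, top_n=256):
    # Return one string per cell: the top_n gene symbols (from adata.var_names)
    # ordered by decreasing expression and separated by single spaces.
    gene_names = adata.var_names.values
    lines = []
    for i in range(adata.n_obs):
        row = adata.X[i]
        # adata.X may be a scipy sparse matrix or a dense numpy array
        row = row.toarray().ravel() if sp.issparse(row) else np.asarray(row).ravel()
        # indices of genes sorted from highest to lowest expression, keep the top_n
        top_index = np.argsort(row)[::-1][:top_n]
        lines.append(' '.join(gene_names[top_index]))
    return lines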
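One detail worth checking: `argsort` also ranks genes with zero expression, so a cell that expresses fewer than 256 genes gets padded with unexpressed genes in an order that carries no meaning. Whether tGPT's training files include such genes is not stated in this thread, so the sketch below makes dropping them optional. It runs on a toy dense cells × genes matrix; the helper `rank_genes`, its `drop_zeros` flag, and the gene names and counts are hypothetical.

```python
import numpy as np

def rank_genes(matrix, gene_names, top_n=256, drop_zeros=True):
    """Hypothetical helper: one space-separated line per cell, genes ranked by
    decreasing expression. `matrix` is a dense cells x genes array."""
    gene_names = np.asarray(gene_names)
    lines = []
    for row in np.asarray(matrix, dtype=float):
        # stable sort on negated values: highest expression first, ties keep column order
        order = np.argsort(-row, kind="stable")
        if drop_zeros:
            # assumption: genes with zero expression should not appear in the ranking
            order = order[row[order] > 0]
        lines.append(" ".join(gene_names[order[:top_n]]))
    return lines

genes = ["CD3E", "MS4A1", "LYZ", "GAPDH"]
counts = np.array([[5, 0, 1, 9],
                   [0, 7, 2, 3]])
for line in rank_genes(counts, genes):
    print(line)
# GAPDH CD3E LYZ
# MS4A1 GAPDH LYZ
```

The stable sort is a deliberate choice so that tied genes always come out in the same order, which makes the generated files reproducible.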

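You can then replace `lines` in the notebook with the output of `get_gene_token`. Because the tokens are taken directly from `adata.var_names`, make sure it holds human gene symbols rather than, for example, Ensembl IDs. Please correct me if I have misunderstood anything.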
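If you would rather leave the notebook untouched and just point `text_file` at a human file, you could write the rankings out in the same shape. This is a minimal sketch assuming the `.txt.gz` format is one cell per line with space-separated gene symbols (which is what `get_gene_token` produces); the file name `human_gene_rankings.txt.gz` and the helpers `write_rankings` / `read_rankings` are hypothetical.

```python
import gzip

def write_rankings(lines, path):
    """Hypothetical helper: write one ranked gene list per line to a gzip text file."""
    with gzip.open(path, "wt", encoding="utf-8") as fh:
        for line in lines:
            fh.write(line + "\n")

def read_rankings(path):
    """Hypothetical helper: read the file back as a list of lines, newlines stripped."""
    with gzip.open(path, "rt", encoding="utf-8") as fh:
        return [line.rstrip("\n") for line in fh]

# Toy example with two cells; in practice pass get_gene_token(adata)
lines = ["GAPDH CD3E LYZ", "MS4A1 GAPDH LYZ"]
write_rankings(lines, "human_gene_rankings.txt.gz")
print(read_rankings("human_gene_rankings.txt.gz"))
```

After that, setting `text_file` to the path of the new file should let the rest of the notebook run unchanged, provided the format assumption above holds.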