biomap-research / scFoundation

Apache License 2.0

scFoundation on GEARS with pretrained model used #37

Open YOU-k opened 1 week ago

YOU-k commented 1 week ago

Hi there, thanks for the nice work! I am trying to follow your code for the perturbation prediction task. Based on the code you provided:

    pre_in = x.clone().reshape(num_graphs, self.num_genes+1)
    x = x.reshape(num_graphs, self.num_genes+1)[:,:-1]

the last column of pre_in should be the total counts. The total counts are removed from x, so only the expression values are retained in x.
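To make my reading of those two lines concrete, here is a minimal sketch using NumPy as a stand-in for the torch tensors. The sizes `num_graphs = 2` and `num_genes = 4` and the toy values are made up for illustration, and the assumption that the extra last column holds total counts is my interpretation, not something the repository confirms.

```python
import numpy as np

num_graphs = 2  # hypothetical batch size
num_genes = 4   # hypothetical; the real model is discussed with 19264 genes

# Flattened batch: per cell, num_genes expression values followed by one
# extra value (assumed here to be the total counts).
flat = np.arange(num_graphs * (num_genes + 1), dtype=np.float32)

# Mirrors `pre_in = x.clone().reshape(num_graphs, self.num_genes+1)`
pre_in = flat.copy().reshape(num_graphs, num_genes + 1)

# Mirrors `x = x.reshape(num_graphs, self.num_genes+1)[:,:-1]`
x = flat.reshape(num_graphs, num_genes + 1)[:, :-1]

# The column that pre_in keeps and x drops
total_counts = pre_in[:, -1]

print(pre_in.shape, x.shape, total_counts)
```

With these toy sizes, `pre_in` keeps all five columns per cell while `x` keeps only the first four, so whatever sits in the last column reaches the pretrained model only through `pre_in`.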

However, when I look at the pretrained model provided on GitHub, its bin type is 'auto_bin'. Does that mean the total counts were not used in the input when the model was pretrained? And if I would like to use it to get embeddings for GEARS, what should I do with the total counts?

Also, it seems that pre_in is used directly as input to the pretrained model. Does this mean that the input data has already been reformatted to have 19264 genes?

YOU-k commented 1 week ago

Also, 'pad_token_id': 103 and 'mask_token_id': 102 are stored in the config file, but according to the CSV file there are genes with the same token IDs.
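A rough way to check for this overlap is sketched below. The two token IDs are the ones from the config; everything else is hypothetical: the column names `index` and `gene_name`, the placeholder gene names, and the helper `find_collisions` are illustrative only and do not reflect the actual layout of the repository's CSV file.

```python
import csv
import io

# Special token IDs as quoted from the config file
special_tokens = {"pad_token_id": 103, "mask_token_id": 102}

# Hypothetical stand-in for the gene index CSV: a 0-based index column and
# placeholder gene names (not real gene symbols).
csv_text = "index,gene_name\n" + "\n".join(f"{i},GENE_{i}" for i in range(105))


def find_collisions(csv_file, special):
    """Return the special tokens whose ID is also used as a gene index."""
    id_to_gene = {
        int(row["index"]): row["gene_name"] for row in csv.DictReader(csv_file)
    }
    return {name: id_to_gene[tok] for name, tok in special.items() if tok in id_to_gene}


collisions = find_collisions(io.StringIO(csv_text), special_tokens)
print(collisions)
```

If the gene indices and the special tokens live in the same ID space, any non-empty result here would mean a gene and a special token are indistinguishable by ID alone. If they are looked up in separate embedding tables, the overlap may be harmless.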