Hi there, thanks for the nice work!
I am trying to follow your code for the perturbation prediction task.
Based on the code you provided:
pre_in = x.clone().reshape(num_graphs, self.num_genes+1)
x = x.reshape(num_graphs, self.num_genes+1)[:,:-1]
The last column of pre_in should be the total counts. x has the total counts column removed, so only the expression values are retained in x.
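To check my reading of these two lines, here is a minimal sketch with toy sizes. The name `x_expr`, the toy values, and the assumption that the trailing column holds the total counts are mine, not taken from the repository.

```python
import torch

num_graphs, num_genes = 2, 4  # toy sizes, not the real gene count

# Flattened batch: each cell contributes num_genes values plus one trailing value
# (assumed here to be the total counts).
x = torch.arange(num_graphs * (num_genes + 1), dtype=torch.float32)

# Same reshape as in the quoted code: pre_in keeps the trailing column.
pre_in = x.clone().reshape(num_graphs, num_genes + 1)

# The slice [:, :-1] drops the trailing column, leaving expression values only.
x_expr = x.reshape(num_graphs, num_genes + 1)[:, :-1]

print(pre_in.shape)  # torch.Size([2, 5])
print(x_expr.shape)  # torch.Size([2, 4])
```

If this is right, `pre_in[:, -1]` is the extra per-cell value and `x_expr` matches `pre_in[:, :-1]`.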
However, when I look at the pretrained model provided on GitHub, its bin type is 'auto_bin'. Does that mean total counts were not part of the input when the model was pretrained?
If I want to use it to get embeddings for GEARS, what should I do with the total counts?
Also, it seems that pre_in is used directly as input to the pretrained model. Does this mean the input data has already been reformatted to contain the 19264 genes?
Finally, 'pad_token_id': 103 and 'mask_token_id': 102 are stored in the config file, but according to the CSV file there are genes with the same token IDs as these.
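For reference, this is roughly how I checked for the overlap. It is only a sketch: the column names `gene_name` and `index` and the sample rows are hypothetical stand-ins for the real gene index file, and only the two token IDs come from the config.

```python
import csv
import io

# Hypothetical stand-in for the gene index file; real column names may differ.
sample_csv = "gene_name,index\nGENE_A,101\nGENE_B,102\nGENE_C,103\n"

# Special token IDs as stored in the config file.
special_tokens = {"pad_token_id": 103, "mask_token_id": 102}


def find_collisions(csv_text, special_tokens):
    """Return {token_name: gene_name} for genes whose index equals a special token ID."""
    index_to_gene = {
        int(row["index"]): row["gene_name"]
        for row in csv.DictReader(io.StringIO(csv_text))
    }
    return {
        name: index_to_gene[token_id]
        for name, token_id in special_tokens.items()
        if token_id in index_to_gene
    }


collisions = find_collisions(sample_csv, special_tokens)
print(collisions)  # {'pad_token_id': 'GENE_C', 'mask_token_id': 'GENE_B'}
```

A non-empty result means a gene index and a special token share the same integer, although whether that matters depends on whether the two are ever looked up in the same embedding table.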