Open rvinas opened 5 months ago
Hi, for the details of constructing the gene co-expression graph, you may also need to read the original GEARS paper (https://www.nature.com/articles/s41587-023-01905-6). `pre_in` represents the unperturbed cells. We used the same dataloader as GEARS. What we did was replace the randomly initialized gene embeddings of the original GEARS model with the contextual gene embeddings from our model. The edges in the gene co-expression graph remain unchanged.
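To make the swap described above concrete, here is a minimal sketch, not the actual GEARS or scFoundation code. The function names (`random_gene_embeddings`, `build_node_features`), the toy edge list, and the stand-in embedding values are all hypothetical; the only point is that the node features change while the graph edges stay the same.

```python
import random


def random_gene_embeddings(num_genes, dim, seed=0):
    """Baseline: one randomly initialized vector per gene (hypothetical helper)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_genes)]


def build_node_features(num_genes, dim, contextual=None):
    """Return per-gene node features.

    If `contextual` is given, it replaces the random initialization;
    otherwise the random baseline is used. Edges are not touched here.
    """
    if contextual is None:
        return random_gene_embeddings(num_genes, dim)
    if len(contextual) != num_genes or any(len(v) != dim for v in contextual):
        raise ValueError("contextual embeddings must have shape (num_genes, dim)")
    return contextual


# Toy co-expression graph over 3 genes; the edge list is fixed in both settings.
edges = [(0, 1), (1, 2)]

# Baseline node features: random initialization.
baseline = build_node_features(num_genes=3, dim=4)

# Stand-in values for contextual embeddings computed from one control cell.
contextual = [[0.1 * (g + 1)] * 4 for g in range(3)]
swapped = build_node_features(num_genes=3, dim=4, contextual=contextual)
```

In this sketch only the node features differ between `baseline` and `swapped`; `edges` is shared, which mirrors the statement that the co-expression edges remain unchanged.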
Thank you for the clarification! I now understand why `pre_in` represents the unperturbed cells. In the `create_cell_graph_dataset` function, control cells are sampled at random and their expression is then stored in `data.x`. Do you have any intuition on why the contextual gene embeddings from scFoundation are helpful for that task, considering that control cells are sampled at random? I wonder why the contextual aspect is important, given that the sampled control cell is unrelated to the perturbed cell.
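For readers following along, the random pairing described above can be sketched roughly as follows. This is an illustration under assumptions, not the GEARS implementation: `pair_with_random_controls`, its parameters, and the toy expression values are hypothetical.

```python
import random


def pair_with_random_controls(perturbed_cells, control_cells, num_samples=1, seed=0):
    """Pair each perturbed cell with randomly drawn control cells.

    Returns a list of (control_expression, perturbed_expression) tuples.
    The control expression plays the role of the model input (what the
    thread refers to as `data.x`); the perturbed expression is the target.
    """
    rng = random.Random(seed)
    pairs = []
    for target in perturbed_cells:
        for _ in range(num_samples):
            # The control is drawn independently of the target cell.
            control = rng.choice(control_cells)
            pairs.append((control, target))
    return pairs


# Toy data: each cell is a list of expression values for 3 genes.
control_cells = [[1.0, 0.0, 2.0], [0.5, 0.1, 1.8], [1.2, 0.0, 2.1]]
perturbed_cells = [[0.0, 3.0, 2.0], [0.1, 2.5, 1.9]]

pairs = pair_with_random_controls(perturbed_cells, control_cells, num_samples=2)
```

Because the control is drawn independently of the target, nothing in a given pair ties the input cell to the perturbed cell, which is the point the question above is raising.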
Happy to hear that you figured out the code. As for the contextual embeddings, I think they offer a more flexible input for the model. This variety in the input data may make it easier for the model to learn the input distribution and predict the results well. Also, the contextual embeddings contain more information about the gene expression level than the randomly initialized ones, which is another gain for prediction.
I see, thank you for your insights. Did you try conditioning the GEARS model on your learnt, non-contextual gene embeddings (i.e. the gene name embeddings)? In other words, can the performance gain be explained by the quality of the gene embeddings as opposed to the contextual aspect? I am still unsure why it is helpful to condition the model on random control cells.
Hello, thank you for your work and the code. I am trying to understand how the scFoundation embeddings were used within the GEARS framework. In the paper, you mention:
How was the cell-specific gene co-expression graph constructed exactly? I was examining your code and I believe this happens here. Could you clarify what the variable `pre_in` represents? Am I correct in thinking that the GEARS data loader provides the expression of perturbed single cells in `data.x`? My understanding from your paper is that the scFoundation embeddings are extracted using control cells only. Your help would be greatly appreciated!