Adding PPI information - Githubissues

MariaRosariaNucera commented 3 months ago

Hello, thank you for this tool! It would be very useful for me as I have scRNA-seq, scATAC-seq and proteomic data.

I would like to incorporate the PPI data into the GRN. I saw this information in the readme: "For now, such personalisation requires to use directly some hummuspy (python package) functions at the end of the pipeline and write some configuration files manually. It will be simplified soon !". If you could give me some more information about how to write the config files in that case, it would be great!

Thank you so much. Best regards, Maria

r-trimbour commented 3 months ago

Dear Maria,

I'm happy you're thinking of applying HuMMuS to your data! ☺️

Your question is very interesting, so I wrote an extensive answer to let you choose the integration that best fits your use case. Here is a little figure with 3 different scenarios that might correspond to the integration you're looking for. (Spoiler: it's probably C, which is also the one requiring some Python cooking 😅)

If you're looking for the panel C case, I will try to implement an easy solution in the coming days ! It's probably better if you just prepare the PPI network and bipartite files, and I let you know as soon as it's available. 👼

[Figure integrate_PPI_info: three scenarios (panels A, B and C) for integrating PPI information]

A) PPI are used for TF-TF interactions, this is the structure we used.

It is then very easy: you can do everything in R, just using your network instead of the default ones:

    TF_PPI_network <- read.table(TF_PPI_path, header = FALSE, sep = sep)
    hummus_object <- add_network(hummus_object, TF_PPI_network, multiplex_name = "TF")

B) PPI can be used as a way to refine gene-gene relationships in the "RNA" layer.

It can be useful if you consider that genes interacting physically have a higher chance of being regulated by the same TF and of being involved in the same functions. It is also very easy to do, but you then group the protein and transcriptomic layers into the same multiplex, which means you can't study the details of the transcript-to-protein links. The solution is simply to add the PPI network to the already created multiplex, with a different network_name:

    TF_PPI_network <- read.table(TF_PPI_path, header = FALSE, sep = sep)
    hummus_object <- add_network(hummus_object, TF_PPI_network, multiplex_name = "RNA", network_name = "PPI")

C) PPI can be used as a 4th layer.

It's a similar structure to the one we used for the snmC+scATAC+scRNA application in the paper. For this one, which is probably your use-case, you need to finish your analysis through hummuspy for now.

The multilayer itself is just a list of files: bipartites and multiplexes. You will thus need two additional files: the PPI network, and a bipartite between the scRNA network and the PPI network (the red dashed lines in panel C).

Right now, nodes of different layers shouldn't have the same names. If nodes in the scRNA and PPI networks share names, you can add a suffix in the PPI network (e.g. MYC_ppi, or any tag that suits you). The bipartite then has to list all connections between scRNA genes and PPI nodes, which might be trivial in your case, e.g.:

MYC MYC_ppi
JAK JAK_ppi
...

I will try to simplify the following part in the coming days, and I'll let you know as soon as I have finished it ☺️
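As a rough illustration of the suffixing and bipartite steps described above, here is a minimal Python sketch. It is not part of HuMMuS or hummuspy: the function names (`tag_ppi_edges`, `build_bipartite`, `to_tsv`), the toy gene lists and the tab-separated output are all assumptions made for the example, so adapt them to whatever format your multilayer files actually use.

```python
def tag_ppi_edges(edges, suffix="_ppi"):
    """Append a suffix to both endpoints of every PPI edge (hypothetical helper)."""
    return [(a + suffix, b + suffix) for a, b in edges]


def build_bipartite(rna_genes, ppi_edges, suffix="_ppi"):
    """Link each RNA-layer gene to its suffixed PPI node.

    Only genes that actually appear as a node in the PPI network get a link.
    """
    ppi_nodes = {node for edge in ppi_edges for node in edge}
    return [(g, g + suffix) for g in sorted(rna_genes) if g in ppi_nodes]


def to_tsv(rows):
    """Serialize rows as tab-separated lines (the separator is an assumption)."""
    return "\n".join("\t".join(row) for row in rows) + "\n"


# Toy data, for illustration only.
rna_genes = {"MYC", "JAK", "GATA1"}
ppi_edges = [("MYC", "JAK"), ("JAK", "STAT3")]

tagged = tag_ppi_edges(ppi_edges)                  # PPI network with renamed nodes
bipartite = build_bipartite(rna_genes, ppi_edges)  # gene <-> protein links

print(to_tsv(tagged), end="")
print(to_tsv(bipartite), end="")
```

Genes without a matching protein (GATA1 in the toy data) simply get no bipartite link, and proteins without a matching gene (STAT3) stay in the PPI layer only.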

However, if you want to give it a try before then, here is the config file generated as multilayer_folder/config/config.yml when running the classic define_target_genes function in R. In the config file, you need to specify how to explore the different layers together, following the MultiXrank format. It includes:

eta: Which layer to start from (e.g. TF if you want to find downstream targets).
lamb: Probability table for jumping between layers and for staying in the same layer (e.g. by default, from the TF layer you want to go to the ATAC layer but not directly to the gene layer; this is specified in lamb). It is a square matrix whose size is the number of multiplexes.
bipartite/multiplex: The locations of the bipartite and multiplex files, and what they correspond to.

We can modify it manually to integrate the new PPI layer in 1) multiplex, 2) lamb and 3) eta. The lamb, eta and multiplex lists have to be ordered the same way. lamb[i, j] gives the probability of jumping from layer j to layer i, so it must be normalized by columns (each column sums to 1). You can open the R-generated config through hummus.config.open(config), then save the modified version as a new config file. Once you have generated the config, you can use hummus.explore_network.compute_multiple_RandomWalk() to explore the multilayer, passing a list of seeds (e.g. TFs) and the different folder paths.
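To make the column-normalization rule concrete, here is a small, library-free Python sketch of extending lamb and eta with a fourth layer. It does not use hummuspy; the helper names (`add_layer`, `column_sums`), the layer order and all probability values are toy assumptions, not the defaults written by HuMMuS.

```python
def add_layer(lamb, to_new, from_new):
    """Return a copy of lamb with one extra layer appended last.

    lamb[i][j] is the probability of jumping from layer j to layer i.
    to_new[j] is the probability of jumping from existing layer j to the
    new layer; the rest of column j is rescaled by (1 - to_new[j]) so the
    column still sums to 1. from_new is the full new column (jumps leaving
    the new layer, including staying in it) and has length n + 1.
    """
    n = len(lamb)
    extended = [
        [lamb[i][j] * (1.0 - to_new[j]) for j in range(n)] + [from_new[i]]
        for i in range(n)
    ]
    extended.append([to_new[j] for j in range(n)] + [from_new[n]])
    return extended


def column_sums(matrix):
    """Sum of each column; every value should be 1 for a valid lamb."""
    return [sum(row[j] for row in matrix) for j in range(len(matrix))]


# Toy 3-layer setup, ordered TF, peaks, RNA (values are illustrative only).
layers = ["TF", "peaks", "RNA"]
eta = [1.0, 0.0, 0.0]          # start the walk from the TF layer
lamb = [
    [0.5, 0.0, 0.0],           # to TF
    [0.5, 0.5, 0.0],           # to peaks
    [0.0, 0.5, 1.0],           # to RNA
]

# Add a PPI layer reachable from RNA only, which can jump back to RNA.
new_layers = layers + ["PPI"]
new_eta = eta + [0.0]
new_lamb = add_layer(lamb, to_new=[0.0, 0.0, 0.5], from_new=[0.0, 0.0, 0.5, 0.5])

for row in new_lamb:
    print(row)
print(column_sums(new_lamb))
```

Rescaling the source column, rather than overwriting single entries, keeps every column summing to 1 automatically. The new layer is appended last in `new_layers`, `new_eta` and `new_lamb` alike, which respects the requirement that the three are ordered the same way.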

Thanks a lot for your question. I will notify you as soon as this Python feature is implemented, if it's your use case.

Best regards, Rémi

MariaRosariaNucera commented 3 months ago

Thank you for the effort in putting all this information in the reply. Yes, I confirm I am interested in option C. At the moment I am still generating my PPI network, so it's not a problem to wait a bit. If I finish before the simplified workflow is ready, I will definitely try to modify the config file as you explained. Thank you very much!

Maria

r-trimbour commented 2 months ago

I uploaded a small example here showing how to extend the config file to add PPI :) https://github.com/cantinilab/HuMMuS/tree/main/hummuspy/examples/Tutorial_example_hummuspy.ipynb

Rémi

cantinilab / HuMMuS

Adding PPI information #3
