cistrome / MIRA

Python package for analysis of multiomic single cell RNA-seq and ATAC-seq.

Using Alternative Preprocess Methods for Training RP Models #38

Open shycheng opened 6 months ago

shycheng commented 6 months ago

Hello AllenWLynch,

Thank you very much for developing this incredible tool. It's been instrumental in my research on gene expression and chromatin accessibility!

I am currently exploring the potential of integrating data from alternative dimensionality reduction and data integration methods (such as GLUE, Seurat, etc.) for training RP models, instead of using pre-trained topic models. My aim is to utilize processed data from these methods to see if they can enhance or provide new insights into the training of RP models.

Could you please advise if it is possible to adapt the RP model training process in MIRA to work with these alternative methods? If so, are there any specific considerations or modifications that I should be aware of?

Additionally, if you could provide any examples or documentation related to this process, it would be greatly appreciated.

Thank you for your time and assistance.

Best regards, Cheng

shahrozeabbas commented 5 months ago

I have a similar question. Any input would be appreciated, thanks!

AllenWLynch commented 5 months ago

Hi Cheng,

I do not think it would be trivial to do this using MIRA, since the code is so intertwined with the topic modeling. The key aspect of the topic model for RP model training is the imputation of the posterior likelihood that each region is accessible in each cell. This functions essentially as a denoising step over the scATAC-seq data for RP model learning.
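
As a rough illustration of the kind of denoising described above (a generic sketch, not MIRA's actual code or API): in a typical topic model, an imputed accessibility value for each cell and region can be obtained by mixing per-topic region distributions with each cell's topic composition. The function name `impute_accessibility` and the toy matrices are hypothetical and exist only for this example.

```python
import numpy as np


def impute_accessibility(cell_topics, topic_regions):
    """Mix topic-level region distributions by each cell's topic weights.

    cell_topics:   (n_cells, n_topics) array, each row sums to 1
    topic_regions: (n_topics, n_regions) array, each row sums to 1
    Returns an (n_cells, n_regions) array of smoothed accessibility values.
    """
    cell_topics = np.asarray(cell_topics, dtype=float)
    topic_regions = np.asarray(topic_regions, dtype=float)
    if cell_topics.shape[1] != topic_regions.shape[0]:
        raise ValueError("topic dimension mismatch")
    return cell_topics @ topic_regions


# Toy inputs (assumed, randomly generated): 5 cells, 3 topics, 8 regions.
rng = np.random.default_rng(0)
n_cells, n_topics, n_regions = 5, 3, 8
cell_topics = rng.dirichlet(np.ones(n_topics), size=n_cells)
topic_regions = rng.dirichlet(np.ones(n_regions), size=n_topics)

imputed = impute_accessibility(cell_topics, topic_regions)
```

The result is dense and low rank, unlike the sparse raw scATAC-seq counts, which is why it works as a denoised input. An embedding from another method would need some comparable way to map back to per-region values.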

I was working on new modeling code that worked a bit differently and did not rely on this denoising step - but it is still very much under development.

If you could describe your aim in a bit more detail, I can give better advice. You can email me at allenlynch@g.harvard.edu if you don't wish to discuss it on a public forum.

AL

shahrozeabbas commented 5 months ago

@AllenWLynch Hi, would it be alright if I emailed you with questions? I have a multiome dataset and have used the scvi toolkit to process my data. I was hoping to feed this into downstream model training, especially for ATAC.