cistrome / MIRA

Python package for analysis of multiomic single cell RNA-seq and ATAC-seq.

questions for batch effect #23

Open rdf1993 opened 1 year ago

rdf1993 commented 1 year ago

I have collected samples from various developmental stages and need to analyze them together. However, I am concerned that batch effects will cause cells to cluster by sample when using MIRA. Are there effective tools available to correct for batch effects in this situation?

AllenWLynch commented 1 year ago

Hi there,

Currently, the best approach with MIRA may be to just model the batches together. Depending on the strength of the batch effects, the topics will still capture biological signals, but any UMAP will look quite disjointed.

Alternatively, within the next week or two we will be releasing code for the batch effect method we've developed for finding topics in batched single-cell data, which is currently under review. This method works quite well!

AL

rdf1993 commented 1 year ago

I am following up to ask whether the batch effect correction method has been released yet. Hoping to see it.

AllenWLynch commented 1 year ago

Hi,

The journal is prepping our manuscript for publication, so I won't alter the main branch here until that happens, but the code for the updated MIRA package is available at this repo:

https://github.com/AllenWLynch/MIRA/tree/more-automation

Make sure you're looking at the "more-automation" branch. I can help with install if there are questions there.

The documentation for this version can be found here: https://mira-multiome.readthedocs.io/en/more-automation. When we get a publication date, these changes will move to the main repository here.

If I remember right, you have seen a preliminary version of the manuscript? Let me know what further questions you may have.

AL

rdf1993 commented 1 year ago

Thank you for your response. I would be happy to try the new version, but I encountered a problem. Could it be because my package installation is incomplete?

tuner = mira.topics.SpeedyTuner(
    model = model,
    n_jobs=5,
    save_name = 'tutorial/0',
    min_topics = 3,
    max_topics = 20,
)

OperationalError                          Traceback (most recent call last)
File ~/anaconda3/envs/scvi-env/lib/python3.9/site-packages/sqlalchemy/engine/base.py:1968, in Connection._exec_single_context(self, dialect, context, statement, parameters)
   1967 if not evt_handled:
-> 1968     self.dialect.do_execute(
   1969         cursor, str_statement, effective_parameters, context
   1970     )
   1972 if self._has_events or self.engine._has_events:

File ~/anaconda3/envs/scvi-env/lib/python3.9/site-packages/sqlalchemy/engine/default.py:920, in DefaultDialect.do_execute(self, cursor, statement, parameters, context)
    919 def do_execute(self, cursor, statement, parameters, context=None):
--> 920     cursor.execute(statement, parameters)

OperationalError: database is locked

The above exception was the direct cause of the following exception:

OperationalError                          Traceback (most recent call last)
Cell In[20], line 1
----> 1 tuner = mira.topics.SpeedyTuner(
      2     model = model,
      3     n_jobs=5,
      4     save_name = 'MIRA2_RNA',
      5     min_topics = 3,
      6     max_topics = 20,
...
--> 920     cursor.execute(statement, parameters)

OperationalError: (sqlite3.OperationalError) database is locked
[SQL: INSERT INTO alembic_version (version_num) VALUES ('v2.6.0.a') RETURNING version_num]
(Background on this error at: https://sqlalche.me/e/20/e3q8)

AllenWLynch commented 1 year ago

Hi,

This is an issue with too many concurrent reads/writes to the SQLite database. I have never had a problem with n_jobs=5 before, but sometimes one can get unlucky with the race conditions.

If you try reducing the n_jobs parameter to 3 or 4, does this error go away? Unfortunately, you may need to start a new tuning series, as this one may be corrupted. In the release version, I may have to further decrease the number of cores allowed when using SQLite instead of Redis.
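For illustration, here is a minimal sketch of how a "database is locked" error arises when two writers contend for the same SQLite file. It uses only Python's standard sqlite3 module and is not MIRA's tuner code; the helper name demo_locked_database and the trials table are hypothetical, and timeout=0 is chosen so that the second writer fails immediately instead of waiting.

```python
import os
import sqlite3
import tempfile


def demo_locked_database() -> str:
    """Return the error message a second writer sees while the first holds the write lock."""
    # Hypothetical scratch database, unrelated to the tuner's own storage file.
    path = os.path.join(tempfile.mkdtemp(), "demo.db")

    # isolation_level=None gives autocommit mode, so transactions are opened explicitly.
    # timeout=0 disables the busy wait: a blocked writer fails at once.
    first = sqlite3.connect(path, timeout=0, isolation_level=None)
    first.execute("CREATE TABLE trials (id INTEGER)")
    first.execute("BEGIN IMMEDIATE")  # first writer takes the write lock
    first.execute("INSERT INTO trials VALUES (1)")

    second = sqlite3.connect(path, timeout=0, isolation_level=None)
    try:
        second.execute("BEGIN IMMEDIATE")  # second writer asks for the same lock
        second.execute("ROLLBACK")
        message = "no error"
    except sqlite3.OperationalError as err:
        message = str(err)
    finally:
        first.execute("COMMIT")  # release the lock held by the first writer
        first.close()
        second.close()
    return message


if __name__ == "__main__":
    print(demo_locked_database())
```

SQLite allows only one writer at a time, so more parallel jobs means more chances that a write arrives while another job holds the lock for longer than the busy timeout. Lowering n_jobs reduces that contention.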

AL

katkoad commented 1 year ago

Thanks for working on implementing batch correction, that is very helpful. Can you help with installing the version with batch correction?

AllenWLynch commented 1 year ago

Hi,

The version with batch correction will supplant the publicly-available version any day now as we are working on changes for the RP modeling API. To download the branch of MIRA which has the new method, use:

pip install @.***

Run this inside a conda or venv environment; the new version is not on conda yet.

The tutorial website for this branch can be found at:

https://mira-multiome.readthedocs.io/en/more-automation/index.html

AL


From: katkoad @.> Sent: Monday, July 10, 2023 12:33 PM To: cistrome/MIRA @.> Cc: AllenWLynch @.>; Comment @.> Subject: Re: [cistrome/MIRA] questions for batch effect (Issue #23)
