Question for multiple datasets

jjacobi3 commented 1 year ago

Hi, I have multiple 10x multiome datasets that span a time course. I was hoping you could give some advice or recommend a proper workflow for training the models together.

Thanks!

Justine

AllenWLynch commented 1 year ago

Hi Justine,

Were these datasets collected in one or multiple batches? Will technical differences between samples be a problem?

AL

jjacobi3 commented 1 year ago

Hi Allen!

Our samples were collected in separate batches, so we'd like to process them individually as well as together to see a trajectory. We have already processed the samples using Seurat/Signac, so we know there is minimal technical variation across samples, but we'd like to use MIRA since it would be best for addressing our specific hypotheses.

Thanks! -- Justine

AllenWLynch commented 1 year ago

Ah I see.

Stitching together a trajectory can be challenging, depending on the strength of the technical effects. If they are severe, the trajectory will be distorted. If not, applying MIRA as if this were a single dataset would be easiest, and you can simply learn topics, etc., over all of the batches simultaneously. This would be my first attempt.

Also, we have developed a batch correction method for MIRA that performs quite well and is currently under review. That would be the ideal model for your data, but unfortunately I am a couple of weeks away from publishing the code. If the strategy above does not work because of technical effects, I can try to get the batch-correcting model ready for you to try.

AL

jjacobi3 commented 1 year ago

Hi Allen,

Thanks for the advice! The technical effects are minimal, so as a first attempt I'd like to try applying MIRA as if it were a single dataset.

Do you recommend merging all the samples together prior to using MIRA or is there a workflow that you recommend within the preprocessing steps of MIRA?

Thanks again!

AllenWLynch commented 1 year ago

In this case, I would recommend merging the samples together before running MIRA. For expression data, this is easy because you can merge on gene features and select highly variable genes across datasets. For accessibility data, it can be more challenging, since you have to call a standardized peak set shared by all samples.
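As a rough illustration of "merging on gene features", here is a minimal pure-Python sketch. It is not MIRA code: the function name `merge_on_shared_genes`, the sample names, and the toy counts are all hypothetical. It keeps only genes present in every sample, reorders each sample's columns to match, stacks the cells, and records which batch each cell came from. In practice this would more likely be done on AnnData objects (for example `anndata.concat` with `join="inner"`), with highly variable gene selection run on the merged object afterwards.

```python
def merge_on_shared_genes(samples):
    """Stack cells from several samples on the genes they all share.

    samples: dict mapping sample name -> (gene_names, counts), where counts
    is a list of per-cell rows ordered like gene_names.
    Returns (shared_genes, merged_counts, batch_labels).
    """
    names = list(samples)

    # Genes present in every sample, kept in the order of the first sample.
    shared = set(samples[names[0]][0])
    for name in names[1:]:
        shared &= set(samples[name][0])
    genes = [g for g in samples[names[0]][0] if g in shared]

    merged_counts, batch_labels = [], []
    for name in names:
        sample_genes, counts = samples[name]
        column = {g: i for i, g in enumerate(sample_genes)}
        idx = [column[g] for g in genes]  # reorder columns to the shared order
        for row in counts:
            merged_counts.append([row[i] for i in idx])
            batch_labels.append(name)  # keep track of each cell's batch
    return genes, merged_counts, batch_labels


# Toy example (hypothetical samples and counts).
samples = {
    "day0": (["GATA1", "SPI1", "CEBPA"], [[5, 0, 1], [3, 2, 0]]),
    "day3": (["SPI1", "GATA1", "RUNX1"], [[4, 1, 7]]),
}
genes, counts, batches = merge_on_shared_genes(samples)
```

Keeping the batch label per cell makes it possible to check later whether topics or the trajectory separate by batch rather than by biology.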

I would recommend piling up your various datasets into a single bigWig file and calling peaks using MACS.
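Once a shared peak set exists, each sample's fragments would need to be counted against it so that the accessibility features line up across batches. The following is a simplified sketch of that counting step, not part of MIRA or MACS: the function name `count_fragments_in_peaks`, the barcodes, and the coordinates are hypothetical, intervals are assumed to be half-open, and peaks are assumed not to overlap each other. It counts whole-fragment overlaps, whereas many pipelines count cut sites instead.

```python
from bisect import bisect_left
from collections import defaultdict


def count_fragments_in_peaks(peaks, fragments):
    """Count, per cell barcode, the fragments overlapping each shared peak.

    peaks: list of (chrom, start, end), non-overlapping, half-open intervals.
    fragments: iterable of (chrom, start, end, barcode).
    Returns {barcode: {peak_index: count}}, indexing into `peaks`.
    """
    by_chrom = defaultdict(list)
    for idx, (chrom, start, end) in enumerate(peaks):
        by_chrom[chrom].append((start, end, idx))
    for chrom in by_chrom:
        by_chrom[chrom].sort()
    starts = {c: [s for s, _, _ in v] for c, v in by_chrom.items()}

    counts = defaultdict(lambda: defaultdict(int))
    for chrom, f_start, f_end, barcode in fragments:
        chrom_peaks = by_chrom.get(chrom, [])
        # Peaks starting at or after the fragment end cannot overlap it.
        j = bisect_left(starts.get(chrom, []), f_end) - 1
        # Walk left while peaks still end after the fragment start.
        while j >= 0 and chrom_peaks[j][1] > f_start:
            counts[barcode][chrom_peaks[j][2]] += 1
            j -= 1
    return {bc: dict(per_peak) for bc, per_peak in counts.items()}


# Toy example: one shared peak set, fragments from two batches.
peaks = [("chr1", 100, 200), ("chr1", 500, 600), ("chr2", 50, 150)]
fragments = [
    ("chr1", 150, 250, "AAAC-day0"),
    ("chr1", 190, 510, "AAAC-day0"),  # spans two peaks
    ("chr2", 10, 50, "TTTG-day3"),    # ends where the peak starts: no overlap
    ("chr2", 140, 160, "TTTG-day3"),
    ("chr3", 1, 10, "TTTG-day3"),     # chromosome without peaks
]
peak_counts = count_fragments_in_peaks(peaks, fragments)
```

Suffixing barcodes with the sample name, as in the toy data, is one way to keep barcodes unique after merging batches.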

AL

cistrome / MIRA

Question for multiple datasets #19