cistrome / MIRA

Python package for analysis of multiomic single cell RNA-seq and ATAC-seq.
gradient tuner overflow #31

Closed Yansr3 closed 6 months ago

Yansr3 commented 10 months ago

I'm trying to train an ATAC topic model for my dataset, but the gradient tuning step failed with gradient overflow on several attempts. I have around 33,000 cells. I followed the filtering steps in the tutorial, kept the learning rate at the default (1e-3, 0.1), and randomly downsampled to 100k peaks for training.

The graph for the number of reads is as follows: [image 386502df-6fd9-498d-b077-b851504447f8] (https://user-images.githubusercontent.com/89668322/267161186-aed65813-ef52-4c36-960b-a7cbe0f92535.png)

Therefore, I don't think it could be caused by a high learning rate or too many features. As for outlier cells, I'm not sure whether I should perform more filtering.

I'm hoping for some help or advice on the gradient tuning step. Or should I just move on to the Bayesian step with my own rough estimate of the number of topics instead?

AllenWLynch commented 10 months ago

Yes, I would proceed with the Bayesian step. Do you also have RNA-seq data?

Why the gradient tuner experiences overflows in some cases is still under investigation. Sometimes even just changing the initialization seed can lead to success.

What I do know is that the overflows are far more likely to occur when you have more features than samples - which is the case for most ATAC-seq datasets.
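As a rough illustration of that condition using the numbers reported in this thread (about 33,000 cells and 100k downsampled peaks), the sketch below compares features to samples. The function name is hypothetical and not part of MIRA; it only does the arithmetic.

```python
def feature_sample_ratio(n_cells: int, n_features: int) -> float:
    """Return the number of features per sample (hypothetical helper, not a MIRA function)."""
    if n_cells <= 0:
        raise ValueError("n_cells must be positive")
    return n_features / n_cells


# Numbers taken from the original post: ~33,000 cells, 100k downsampled peaks.
n_cells = 33_000
n_peaks = 100_000

ratio = feature_sample_ratio(n_cells, n_peaks)
more_features_than_samples = ratio > 1

print(f"features per cell: {ratio:.2f}")
print(f"more features than samples: {more_features_than_samples}")
```

Even after downsampling to 100k peaks, this dataset has roughly three features per cell, so it still falls in the regime where overflows are reported to be more likely.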

Yansr3 commented 10 months ago

Thank you for the information! Yes, I also have RNA-seq data, and I didn't encounter overflow when training the RNA model. Is there a relationship between the number of topics in RNA and ATAC? If there is, I could make a rough estimate based on the RNA model.

AllenWLynch commented 10 months ago

Yes, it depends on the system of course, but usually the # of ATAC topics is pretty similar to the # of RNA topics. I will add that to the docs.

You could do a Bayesian search around the # of RNA topics (±3) to be sure.
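A minimal sketch of how that search window could be derived from the RNA model's topic count. The helper name, the example value of 12 RNA topics, and the lower floor of 2 topics are all assumptions for illustration; how the bounds are then passed to MIRA's tuner depends on the installed version and is not shown here.

```python
def candidate_topic_range(n_rna_topics: int, window: int = 3, minimum: int = 2) -> tuple[int, int]:
    """Return (low, high) topic counts to search, centered on the RNA topic count.

    Hypothetical helper, not a MIRA function. `minimum` is an assumed floor
    so the lower bound never drops below a sensible number of topics.
    """
    if n_rna_topics < minimum:
        raise ValueError("n_rna_topics must be at least `minimum`")
    low = max(minimum, n_rna_topics - window)
    high = n_rna_topics + window
    return low, high


# Example value only: suppose the tuned RNA model ended up with 12 topics.
low, high = candidate_topic_range(12)
print(f"search ATAC topic counts from {low} to {high}")
```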

Yansr3 commented 10 months ago

Thank you! This helps a lot.