Yansr3 closed this issue 6 months ago
Yes, I would proceed with the Bayesian step. Do you also have RNA-seq data?
Why the gradient tuner experiences overflows in some cases is still under investigation. Sometimes even just changing the initialization seed can lead to success.
What I do know is that the overflows are far more likely to occur when you have more features than samples, which is the case for most ATAC-seq datasets.
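To make the "more features than samples" condition concrete, here is a minimal sketch that applies it to the numbers reported in this thread (about 33,000 cells and 100k downsampled peaks). The helper name `feature_to_sample_ratio` is hypothetical and is not part of MIRA; with an AnnData object the two counts would come from `adata.shape`, which is `(n_obs, n_vars)`.

```python
def feature_to_sample_ratio(n_cells: int, n_features: int) -> float:
    """Return features per sample; a value above 1 means more features than samples."""
    if n_cells <= 0:
        raise ValueError("n_cells must be positive")
    return n_features / n_cells


# Numbers taken from the question in this thread: ~33,000 cells, 100k peaks.
# With an AnnData object these would be: n_cells, n_features = adata.shape
ratio = feature_to_sample_ratio(n_cells=33_000, n_features=100_000)
more_features_than_samples = ratio > 1

print(f"{ratio:.2f} features per cell; more features than samples: {more_features_than_samples}")
```

By this check the dataset described here still has roughly three times as many peaks as cells, so it falls into the regime where overflows are reported to be more likely.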
On Sep 11, 2023, at 6:31 PM, Yansr3 wrote:
I'm trying to train an ATAC topic model for my dataset, but the gradient tuning step failed with gradient overflow after several tries. I have around 33,000 cells. I followed the filtering steps in the tutorial, left the learning rate at the default (1e-3, 0.1), and randomly downsampled to 100k peaks for training.
The graph for the number of reads is as follows. ![386502df-6fd9-498d-b077-b851504447f8](https://user-images.githubusercontent.com/89668322/267161186-aed65813-ef52-4c36-960b-a7cbe0f92535.png)
Therefore, I don't think it could be caused by a high learning rate or too many features. As for outlier cells, I'm not sure whether I should perform more filtering.
I'm hoping for some help or advice on the gradient tuning step. Or should I just move on to the Bayesian step with my own rough estimate of the number of topics instead?
Thank you for the information! Yes, I also have RNA-seq data, and I didn't encounter overflow when training the RNA model. Is there a relationship between the number of topics in RNA and ATAC? If there is, I could take a rough estimate based on the RNA model.
Yes, it depends on the system of course, but usually the number of ATAC topics is pretty similar to the number of RNA topics. I will add that to the docs.
You could do a Bayesian search around the number of RNA topics (±3) to be sure.
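The ±3 window can be sketched as a small helper that turns the RNA topic count into a search range. This is only an illustration: `topic_search_bounds` is a hypothetical function, the example value of 12 RNA topics is made up, and the lower floor of 2 topics is an assumption. The tuner arguments that take the minimum and maximum number of topics are not shown in this thread, so check the MIRA documentation before passing these values on.

```python
from typing import Tuple


def topic_search_bounds(n_rna_topics: int, window: int = 3, floor: int = 2) -> Tuple[int, int]:
    """Return (min_topics, max_topics) centred on the RNA topic count.

    `floor` is an assumed lower limit so the range never drops below 2 topics.
    """
    if n_rna_topics < floor:
        raise ValueError(f"n_rna_topics must be at least {floor}")
    if window < 0:
        raise ValueError("window must be non-negative")
    low = max(floor, n_rna_topics - window)
    high = n_rna_topics + window
    return low, high


# Hypothetical example: the RNA model settled on 12 topics.
min_topics, max_topics = topic_search_bounds(12)
print(min_topics, max_topics)
```

Restricting the Bayesian search to this narrow range keeps the number of trials small compared with searching a wide range without the gradient tuner's estimate.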
Thank you! This helps a lot.