calico / scBasset

Sequence-based Modeling of single-cell ATAC-seq using Convolutional Neural Networks.
Apache License 2.0

analyzing motif #5

Closed willey2020 closed 7 months ago

willey2020 commented 2 years ago

Thank you again for this great work.

I see that, given a motif sequence, scBasset can compute that motif's influence on accessibility at the single-cell level. Could scBasset also perform general de novo motif analysis on the sequences recognized by the first convolutional layer's filters, as Basenji and Basset can? If so, could you give some quick guidance? Thank you!

hy395 commented 2 years ago

We didn't explore this direction specifically. The convolution tower of scBasset has the same structure as Basset's, so the representational power should be similar. In our experience, the filters learned by the first layer sometimes match known PWMs and sometimes don't; they often capture only partial motifs. Another way to do de novo motif discovery is TF-MoDISco, which can be run after generating importance scores. However, that needs to be run at the cluster level rather than the single-cell level.
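For readers who want to try the first-layer approach, here is a minimal sketch of the Basset-style idea: scan one-hot sequences with a single filter, keep the windows whose activation exceeds a fraction of that filter's maximum, and average them into a position frequency matrix. It is an illustration under stated assumptions, not code from the scBasset repository: the function names, the `frac` parameter, and the toy filter are hypothetical, the bias and nonlinearity of the real layer are ignored, and extracting the actual filter weights from a trained model is not shown.

```python
import numpy as np

def filter_activations(x, filt):
    """Valid cross-correlation of one-hot sequences with one filter.

    x: (n_seqs, length, 4) one-hot array; filt: (width, 4) filter weights.
    Returns an array of shape (n_seqs, length - width + 1).
    """
    width = filt.shape[0]
    windows = np.lib.stride_tricks.sliding_window_view(x, (width, 4), axis=(1, 2))
    # windows has shape (n_seqs, length - width + 1, 1, width, 4)
    return np.einsum("nlwc,wc->nl", windows[:, :, 0], filt)

def filter_to_pfm(x, filt, frac=0.5):
    """Average the sequence windows whose activation exceeds frac * max activation."""
    acts = filter_activations(x, filt)
    seq_idx, pos_idx = np.nonzero(acts > frac * acts.max())
    width = filt.shape[0]
    counts = np.zeros((width, 4))
    for n, p in zip(seq_idx, pos_idx):
        counts += x[n, p:p + width]
    return counts / max(len(seq_idx), 1)

# Toy data (assumption): random background with one planted motif per sequence.
rng = np.random.default_rng(0)
motif = np.eye(4)[["ACGT".index(b) for b in "TGACTCA"]]  # (7, 4) one-hot motif
x = np.eye(4)[rng.integers(0, 4, size=(20, 50))]         # (20, 50, 4) background
x[:, 10:17] = motif                                      # plant the motif

# Hypothetical filter that matches the motif exactly. frac=0.9 keeps only
# exact matches in this toy; Basset-style analyses typically use about 0.5.
pfm = filter_to_pfm(x, filt=motif, frac=0.9)
consensus = "".join("ACGT"[i] for i in pfm.argmax(axis=1))
```

The resulting matrix could then be compared against a database of known PWMs with a motif-comparison tool; as noted above, real first-layer filters often recover only part of a motif.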

willey2020 commented 2 years ago

Thank you so much Han! Your answer is clear!

willey2020 commented 1 year ago

Hi Han, sorry to bother you with this question again. I am hoping to feed importance scores into TF-MoDISco to get de novo motifs (from clusters). Could I ask for your guidance or some example code to make that step work? Thank you so much!

Gavin-Lijy commented 1 year ago

Hi, that's exactly what I'm about to do. May I ask how your progress is going?

willey2020 commented 1 year ago

Very sorry, @Gavin-Lijy, I just saw your post.

I am not good at deep learning and am still learning (hoping to deeply learn deep learning, lol). I don't have any real progress yet, but I ran through the ISM code and it works. I hope Han and you can share more ideas. Thank you!

Best, willey

davek44 commented 1 year ago

Hi all, we don't have a good way to do this automatically, but we can plan to write one and include it in the repository. Thanks for the suggestion.

willey2020 commented 1 year ago

Thank you so much, Dr. Kelley! Honestly, your and Han's work on scBasset is amazing and insightful!

willey2020 commented 1 year ago

Dear all, as @Gavin-Lijy asked as well, is there any update on this question? Thank you!

hy395 commented 1 year ago

Hi @willey2020 ,

Is the question about computing importance scores for TF-MoDISco? I would suggest computing ISM on a few hundred peaks of interest, then running TF-MoDISco on those, using the ISM scores as the importance scores. With scBasset there are usually thousands to tens of thousands of tasks (one per cell), so computing importance scores for all of them (by either ISM or gradients) would be slow. We are still working on improvements. One idea is to average the cell embeddings of the cells in a cluster to create a new pseudo-bulk head, so that gradients can be computed on it quickly.
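The two ideas above can be sketched together: average the final-layer weights over a cluster to get a single pseudo-bulk output, then run ISM against that one output instead of every cell. This is a rough illustration under stated assumptions, not scBasset code. The `encode` function, the random `proj` matrix, and all array names are hypothetical stand-ins for a trained model's sequence encoder and final dense layer, and the mean-centering at the end is one common convention for turning ISM deltas into per-base scores rather than a step confirmed in this thread.

```python
import numpy as np

def one_hot(seq):
    """One-hot encode a DNA string as a (length, 4) array in A, C, G, T order."""
    return np.eye(4)[["ACGT".index(b) for b in seq]]

def pseudo_bulk_head(cell_weights, cell_bias, cluster_cells):
    """Average final-layer weights (cell embeddings) and intercepts over a cluster."""
    return cell_weights[cluster_cells].mean(axis=0), cell_bias[cluster_cells].mean()

def ism_scores(x, score_fn):
    """In silico mutagenesis: score change for every single-base substitution.

    x: (length, 4) one-hot sequence; score_fn maps such an array to a scalar.
    Returns (length, 4) of score(mutant) - score(reference); reference bases get 0.
    """
    ref = score_fn(x)
    delta = np.zeros(x.shape)
    for pos in range(x.shape[0]):
        for base in range(4):
            if x[pos, base] == 1:
                continue  # skip the reference base
            mutant = x.copy()
            mutant[pos] = 0
            mutant[pos, base] = 1
            delta[pos, base] = score_fn(mutant) - ref
    return delta

# Toy stand-in for a trained model (assumption): a fixed random encoder that
# maps a sequence to a bottleneck vector, plus one dense weight row per cell.
rng = np.random.default_rng(0)
n_cells, bottleneck, length = 50, 8, 20
proj = rng.normal(size=(length * 4, bottleneck))
cell_weights = rng.normal(size=(n_cells, bottleneck))
cell_bias = rng.normal(size=n_cells)

def encode(x):
    return np.tanh(x.reshape(-1) @ proj)

cluster_cells = np.arange(10)  # hypothetical indices of the cells in one cluster
w_bar, b_bar = pseudo_bulk_head(cell_weights, cell_bias, cluster_cells)

def cluster_logit(x):
    """Pre-sigmoid output of the pseudo-bulk head for one sequence."""
    return float(encode(x) @ w_bar + b_bar)

x = one_hot("ACGTTGCAACGTTGCAACGT")
delta = ism_scores(x, cluster_logit)

# One common convention (assumption): mean-center across the four bases to get
# hypothetical scores, then mask by the reference sequence for actual scores.
hyp_scores = delta - delta.mean(axis=1, keepdims=True)
contrib_scores = hyp_scores * x
```

Because the final layer is linear before the sigmoid, the pseudo-bulk logit equals the mean of the per-cell logits in the cluster, so one forward pass per mutant replaces one per cell. Arrays like `x`, `hyp_scores`, and `contrib_scores`, stacked over a few hundred peaks, are the kind of input TF-MoDISco expects; check its documentation for the exact shapes and file formats.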

Alternatively, you can take a look at deep learning models designed specifically for bulk ATAC-seq (e.g. chrombpnet) and run one on pseudo-bulk data.

Best, han

willey2020 commented 1 year ago

Dear Han, thank you so much for your answer and guidance! I will explore your suggestions. Thank you again!

Best, Willey