PeterZZQ / scDART

GNU General Public License v3.0

How many peaks should I use for the input of scATAC-seq data? #15

Open Smilenone opened 1 month ago

Smilenone commented 1 month ago

I found it very slow when I ran it on a 30k scATAC-seq dataset with the top 50k peaks. How many peaks should I use for the scATAC-seq input?

PeterZZQ commented 1 month ago

Yes, the running time of the model depends on the number of features (especially peaks) in your data, because scDART builds a larger neural network when the number of peaks is larger. That is why we did some peak filtering before running the model.

To improve the running speed of the model, you can:

  1. Reduce the size of each mini-batch when training scDART.
  2. Select the highly variable peaks to reduce the number of peaks.
  3. Bin closely located peaks into larger peaks to reduce the overall number of peaks.
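Options 2 and 3 can be done as preprocessing before the data reaches scDART. Below is a minimal NumPy sketch of both, not scDART's own filtering code. The function names `select_variable_peaks` and `bin_peaks`, the use of variance over binarized accessibility as the variability score, and the `max_gap` merging rule are all assumptions made for illustration. It expects a dense cells-by-peaks count matrix, with peak coordinates sorted by chromosome and start position.

```python
import numpy as np


def select_variable_peaks(counts, n_top):
    """Return the (sorted) column indices of the n_top most variable peaks.

    Assumption: variability is scored as the variance of binarized
    accessibility (peak open / closed per cell). Other scores are possible.
    """
    binarized = (np.asarray(counts) > 0).astype(np.float64)
    variances = binarized.var(axis=0)
    top = np.argsort(variances)[::-1][:n_top]
    return np.sort(top)


def bin_peaks(counts, chroms, starts, ends, max_gap):
    """Merge neighboring peaks into larger peaks and sum their counts.

    Two consecutive peaks are merged when they are on the same chromosome
    and the gap between them is at most max_gap base pairs (hypothetical
    rule). Peaks must be sorted by chromosome and start position.
    Returns the binned matrix and the bin index assigned to each peak.
    """
    counts = np.asarray(counts)
    n_peaks = counts.shape[1]

    # Assign each peak to a bin by walking along the sorted coordinates.
    group = np.zeros(n_peaks, dtype=int)
    for j in range(1, n_peaks):
        same_chrom = chroms[j] == chroms[j - 1]
        close = (starts[j] - ends[j - 1]) <= max_gap
        group[j] = group[j - 1] if (same_chrom and close) else group[j - 1] + 1

    # Sum the counts of all peaks that fall into the same bin.
    n_bins = group[-1] + 1
    binned = np.zeros((counts.shape[0], n_bins), dtype=counts.dtype)
    for j in range(n_peaks):
        binned[:, group[j]] += counts[:, j]
    return binned, group


if __name__ == "__main__":
    # Toy data: 4 cells x 4 peaks.
    counts = np.array([[1, 0, 3, 2],
                       [2, 0, 0, 1],
                       [0, 0, 0, 5],
                       [0, 0, 0, 1]])
    keep = select_variable_peaks(counts, n_top=2)
    print("kept peak indices:", keep)

    binned, group = bin_peaks(
        counts,
        chroms=["chr1", "chr1", "chr1", "chr2"],
        starts=[100, 700, 5000, 100],
        ends=[600, 1200, 5500, 600],
        max_gap=200,
    )
    print("bin of each peak:", group)
    print("binned shape:", binned.shape)
```

For a real 30k-cell dataset the count matrix is usually sparse, so the dense operations here would likely need to be replaced with their `scipy.sparse` equivalents. The values of `n_top` and `max_gap` are things to tune against your own data rather than recommended settings.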

There is no recommended number of peaks for scATAC-seq data. Fewer peaks make the model run faster but can also lose important biological information. There is definitely a trade-off, and it depends heavily on the sequencing quality of your scATAC-seq data.