MiraldiLab / maxATAC

Transcription Factor Binding Prediction from ATAC-seq and scATAC-seq with Deep Neural Networks
Apache License 2.0

Update `maxatac train` and optimize #101

Closed by tacazares 2 years ago

tacazares commented 2 years ago

Training times with maxATAC v1 have been long. I think we are not using multiprocessing as effectively as we could. We switched our training approach after running into several issues with older versions of TensorFlow. Related issues: #28 #47

Currently, training takes close to ~1 hour per epoch, whereas historically it took around 20 minutes or less. This is with 16 cores and 64 GB of memory.

I think that we need to remove the OrderedEnqueuer, increase the number of workers, and integrate the data generator into the SeqDataGenerator object.

I tested a version that implements the above and achieved ~13 minutes per epoch. This approach still needs to be validated.
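A minimal sketch of the Sequence-style design proposed above. It assumes nothing about maxATAC's actual `SeqDataGenerator` internals: the class `ToyRegionSequence`, the helper `fetch_epoch`, and the toy features and labels are all hypothetical. It uses only the Python standard library to mimic the `keras.utils.Sequence` protocol (`__len__` plus `__getitem__(idx)`), in which every batch is built from its index alone. That lets several workers produce batches in parallel while the results stay in order. A `ThreadPoolExecutor` stands in for the Keras worker pool. In TensorFlow 2.x releases before Keras 3, a real `tf.keras.utils.Sequence` subclass could instead be passed directly to `model.fit(..., workers=N, use_multiprocessing=True)`, so that Keras manages the enqueuer itself.

```python
import random
from concurrent.futures import ThreadPoolExecutor
from typing import List, Tuple


class ToyRegionSequence:
    """Hypothetical stand-in for a keras.utils.Sequence subclass.

    Each batch depends only on its index, so any worker can build any
    batch independently and in any order.
    """

    def __init__(self, regions: List[Tuple[str, int, int]], batch_size: int, seed: int = 0):
        self.regions = regions
        self.batch_size = batch_size
        self.seed = seed

    def __len__(self) -> int:
        # Number of batches per epoch (ceiling division).
        return (len(self.regions) + self.batch_size - 1) // self.batch_size

    def __getitem__(self, idx: int):
        if not 0 <= idx < len(self):
            raise IndexError(idx)
        batch = self.regions[idx * self.batch_size:(idx + 1) * self.batch_size]
        # Per-batch RNG seeded from the index: the result is reproducible
        # no matter which worker computes it.
        rng = random.Random(self.seed * 1_000_003 + idx)
        # Placeholder features (a real generator would load sequence/ATAC signal).
        x = [[rng.random() for _ in range(4)] for _ in batch]
        # Placeholder labels: 1 if the region is longer than 200 bp.
        y = [1 if (end - start) > 200 else 0 for _, start, end in batch]
        return x, y


def fetch_epoch(seq, workers: int = 4):
    """Build every batch of one epoch in parallel while keeping batch order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Executor.map yields results in input order, like an ordered enqueuer.
        return list(pool.map(seq.__getitem__, range(len(seq))))


# Usage: 10 toy regions split into batches of 4.
regions = [("chr1", i * 100, i * 100 + 150 + (i % 3) * 50) for i in range(10)]
seq = ToyRegionSequence(regions, batch_size=4)
batches = fetch_epoch(seq, workers=3)
print(len(seq), [len(y) for _, y in batches])  # 3 [4, 4, 2]
```

Because `__getitem__` seeds its RNG from the batch index, calling `seq[1]` directly returns the same batch that a worker produced. This is the property that makes index-based generators safe to parallelize. Swapping in `ProcessPoolExecutor` gives true multiprocessing, but the class must then be picklable (defined at module top level) and the calling code guarded by `if __name__ == "__main__":`.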

tacazares commented 2 years ago

@FaizRizvi worked on speeding up the OrderedEnqueuer used for training and on enabling multiprocessing. We were able to speed up training, as reported in pull request #106. We also made general updates to the training parser and the ROIgenerators, and added documentation for the training functions.