For future runtime improvements (multi-GPU / multi-process inference), we would need to restructure the inference pipeline and data loading. One option is to switch from an IterableDataset to a map-style dataset (MapDataset); however, that would require saving the processed chunks of a FASTA file to a tmp file so they can be indexed. Alternatively, we could export the trained model as .pt and run the inference step in another, faster language (C++, Rust).
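A minimal sketch of the map-style option, assuming a hypothetical `preprocess_fasta_to_tmp` helper and `FastaChunkMapDataset` class (the encoding step is a placeholder for whatever tokenization the pipeline actually uses):

```python
import pickle
import tempfile

import torch
from torch.utils.data import Dataset


def preprocess_fasta_to_tmp(fasta_path):
    """Parse the FASTA file once, pickle each processed record to a tmp file,
    and record the byte offset of every record so it can be read back by index."""
    tmp = tempfile.NamedTemporaryFile(mode="wb", suffix=".chunks", delete=False)
    offsets = []

    def dump(header, seq):
        offsets.append(tmp.tell())
        pickle.dump((header, "".join(seq)), tmp)

    with open(fasta_path) as handle:
        header, seq = None, []
        for line in handle:
            line = line.strip()
            if line.startswith(">"):
                if header is not None:
                    dump(header, seq)
                header, seq = line[1:], []
            else:
                seq.append(line)
        if header is not None:
            dump(header, seq)
    tmp.close()
    return tmp.name, offsets


class FastaChunkMapDataset(Dataset):
    """Map-style dataset over preprocessed FASTA records stored in a tmp file."""

    def __init__(self, chunk_path, offsets):
        self.chunk_path = chunk_path
        self.offsets = offsets

    def __len__(self):
        return len(self.offsets)

    def __getitem__(self, idx):
        # Each worker opens its own handle and seeks to the requested record.
        with open(self.chunk_path, "rb") as handle:
            handle.seek(self.offsets[idx])
            header, seq = pickle.load(handle)
        # Placeholder encoding; the real pipeline would tokenize the sequence.
        encoded = torch.tensor([ord(c) for c in seq], dtype=torch.long)
        return header, encoded
```

With a map-style dataset, a standard `DataLoader` with `num_workers > 0` (and a `DistributedSampler` for multi-GPU) can shard the indices without any custom splitting logic.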
If we stay with the current design instead, we would need to restructure our IterableDataset class to support multiprocessing for prediction; see the sketch below.
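A minimal sketch of that restructuring, assuming a hypothetical `FastaIterableDataset` that shards records across `DataLoader` workers via `torch.utils.data.get_worker_info()` so that `num_workers > 0` does not produce duplicate predictions:

```python
from torch.utils.data import IterableDataset, get_worker_info


class FastaIterableDataset(IterableDataset):
    def __init__(self, fasta_path):
        self.fasta_path = fasta_path

    def _parse_fasta(self):
        # Stream (header, sequence) pairs without loading the whole file.
        header, seq = None, []
        with open(self.fasta_path) as handle:
            for line in handle:
                line = line.strip()
                if line.startswith(">"):
                    if header is not None:
                        yield header, "".join(seq)
                    header, seq = line[1:], []
                else:
                    seq.append(line)
            if header is not None:
                yield header, "".join(seq)

    def __iter__(self):
        info = get_worker_info()
        if info is None:
            # Single-process loading: yield every record.
            yield from self._parse_fasta()
        else:
            # Multi-process loading: each worker keeps every num_workers-th record.
            for i, record in enumerate(self._parse_fasta()):
                if i % info.num_workers == info.id:
                    yield record
```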
Relevant resources:

- https://github.com/Lightning-AI/pytorch-lightning/issues/15734
- https://colab.research.google.com/drive/1OFLZnX9y5QUFNONuvFsxOizq4M-tFvk-?usp=sharing#scrollTo=dEOL7Qh9C0vM
- https://assets.ctfassets.net/yze1aysi0225/6j1vzFot8yll1FG6J4Ryis/7a3cbb50869da28faaedd39bdd0d58b8/Speechmatics_Dataloader_Pytorch_Ebook_2019__1_.pdf