Gonglab-THU / SPIRED-Fitness

MIT License

About batch processing using SPIRED-Fitness #1

Closed charlesxu90 closed 2 months ago

charlesxu90 commented 3 months ago

Dear author @Gonglab-THU,

I'm trying to predict a large batch (1,000) of variants using your tool, but currently predictions can only be run one at a time. Is it possible to run them in batches?

I looked into your model, and it seems the method was designed for single-sample prediction. It looks like substantial changes would be needed to enable batch processing. Is that correct?

AlanYinghuiChen commented 3 months ago

Dear charlesxu90, @charlesxu90

Thank you for using SPIRED-Fitness! We are sorry that SPIRED-Fitness's code currently only supports inference with a batch size of 1; the model was designed and trained for single-sample input (batch = 1). Here are two possible solutions for your consideration:

(1) As mentioned above, you wish to include 1000 mutated sequences in one batch. Are these 1000 variants derived from the same wild-type protein? If so, you only need to input the wild-type protein sequence once (batch = 1). This is because, in principle, SPIRED-Fitness only requires the wild-type (original) sequence as input to predict the ranking of all possible single and double mutations of that protein.

(2) Since SPIRED-Fitness requires ESM2 and ESM-1v at runtime, memory consumption would be significant at a batch size of 1000, and even an A100 80 GB GPU may not meet the requirement. If you only need to run inference with SPIRED-Fitness, you might instead run multiple scripts that perform prediction on these samples sequentially (given SPIRED-Fitness's fast speed).
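A minimal sketch of option (2): split the variant list into chunks and predict each sample sequentially, so each chunk could later be handed to a separate script or GPU. Note that `predict_one` here is a hypothetical placeholder for a single SPIRED-Fitness call, not the tool's actual API.

```python
def chunked(items, size):
    """Yield successive fixed-size chunks from a list."""
    for i in range(0, len(items), size):
        yield items[i:i + size]

def predict_one(seq):
    # Placeholder for one real single-sample (batch = 1) prediction.
    # A dummy score is returned here so the sketch is self-contained.
    return len(seq)

def predict_all(sequences, chunk_size=100):
    """Predict every sequence, chunk by chunk, with batch size 1 per call."""
    results = []
    for chunk in chunked(sequences, chunk_size):
        # Each chunk could also be written out and dispatched to its own
        # prediction script if you want to spread work across GPUs.
        results.extend(predict_one(s) for s in chunk)
    return results
```

The chunking keeps memory flat: only one sample is ever in flight per call, matching the batch = 1 constraint described above.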

Thank you!

charlesxu90 commented 3 months ago

Dear @AlanYinghuiChen,

I want to run it with a batch size of 100. My GPU memory can accommodate running ESM2 and ESM-3b; however, running the SPIRED-Fitness model itself is the bottleneck, as it takes too long when samples are processed one by one. I hope to enable parallel execution of this part.
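One way to get this effect without modifying the model is to run independent single-sample predictions in parallel worker processes. This is only a sketch under the assumption that each prediction is independent; `predict_one` is again a hypothetical stand-in for one SPIRED-Fitness forward pass.

```python
from concurrent.futures import ProcessPoolExecutor

def predict_one(seq):
    # Hypothetical stand-in for one SPIRED-Fitness prediction (batch = 1).
    return len(seq)

def predict_parallel(sequences, workers=4):
    """Run independent batch-1 predictions concurrently, preserving order."""
    with ProcessPoolExecutor(max_workers=workers) as pool:
        # pool.map returns results in the same order as the inputs.
        return list(pool.map(predict_one, sequences))
```

In practice each worker would need its own copy of the model (or its own GPU), so the useful number of workers is bounded by available memory rather than CPU count.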

Best regards!