Zehui127 / Latent-DNA-Diffusion

Latent Diffusion Model for DNA Sequence Generation
MIT License

How to evaluate the model? #5

Open yangzhao1230 opened 3 days ago

yangzhao1230 commented 3 days ago

Dear Authors,

I have read your paper with great interest and I am particularly intrigued by the Fréchet Reconstruction Distance (FReD) metric introduced for assessing the quality of generated DNA samples. As mentioned in the paper, you trained an Auto-Encoder (AE) on a reference genome distinct from the training data used for generation to derive embeddings for this metric.

Could you please provide the code and the pre-trained weights for this Auto-Encoder? Having access to these resources would greatly facilitate the reproduction of your results and further application of the FReD metric in related research.

Thank you for your consideration.

Zehui127 commented 2 days ago

Hi @yangzhao1230,

In the first workshop paper, we report both FReD and an FID computed on Sei embeddings (the Sei Embedding Distribution Distance). FReD uses the Auto-Encoder to encode the data, while the Sei Embedding Distribution Distance uses Sei (https://www.nature.com/articles/s41588-022-01102-2).

I suggest following the refined version of our paper (https://arxiv.org/abs/2402.06079), where we found the Sei-based FID to be more appropriate: reconstruction-based evaluation depends heavily on the Auto-Encoder used, so for transparency it is better to rely on a well-known checkpoint. I recommend downloading the Sei model checkpoint from https://github.com/FunctionLab/sei-framework and using it to compute the FID. The code to compute the FID is relatively simple; you can refer to https://github.com/Zehui127/Latent-DNA-Diffusion/blob/b64d0334895747307a06efeb4669d5e2ae429dcf/src/utils/evaluator.py#L45 for an implementation.
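For orientation, here is a minimal sketch of the standard Fréchet distance between two sets of embeddings. It is not the code in `evaluator.py` and may differ from it in detail. It assumes the Sei embeddings for the reference and generated sequences have already been extracted into two NumPy arrays of shape `(n_samples, dim)`; the function name `frechet_distance` and the random stand-in arrays are hypothetical and used only for illustration.

```python
import numpy as np


def frechet_distance(real_emb, gen_emb):
    """Frechet distance between Gaussians fitted to two embedding sets.

    Both inputs are arrays of shape (n_samples, dim). Illustrative helper,
    not the repository's implementation.
    """
    real_emb = np.asarray(real_emb, dtype=np.float64)
    gen_emb = np.asarray(gen_emb, dtype=np.float64)

    # Fit a Gaussian (mean, covariance) to each embedding set.
    mu_r, mu_g = real_emb.mean(axis=0), gen_emb.mean(axis=0)
    cov_r = np.atleast_2d(np.cov(real_emb, rowvar=False))
    cov_g = np.atleast_2d(np.cov(gen_emb, rowvar=False))

    # Tr((cov_r cov_g)^{1/2}) equals the sum of the square roots of the
    # eigenvalues of cov_r @ cov_g, which are real and non-negative for
    # positive semi-definite inputs. Clip tiny negative values caused by
    # floating-point error.
    eigvals = np.linalg.eigvals(cov_r @ cov_g)
    tr_sqrt = np.sqrt(np.clip(eigvals.real, 0.0, None)).sum()

    diff = mu_r - mu_g
    return float(diff @ diff + np.trace(cov_r) + np.trace(cov_g) - 2.0 * tr_sqrt)


if __name__ == "__main__":
    # Random stand-ins for embeddings; replace with real Sei outputs.
    rng = np.random.default_rng(0)
    reference = rng.normal(size=(1000, 16))
    generated = rng.normal(loc=0.5, size=(1000, 16))
    print(frechet_distance(reference, generated))
```

The eigenvalue route avoids a matrix square root routine and keeps the sketch NumPy-only. For high-dimensional embeddings, use enough samples that the covariance estimates are stable.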