VinAIResearch / LFM

Official PyTorch implementation of the paper: Flow Matching in Latent Space
https://vinairesearch.github.io/LFM/
GNU Affero General Public License v3.0

Discrepancy in Timestep Ranges between LFM and DiT Repositories #12

Open denemmy opened 5 months ago

denemmy commented 5 months ago

Hello,

I am currently exploring the use of the DiT model for diffusion/flow matching and have been reviewing both the LFM and DiT repositories. I noticed a potential discrepancy in how the two implementations handle timesteps, specifically the range of the timesteps passed to the timestep_embedding function.

In the DiT repository, timesteps are sampled from the range 0 to 1000: https://github.com/facebookresearch/DiT/blob/ed81ce2229091fd4ecc9a223645f95cf379d582b/train.py#L204

However, in the LFM repository, timesteps appear to be taken from a normalized range of 0 to 1: https://github.com/VinAIResearch/LFM/blob/601fd91f9e7a9f8e4cc178f3d6c77ea0de4ff0b9/train_flow_latent.py#L145

However, both repositories use the same timestep_embedding function, which does not account for the range of its input timesteps:

DiT: https://github.com/facebookresearch/DiT/blob/ed81ce2229091fd4ecc9a223645f95cf379d582b/models.py#L41

LFM: https://github.com/VinAIResearch/LFM/blob/601fd91f9e7a9f8e4cc178f3d6c77ea0de4ff0b9/models/DiT.py#L44

Because the input ranges differ by a factor of 1000, the same relative position in the schedule produces different embeddings in the two implementations, which could affect performance and behavior.
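To illustrate the concern, here is a minimal pure-Python sketch of a sinusoidal timestep embedding of the kind both repositories use. It is not copied from either codebase: the embedding size of 256, the max_period of 10000, and the helper max_abs_diff are assumptions made for illustration. It compares how much the embedding changes between the start and the midpoint of the schedule when t lies in [0, 1] versus [0, 1000].

```python
import math

def timestep_embedding(t, dim, max_period=10000):
    """Sinusoidal embedding of a single scalar timestep t (illustrative sketch)."""
    half = dim // 2
    # Geometrically spaced frequencies from 1 down to roughly 1 / max_period.
    freqs = [math.exp(-math.log(max_period) * i / half) for i in range(half)]
    args = [t * f for f in freqs]
    # Cosine components first, then sine components.
    return [math.cos(a) for a in args] + [math.sin(a) for a in args]

def max_abs_diff(a, b):
    """Hypothetical helper: largest element-wise absolute difference."""
    return max(abs(x - y) for x, y in zip(a, b))

dim = 256  # assumed embedding size

# Start and midpoint of a schedule with t in [0, 1].
spread_unit = max_abs_diff(timestep_embedding(0.0, dim),
                           timestep_embedding(0.5, dim))

# Start and midpoint of a schedule with t in [0, 1000].
emb_int_mid = timestep_embedding(500, dim)
spread_int = max_abs_diff(timestep_embedding(0, dim), emb_int_mid)

# Rescaling a [0, 1] timestep by 1000 reproduces the [0, 1000] embedding.
rescaled_mid = timestep_embedding(0.5 * 1000, dim)

print("spread for t in [0, 1]:   ", spread_unit)
print("spread for t in [0, 1000]:", spread_int)
print("rescaled matches:", max_abs_diff(rescaled_mid, emb_int_mid) < 1e-12)
```

With inputs in [0, 1], even the highest frequency covers only a fraction of a period, so embeddings for different timesteps stay close together. If the [0, 1] range is unintentional, multiplying t by 1000 before the embedding would be one way to match the DiT behavior.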

Could you please clarify if this difference in the range of timesteps is intentional?

Thank you for your help and for the great work on these projects!

Best regards, Danil.