apple / ml-4m

4M: Massively Multimodal Masked Modeling
https://4m.epfl.ch
Apache License 2.0

Input masks for generation - Potential small bug. #20

Open · nilsec opened this issue 2 months ago

nilsec commented 2 months ago

Looks like there may be a small bug in the generation code:

https://github.com/apple/ml-4m/blob/2db01252093c45e7a58ebe4d1efb9361df8ca716/fourm/models/generate.py#L138

The text input masks are computed only from the EOS position of the first sample in the batch, but they are then applied to every sample in the batch. Is this intentional? Generation seems to be mostly run with a batch size of 1 (as in the examples), so this may have fallen through the cracks. If it is intentional, I'd be curious about the reasoning; otherwise, I'm happy to make a PR.
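To make the concern concrete, here is a minimal, hypothetical sketch of the two behaviours. It is not the actual code in `fourm/models/generate.py`. It assumes text sequences given as plain lists of token ids, a made-up EOS id of `3`, and a convention where `True` marks positions after the EOS token that should be masked out (the EOS token itself stays visible). The function names are illustrative only.

```python
from typing import List

EOS_ID = 3  # hypothetical EOS token id, for illustration only


def eos_position(seq: List[int], eos_id: int = EOS_ID) -> int:
    """Index of the first EOS token in seq, or len(seq) if there is none."""
    return seq.index(eos_id) if eos_id in seq else len(seq)


def masks_from_first_sample(batch: List[List[int]], eos_id: int = EOS_ID) -> List[List[bool]]:
    """Pattern described in the issue: a single EOS position, taken from
    sample 0, is reused for every sample. True = position after EOS (masked)."""
    cut = eos_position(batch[0], eos_id)
    return [[i > cut for i in range(len(seq))] for seq in batch]


def masks_per_sample(batch: List[List[int]], eos_id: int = EOS_ID) -> List[List[bool]]:
    """Per-sample alternative: each sample is masked after its own EOS."""
    masks = []
    for seq in batch:
        cut = eos_position(seq, eos_id)
        masks.append([i > cut for i in range(len(seq))])
    return masks


# Two samples whose EOS tokens sit at different positions (2 and 4).
batch = [[5, 6, 3, 0, 0], [7, 8, 9, 10, 3]]

print(masks_from_first_sample(batch))  # sample 1 loses its last two real tokens
print(masks_per_sample(batch))         # each sample is masked after its own EOS
```

With a batch containing a single sample, both functions return the same mask, which would explain why the single-sample examples do not surface the difference.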

Great stuff btw, thanks for open sourcing this!