facebookresearch / MemoryMosaics

Memory Mosaics are networks of associative memories working in concert to achieve a prediction task.

Question about the out-of-distribution generalization to Simple English Wikipedia #2

Closed · anhinga closed this 3 months ago

anhinga commented 3 months ago

Hi,

Thanks for a super-interesting paper and for an open-source release!

I have a question about a particularly interesting result on the out-of-distribution generalization from BabyStories to the Simple English Wikipedia (Figure 10). I don't seem to see that result in this GitHub repository (it would be super cool to be able to play with it).

Also, it says at the end of page 10: "Both the transformer and the Memory Mosaic are N_b = 512 blocks deep."

Is it really an unusual ultra-deep configuration with 512 blocks, or is this just a typo? This seems to be the only place in the paper with an unusually high number of blocks.

Thanks again!

leonbottou commented 3 months ago

This is a typo: it should be N_b = 12, not 512. This experiment merely consists of feeding the Simple English Wikipedia articles into the N_b = 12 model trained on BabyStories, as described earlier in the paper. No retraining is involved, just running text generation on different data and measuring the error (quite bad in both cases).
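
For anyone who wants to try this kind of evaluation, here is a minimal sketch of scoring a trained autoregressive model on out-of-distribution text by measuring its average per-token cross-entropy. It assumes a PyTorch model whose forward pass returns next-token logits; `load_model`, `load_tokenizer`, and the file names are hypothetical placeholders, not part of this repository.

```python
import torch
import torch.nn.functional as F

# Hypothetical loaders standing in for however the checkpoint and
# tokenizer are actually loaded in this repository.
model = load_model("checkpoint.pt")
tokenizer = load_tokenizer("tokenizer.json")
model.eval()

@torch.no_grad()
def ood_loss(text, context_len=512):
    """Average per-token cross-entropy of `model` on held-out text."""
    ids = torch.tensor(tokenizer.encode(text)).unsqueeze(0)  # (1, T)
    total, count = 0.0, 0
    # Slide over the text in non-overlapping windows so that every
    # token is predicted exactly once.
    for start in range(0, ids.size(1) - 1, context_len):
        chunk = ids[:, start : start + context_len + 1]
        if chunk.size(1) < 2:
            break
        inputs, targets = chunk[:, :-1], chunk[:, 1:]
        logits = model(inputs)  # (1, T, vocab_size)
        loss = F.cross_entropy(
            logits.reshape(-1, logits.size(-1)),
            targets.reshape(-1),
            reduction="sum",
        )
        total += loss.item()
        count += targets.numel()
    return total / count

# No retraining: run the BabyStories-trained model on Simple English
# Wikipedia text and compare against its loss on BabyStories text.
print(ood_loss(open("simple_wikipedia_article.txt").read()))
```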

I hope this clarifies things. We'll post a fixed paper within a couple of weeks.