HazyResearch / m2

Repo for "Monarch Mixer: A Simple Sub-Quadratic GEMM-Based Architecture"
Apache License 2.0
520 stars, 42 forks

Will M2-GPT be open-sourced? #7

Open yangsp5 opened 8 months ago

yangsp5 commented 8 months ago

Will M2-GPT be open-sourced? It seems interesting

DanFu09 commented 8 months ago

Yes, will be putting it up this week!

On Mon, Oct 30, 2023 at 4:43 AM yangsp5 wrote:

Will M2-GPT be open-sourced? It seems interesting


LSinev commented 7 months ago

Thank you for your great work! Has the open-sourcing of M2-GPT been postponed?

avesus commented 5 months ago

GPT code, or it didn't happen. Extraordinary claims require extraordinary proof. The paper is very convincing and INCREDIBLY well written, but does the causal model perform as well as you claimed in the paper? The best test would be to release the training code in the style of Andrej Karpathy's minGPT/nanoGPT/llama2.c.

lhallee commented 5 months ago

@DanFu09 any update on this? I can't seem to find the checkpoints. At a minimum, I would love to see the YAML configs so I can experiment locally. Great work putting models out with Together AI, btw!

redbrain commented 5 months ago

Do you plan on releasing the weights of the causal M2 models, or just the code?

DanFu09 commented 5 months ago

Hi all, thanks for all the interest here! I’m a bit swamped with faculty apps right now but will try to get the code up in my downtime.

The models were quite undertrained (only 5-15B tokens) because they were just for an initial scaling experiment, so we don’t plan to release them.

On Sat, Feb 17, 2024 at 3:15 PM redbrain wrote:

Do you plan on releasing the weights of the causal M2 models, or just the code?


redbrain commented 3 months ago

Hello, it's been a couple of weeks. Just wanted to check on the status of the M2-GPT implementation release.

DanFu09 commented 3 months ago

First thing on my list once the faculty interviews finish up! (One more week I promise 🤞)

(It's mostly done and sitting on a private branch; I just need to fix up a few more bits of the configs and merge things.)

redbrain commented 2 months ago

Checking in one more time, since it's been another two weeks! Is it possible to get an ETA on the M2-GPT release? (Sorry for the persistent reminders, I understand you're busy and just want to make sure this doesn't get buried under everything else.)

DanFu09 commented 2 months ago

I'm very hopeful that I'll be able to put it out this week 🤞

redbrain commented 1 month ago

Here's another two-week check-in, hopefully the last one :) How's it looking right now?

sanjayss34 commented 1 month ago

Also interested in this, would you be able to release the code?

DanFu09 commented 1 month ago

Hi :)

I uploaded a new config and some code changes to a branch of safari: https://github.com/HazyResearch/safari/tree/flashfftconv.

Please see these instructions and let me know how they work: https://github.com/HazyResearch/safari/blob/flashfftconv/experiments.md#m2-gpt. You'll have to use the old fused_fft CUDA kernel in that repo (hopefully a refactor of FlashFFTConv comes soon to make it all play nice).

If it goes well I'll start the more involved surgery to get the two repos to play nice with each other (maybe just an update of the other one and a link for now).
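
For readers who want to try the branch mentioned above, fetching it might look roughly like the sketch below. The repository and branch name are taken from the comment; the `.git` suffix on the URL, the destination directory, and the helper `clone_command` are illustrative assumptions. The sketch only builds and prints the `git clone` command. The actual setup and training steps are the ones in the linked experiments.md, which are not reproduced here.

```python
import shlex

# Repository and branch named in the comment above.
REPO_URL = "https://github.com/HazyResearch/safari.git"
BRANCH = "flashfftconv"


def clone_command(dest="safari"):
    """Build a git command that clones only the flashfftconv branch.

    `dest` is a hypothetical local directory name, not something the
    thread specifies.
    """
    return [
        "git", "clone",
        "--branch", BRANCH,   # check out this branch after cloning
        "--single-branch",    # skip fetching the other branches
        REPO_URL,
        dest,
    ]


# Print the command so it can be pasted into a shell. To run it from
# Python instead, pass the list to subprocess.run(cmd, check=True).
cmd = clone_command()
print(shlex.join(cmd))
```

After cloning, the M2-GPT section of experiments.md on that branch is the place to look for the config and the note about the fused_fft CUDA kernel.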