Would the training code be released?

kingnobro commented 1 year ago

Hi, I am interested in your work and want to train a new model on my own dataset. Will the code be released soon? Otherwise I will have to implement it myself :(

Also, could you kindly point me to any publicly available code for the same task? Thanks.

dpfried commented 1 year ago

Hi,

The code base we used is an internal version of fairseq, so we won't be able to release it in full. However, I double-checked, and Armen's release of the CM3 code in his public fork of fairseq exactly matches the objective that we used: https://github.com/ArmenAg/fairseq/commit/fdc2f7d61709e7d2458536021ea98a4f2031aa93 . Some of it is specific to fairseq, but the causal_masked_dataset.py file ( https://github.com/ArmenAg/fairseq/commit/fdc2f7d61709e7d2458536021ea98a4f2031aa93#diff-a27fa7e989dec569c26d7303197cc14280ab08b831b969341a27c96d6f7dbdec ) contains the token masking procedure, which should be portable to other frameworks too.
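
For readers who want a feel for what a causal masking step looks like outside fairseq, here is a minimal sketch in plain Python. It is not taken from causal_masked_dataset.py and may differ from it in detail. It assumes the general idea of the objective: a contiguous span is cut out of the token sequence, a sentinel is left in its place, and the span is appended at the end after the same sentinel, followed by an end-of-mask token. The function name causal_mask_single_span and the mask_id / eom_id parameters are hypothetical, and only a single span is handled, whereas a real implementation would likely sample several spans with distinct sentinels.

```python
import random
from typing import List, Optional, Sequence


def causal_mask_single_span(
    tokens: Sequence[int],
    mask_id: int,
    eom_id: int,
    rng: Optional[random.Random] = None,
) -> List[int]:
    """Move one random contiguous span to the end of the sequence.

    Output layout (all names here are illustrative assumptions):
        left + [mask_id] + right + [mask_id] + span + [eom_id]
    """
    if len(tokens) < 2:
        # Too short to mask anything useful; return an unchanged copy.
        return list(tokens)

    rng = rng or random.Random()
    # Pick a non-empty span tokens[start:end].
    start = rng.randrange(0, len(tokens) - 1)
    end = rng.randrange(start + 1, len(tokens) + 1)

    left = list(tokens[:start])
    span = list(tokens[start:end])
    right = list(tokens[end:])

    # The sentinel marks where the span was removed; the span itself is
    # emitted at the end so a left-to-right model sees both sides first.
    return left + [mask_id] + right + [mask_id] + span + [eom_id]


if __name__ == "__main__":
    example = list(range(10))
    # 100 and 101 are made-up sentinel ids outside the example vocabulary.
    print(causal_mask_single_span(example, mask_id=100, eom_id=101))
```

Because the output is still an ordinary token sequence, it can be fed to a standard left-to-right language modeling loss without changes to the model itself.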

kingnobro commented 1 year ago

Thanks a lot!!! I think this causal_masked_dataset.py is what I need. :)
