CUNY-CL / yoyodyne

Small-vocabulary sequence-to-sequence generation with optional feature conditioning
Apache License 2.0

Model support: GPT #159

Open bonham79 opened 4 months ago

bonham79 commented 4 months ago

Might as well set up an autoregressive decoder since T5 is on the docket. This shouldn't be too much of a hassle since the Transformer model already works, but I'm leaving this as an open issue to do validation testing on.

kylebgorman commented 4 months ago

Dumb question, but how is this different from the kind of decoder-only LM we were talking about?

bonham79 commented 4 months ago

It's exactly that: it's just running the transformer with --encoder_layers=0. That's why I'm saying it shouldn't be much of a hassle (it's technically done already, it just needs some benchmarking).
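
A minimal sketch of what that collapses to (illustrative PyTorch, not yoyodyne's actual code): with zero encoder layers there is nothing to cross-attend to, so the model is just a causally masked self-attention stack, i.e. a GPT-style autoregressive LM.

```python
import torch
import torch.nn as nn


class DecoderOnlyLM(nn.Module):
    """GPT-style LM sketch; positional encodings omitted for brevity."""

    def __init__(self, vocab_size, d_model=256, nhead=4, num_layers=4):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, d_model)
        # An "encoder" stack suffices here: decoder-only just means
        # causal self-attention with no cross-attention.
        layer = nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
        self.layers = nn.TransformerEncoder(layer, num_layers)
        self.out = nn.Linear(d_model, vocab_size)

    def forward(self, tokens):
        # Causal mask: each position may only attend to itself and
        # earlier positions.
        causal = nn.Transformer.generate_square_subsequent_mask(
            tokens.size(1)
        ).to(tokens.device)
        hidden = self.layers(self.embedding(tokens), mask=causal)
        return self.out(hidden)  # (batch, time, vocab) logits
```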

kylebgorman commented 4 months ago

I think Adam has an implementation in his fork, but hasn’t PRed it yet.


Adamits commented 4 months ago

Yes, though what I have is specifically a prefix-LM. It can be used like GPT if you just ensure the prefix is always empty (length 0). I currently have some dirty code that takes a source and a target, concatenates them, and always treats the source as the prefix during training.
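
Roughly, the masking works like this (my own illustration, not the code in my fork): prefix positions attend bidirectionally among themselves, target positions attend causally but see the whole prefix, and with an empty prefix it reduces to a plain causal mask.

```python
import torch


def prefix_lm_mask(seq_len: int, prefix_len: int) -> torch.Tensor:
    """Prefix-LM attention mask; True means "may attend"."""
    # Start from an ordinary causal (lower-triangular) mask.
    allowed = torch.tril(torch.ones(seq_len, seq_len, dtype=torch.bool))
    # Let every position see the full prefix; within the prefix this
    # makes attention bidirectional.
    allowed[:, :prefix_len] = True
    return allowed


# E.g. prefix_lm_mask(5, 2): positions 0-1 (the prefix) attend to each
# other in both directions; positions 2-4 see the prefix plus earlier
# targets. prefix_lm_mask(n, 0) is the GPT case. (PyTorch attention
# masks use True for *disallowed* positions, so pass the negation,
# ~prefix_lm_mask(...), as attn_mask.)
```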

I can work on a PR in the next few weeks.

bonham79 commented 4 months ago

Perfect, give me a ping when it's ready and I'll do some benchmarking at home.

Any issue with adding features to the prefix concatenation? That should allow an easy hack for task-specific or multitask training (just treat the target task as a feature during training).

Adamits commented 4 months ago

Sorry, yes, the features are in the prefix too by default.
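
For concreteness, here's a hypothetical example of the layout being discussed (all symbols are made up for illustration): the features, including any task tag, go into the prefix alongside the source, so multitask training just means adding a task symbol as one more feature.

```python
# Hypothetical symbols; not yoyodyne's actual vocabulary or data format.
features = ["<task=inflect>", "<V>", "<PST>"]
source = list("jaga")
target = list("jagade") + ["<eos>"]

prefix = features + source    # everything the model conditions on
sequence = prefix + target    # single decoder-only input
prefix_len = len(prefix)      # bidirectional span for the prefix-LM mask
```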