facebookresearch / fairseq

Facebook AI Research Sequence-to-Sequence Toolkit written in Python.

BART: Is there a tutorial for pre-training BART on your own dataset? #2344

Closed shamanez closed 2 years ago

shamanez commented 4 years ago

Thanks!

orena1 commented 4 years ago

@myleott? Thanks

jasonwu0731 commented 4 years ago

Same question here. I'd appreciate it if there were a guide.

tomsherborne commented 4 years ago

@shamanez @jasonwu0731 I can't confirm that everything I'm trying is 100% correct, but I think I've pieced together a procedure for (possibly) retraining BART on a new dataset. I'm happy to be proved wrong or to improve this, and I'd also appreciate FAIR-endorsed guidance. The data processing mostly comes from here. I made Gists for preprocessing and training. Let me know if this is helpful or if you have improvements!
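
For context, pretraining data for fairseq models like BART is typically prepared the same way as in the RoBERTa pretraining example: encode the raw text with the GPT-2 BPE, then binarize it with fairseq-preprocess. The sketch below is not taken from the gists above; the file names (train.txt, valid.txt, data-bin/my_corpus) and worker counts are only placeholders, and encoder.json / vocab.bpe / dict.txt are the GPT-2 BPE assets distributed with the RoBERTa example.

```bash
# Encode raw text into GPT-2 BPE token ids (one document/paragraph per line).
for SPLIT in train valid; do
  python -m examples.roberta.multiprocessing_bpe_encoder \
    --encoder-json encoder.json \
    --vocab-bpe vocab.bpe \
    --inputs "${SPLIT}.txt" \
    --outputs "${SPLIT}.bpe" \
    --keep-empty \
    --workers 16
done

# Binarize the encoded text into a fairseq dataset; --only-source because
# denoising pretraining is monolingual (the target is reconstructed from the source).
fairseq-preprocess \
  --only-source \
  --srcdict dict.txt \
  --trainpref train.bpe \
  --validpref valid.bpe \
  --destdir data-bin/my_corpus \
  --workers 16
```

The resulting data-bin/my_corpus directory is what you would point fairseq-train at with --task denoising.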

PrettyMeng commented 4 years ago

Same problem here. It would be great if there is one.

shamanez commented 4 years ago

@tomsherborne Amazing! I have actually taken some initial steps as well, but I'm still not sure in what proportions we need to mix the different pretext denoising tasks (the BART paper's settings are sketched below).

Let's do this!
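
Regarding the proportions of the denoising tasks: the BART paper reports that the final model was pretrained with text infilling (masking 30% of tokens, with span lengths drawn from a Poisson distribution with λ=3) combined with permutation of all sentences. In fairseq these knobs are exposed by the denoising task. The command below is only a hedged sketch of how those settings map onto task flags, assuming data binarized as in the earlier snippet; the optimizer, schedule, and batch-size values are placeholders rather than an endorsed recipe, so check fairseq/tasks/denoising.py in your version for the exact arguments.

```bash
# Illustrative BART-style denoising pretraining run (hyperparameters are placeholders).
# Noising settings follow the BART paper's final configuration:
#   --mask 0.3                mask 30% of tokens (text infilling)
#   --mask-length span-poisson, --poisson-lambda 3.0   span lengths ~ Poisson(3)
#   --replace-length 1        replace each masked span with a single <mask> token
#   --permute-sentences 1.0   permute all sentences
fairseq-train data-bin/my_corpus \
  --task denoising \
  --arch bart_base \
  --mask 0.3 --mask-length span-poisson --poisson-lambda 3.0 --replace-length 1 \
  --permute-sentences 1.0 --rotate 0.0 \
  --tokens-per-sample 512 --sample-break-mode complete \
  --criterion cross_entropy \
  --optimizer adam --adam-betas '(0.9, 0.98)' --adam-eps 1e-06 --clip-norm 0.1 \
  --lr 0.0004 --lr-scheduler polynomial_decay \
  --warmup-updates 10000 --total-num-update 500000 --max-update 500000 \
  --max-tokens 4096 --update-freq 4 \
  --dropout 0.1 --attention-dropout 0.1 --weight-decay 0.01 \
  --skip-invalid-size-inputs-valid-test \
  --log-format json --log-interval 100 \
  --save-dir checkpoints/bart_denoising
```

The noising flags are the part that answers the "what proportions" question; the rest mostly needs tuning to your corpus size and hardware (e.g. --max-tokens and --update-freq set the effective batch size).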

gurvindersingh commented 4 years ago

@ngoyal2707 Wondering if you can provide us with the details on training BART, or comment on @tomsherborne's gists if they are good. It would be good to have a README in the examples folder describing the process.

shamanez commented 4 years ago

Yeah, it would be very useful.

stale[bot] commented 3 years ago

This issue has been automatically marked as stale. If this issue is still affecting you, please leave any comment (for example, "bump"), and we'll keep it open. We are sorry that we haven't been able to prioritize it yet. If you have any new additional information, please include it with your comment!

stale[bot] commented 2 years ago

Closing this issue after a prolonged period of inactivity. If this issue is still present in the latest release, please create a new issue with up-to-date information. Thank you!