keloemma closed this issue 4 years ago
What do you mean by "generate French data"? GAN-BERT is not a generative model; it is an adversarial training scheme that enables effective semi-supervised text classification with BERT.
If you want to perform text classification in French, I think you can use a multilingual BERT version within our framework.
In your description of the architecture, you wrote the following:
With GAN-BERT we extend the fine-tuning stage by introducing a Discriminator-Generator setting, where:
the Generator G is devoted to produce "fake" vector representations of sentences; the Discriminator D is a BERT-based classifier over k+1 categories.
So I thought that if I used your architecture, I could use the Generator to produce more samples similar to my data (to enlarge my dataset) and use the Discriminator to classify them (to check which examples are really similar to my data).
The Generator is not used to create new examples or to augment the dataset. It produces vectors, not text, and its only role is to improve the quality of the Discriminator according to the Semi-Supervised GAN (SS-GAN) schema.
GAN-BERT is useful when you have (possibly a lot of) unlabeled data and a (possibly small) part of this data is labeled. The adversarial learning uses the labeled data to learn the task (as in a classical supervised schema) and, at the same time, it uses the unlabeled data (together with the fake examples produced by the Generator) to improve the fine-tuning of BERT.
If you have only a small amount of data (whether labeled or unlabeled), I'm afraid that GAN-BERT, at the moment, is not very helpful.
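To make the point above concrete, here is a minimal sketch of how the three kinds of input (labeled, unlabeled, fake) could contribute to the Discriminator loss in an SS-GAN setting. It is not the GAN-BERT code: the toy linear `discriminator` and `generator`, the sizes `K`, `HIDDEN` and `NOISE`, and the hand-written input vectors are all assumptions made for illustration, standing in for BERT sentence representations. Note that the Generator output is a vector that is never decoded into a sentence.

```python
import math
import random

K = 3        # number of real classes (illustrative assumption)
HIDDEN = 4   # toy size of a sentence vector (a real BERT vector is much larger)
NOISE = 2    # toy size of the Generator's noise input

def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def discriminator(vec, weights):
    """Toy linear classifier over K+1 classes; index K is the 'fake' class."""
    logits = [sum(w * x for w, x in zip(row, vec)) for row in weights]
    return softmax(logits)

def generator(noise, weights):
    """Maps noise to a fake sentence *vector*; it never produces text."""
    return [math.tanh(sum(w * z for w, z in zip(row, noise))) for row in weights]

def supervised_loss(probs, label):
    # Cross-entropy over the K real classes only (labeled examples).
    real = probs[:K]
    return -math.log(real[label] / sum(real))

def unsupervised_loss(probs, is_fake):
    # Real (unlabeled) inputs should not fall in class K; fake ones should.
    p_fake = probs[K]
    return -math.log(p_fake) if is_fake else -math.log(1.0 - p_fake)

rng = random.Random(0)
d_weights = [[rng.uniform(-1, 1) for _ in range(HIDDEN)] for _ in range(K + 1)]
g_weights = [[rng.uniform(-1, 1) for _ in range(NOISE)] for _ in range(HIDDEN)]

labeled_vec, label = [0.2, -0.1, 0.5, 0.3], 1   # stand-in for an encoded labeled sentence
unlabeled_vec = [0.1, 0.4, -0.3, 0.0]           # stand-in for an encoded unlabeled sentence
fake_vec = generator([rng.gauss(0, 1) for _ in range(NOISE)], g_weights)

d_loss = (
    supervised_loss(discriminator(labeled_vec, d_weights), label)
    + unsupervised_loss(discriminator(unlabeled_vec, d_weights), is_fake=False)
    + unsupervised_loss(discriminator(fake_vec, d_weights), is_fake=True)
)
print("fake vector:", fake_vec)
print("discriminator loss:", d_loss)
```

Only the labeled example needs a class label; the unlabeled and fake examples contribute through the real-versus-fake term, which is why unlabeled data helps and why the Generator cannot serve as a data augmenter.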
Hope it helps
Danilo
Hello, I would like to know whether I can use GAN-BERT to generate French data and, if so, how. Should I replace BERT with FlauBERT, the French version?