emanjavacas / weasimov

Code for the "AsiBot" project
MIT License

Fake generation from scratch by reading sampled sentences from the training data (same genre, etc...) #22

Closed emanjavacas closed 7 years ago

emanjavacas commented 7 years ago

@mikekestemont, would you have time to look into this?

mikekestemont commented 7 years ago

Yes, what do you need for this? Just a function that returns a random sample from the corpus? How do we control for genre? Does this assume there is no context at all for the generation?

emanjavacas commented 7 years ago

You should check whether a seed is being sent to the backend. If none is, use the model name to find out the genre and quickly sample a sentence from the appropriate corpus subset. That sentence should then be passed as the seed instead of None.
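A minimal sketch of the fallback described above, assuming the genre can be recovered as a substring of the model name and that the corpus is available as a dict of sentences per genre. The names `pick_seed`, `genre_from_model_name`, and `corpus_by_genre` are hypothetical and not taken from the weasimov code.

```python
import random


def genre_from_model_name(model_name, known_genres):
    """Return the first known genre that occurs in the model name, else None.

    Assumption: model names embed the genre as a substring.
    """
    lowered = model_name.lower()
    for genre in known_genres:
        if genre.lower() in lowered:
            return genre
    return None


def pick_seed(seed, model_name, corpus_by_genre, rng=random):
    """Return the given seed, or a sampled corpus sentence when seed is None.

    corpus_by_genre is a hypothetical dict mapping genre -> list of sentences.
    """
    if seed is not None:
        # A seed was sent to the backend: use it unchanged.
        return seed
    genre = genre_from_model_name(model_name, corpus_by_genre)
    if genre is not None:
        candidates = corpus_by_genre[genre]
    else:
        # Genre unknown: fall back to sampling from the whole corpus.
        candidates = [s for sents in corpus_by_genre.values() for s in sents]
    if not candidates:
        # Nothing to sample from, so keep the original "no seed" behaviour.
        return None
    return rng.choice(candidates)
```

Falling back to the whole corpus when no genre matches keeps generation working for models whose names carry no genre hint; whether that is the desired behaviour is a design choice, not something settled in this thread.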

mikekestemont commented 7 years ago

Ah, I see: great trick! I am almost done grading my papers and will implement this ASAP. One issue is that a fine-tuned model currently does not explicitly store its genre or any similar metadata, so we'll have to see how we implement this.

mikekestemont commented 7 years ago

As of commit 364b886414b50494af393bbef02f68a0c079c5a4, we select a random sentence from the corpus. A filtering mechanism has been added but is not functional yet, because it will have to be hardcoded depending on which voices we select in the end.