Kyubyong / transformer

A TensorFlow Implementation of the Transformer: Attention Is All You Need
Apache License 2.0

tf.data.Dataset.from_generator(generator_fn, output_shapes, output_types, args=(sents1, sents2)) #73

Open DrBugKiller opened 5 years ago

DrBugKiller commented 5 years ago

Here is a little question. The type of sents1 and sents2 is list of str, but inside generator_fn, sents1 and sents2 show up as lists of bytes. Why? Is that an effect of tf.data.Dataset.from_generator? A minimal sketch of what I mean is below (the sentences are just made-up examples, not the repo's data):
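```python
import tensorflow as tf

def generator_fn(sents1, sents2):
    # `args` arrive here as NumPy values: tf.string data shows up as bytes, not str.
    for sent1, sent2 in zip(sents1, sents2):
        yield sent1, sent2

sents1 = ["I like apples.", "He reads books."]   # list of str
sents2 = ["Ich mag Äpfel.", "Er liest Bücher."]  # list of str

dataset = tf.data.Dataset.from_generator(
    generator_fn,
    output_types=(tf.string, tf.string),
    output_shapes=((), ()),
    args=(sents1, sents2))

for s1, s2 in dataset:
    print(type(s1.numpy()))  # <class 'bytes'>, even though the input was str
```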

ty5491003 commented 5 years ago

Yes, I ran into the same question. I read the documentation of tf.data.Dataset.from_generator(), but I didn't find an explanation for the 'bytes'. However, when I searched for 'byte' on that page, I found:

For large datasets (> 1 GB), this can waste memory and run into byte limits of graph serialization.

So I guess from_generator() uses a similar mechanism: the `args` are converted to tensors first, and tf.string tensors hold raw bytes, so the generator gets bytes back instead of str. A quick check of my own (a minimal sketch, not code from the repo) seems consistent with that:
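```python
import tensorflow as tf

# tf.string tensors store raw bytes; any Python str placed in a tensor
# comes back out as bytes when read with .numpy().
t = tf.constant(["I like apples."])
print(type(t.numpy()[0]))            # <class 'bytes'>
print(t.numpy()[0].decode("utf-8"))  # decode back to str if that is what you need
```

So if generator_fn needs real Python strings (e.g. for tokenization), calling .decode("utf-8") on the bytes inside the generator should give them back.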