NTMC-Community / MatchZoo

Facilitating the design, comparison and sharing of deep text matching models.
Apache License 2.0
3.82k stars 898 forks

Use `tf.data` API to build flexible and high performance data pipeline #829

Open luozhouyang opened 4 years ago

luozhouyang commented 4 years ago

Is your feature request related to a problem? Please describe.

We need a more flexible and powerful data pipeline for training on very large corpora.

Describe the solution you'd like

Use the `tf.data` API to build a flexible, high-performance data pipeline.
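
To make the request concrete, here is a rough sketch of the streaming pattern such a pipeline relies on: records are read lazily, shuffled through a bounded buffer, and grouped into batches, so the whole corpus never has to sit in memory. This is a standard-library stand-in, not `tf.data` itself and not MatchZoo code. The buffered shuffle and the batching step correspond in spirit to `Dataset.shuffle(buffer_size)` and `Dataset.batch(batch_size)`. The tab-separated record layout and every function name below are assumptions made for illustration.

```python
import random
from itertools import islice
from typing import Iterable, Iterator, List, Tuple

# Hypothetical record layout: (query, document, label).
Pair = Tuple[str, str, int]


def read_pairs(lines: Iterable[str]) -> Iterator[Pair]:
    """Lazily parse 'query<TAB>document<TAB>label' lines, one at a time."""
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        query, doc, label = line.split("\t")
        yield query, doc, int(label)


def shuffle_buffer(items: Iterable[Pair], buffer_size: int, seed: int = 0) -> Iterator[Pair]:
    """Approximate shuffle that holds at most `buffer_size` records in memory."""
    if buffer_size < 1:
        raise ValueError("buffer_size must be at least 1")
    rng = random.Random(seed)
    buffer: List[Pair] = []
    for item in items:
        if len(buffer) < buffer_size:
            buffer.append(item)  # still filling the buffer
            continue
        # Emit a random buffered record and put the new one in its slot.
        idx = rng.randrange(buffer_size)
        yield buffer[idx]
        buffer[idx] = item
    # Input exhausted: emit whatever is left, in random order.
    rng.shuffle(buffer)
    yield from buffer


def batch(items: Iterable[Pair], batch_size: int) -> Iterator[List[Pair]]:
    """Group records into lists of `batch_size`; the last batch may be smaller."""
    it = iter(items)
    while True:
        chunk = list(islice(it, batch_size))
        if not chunk:
            return
        yield chunk


def make_pipeline(lines: Iterable[str], batch_size: int = 2,
                  buffer_size: int = 4, seed: int = 0) -> Iterator[List[Pair]]:
    """Chain the stages: read -> buffered shuffle -> batch."""
    return batch(shuffle_buffer(read_pairs(lines), buffer_size, seed), batch_size)
```

Because every stage is a generator, `make_pipeline` can be given an open file handle and will pull lines only as batches are consumed. A real `tf.data` pipeline would add parallel preprocessing and prefetching on top of this.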

faneshion commented 3 years ago

Do you mean that tf.data cannot handle large-scale datasets? Did you try the MatchZoo-py version?