BrikerMan / Kashgari

Kashgari is a production-level NLP transfer learning framework built on top of tf.keras for text labeling and text classification; it includes Word2Vec, BERT, and GPT-2 language embeddings.
http://kashgari.readthedocs.io/
Apache License 2.0
2.39k stars 441 forks

[Proposal] Migrate keras to tf.keras #77

Closed BrikerMan closed 5 years ago

BrikerMan commented 5 years ago

I am proposing to migrate from keras to tf.keras for better performance, better serving options, and TPU support. Maybe we should rewrite the whole project, clean up the code, add missing documentation, and so on.

Here are the features I am planning to add.

  1. Multi-GPU/TPU support
  2. Export models for TensorFlow Serving
  3. Fine-tuning ability for W2V and BERT
BrikerMan commented 5 years ago

@alexwwang @HaoyuHu

alexwwang commented 5 years ago

I am not familiar with tf.keras yet, so I read the documentation quickly. After that, I think we should first clarify some questions and reach an agreement:

So, what about keeping the current edition and making a new one with tf.keras, to give users maximum freedom of choice?

BrikerMan commented 5 years ago

I think we should agree on those two questions before any change, too.

  1. Who would use Kashgari, and for what purpose?
    • Academic users: experiments to prove their hypotheses
    • NLP beginners: learning how to build an NLP project with production-level code quality
    • NLP developers: building a production-level classification/labeling model
  2. What quality level is this project aiming for? Production level or toy level?
    • Production level is the vision. I think we should build a well-organized, well-documented framework with high performance.

So, I think migrating to tf.keras is a good idea.

  • tf.keras provides the same API as keras
  • tf.keras provides better performance, an easy upgrade path to TF 2.0, and easy deployment of trained models on various platforms
  • Better training policy support

The only thing tf.keras is not good at is switching the Keras backend to CNTK or Theano. But I think 99% of Kashgari's users don't need this feature, so I recommend maintaining one edition with tf.keras.

alexwwang commented 5 years ago

Mostly I agree with you, especially your thoughts about the users and vision of this project, and the technical advantages of tf.keras.

The only thing I am not sure about is whether there are a few NLP users who use MXNet/CNTK as their main backend framework. So I suggest at least keeping this edition, which uses keras as its backbone, even if we don't maintain it in the near future or during the development of the new tf.keras edition. Maybe in the future someone else will be interested in continuing to develop it for those two groups of users.

That's my opinion: never lock a door completely. I think this is good for open source communities.

What do you think?


BrikerMan commented 5 years ago

Of course, we will keep the keras version as the legacy version for those who want to use MXNet/CNTK.

Let's finish #76 and #69 and keep a legacy version (v0.1 ~ v0.3), then start working on the new version from v0.4 with tf.keras.

alexwwang commented 5 years ago

Ok. I've no problem.


haoyuhu commented 5 years ago

In the future, tf.keras will be even more closely integrated with TensorFlow 2. The improvement in computing performance and the convenient TensorFlow Serving solution are of great significance for turning algorithm research into production. We only need to keep the current version and create a new project based on tf.keras. :D

alexwwang commented 5 years ago

I assume it would be easy to port most of the current models and layers code to tf.keras just by modifying their import declarations.

Am I right?

BrikerMan commented 5 years ago

I assume it would be easy to port most of the current models and layers code to tf.keras just by modifying their import declarations.

Am I right?

It will be easy for most of the models. But we need to wait for keras-bert and keras-gpt-2 to support tf.keras.

Actually, I am hoping to fully rewrite and clean up our code, and change the model save function to this: https://www.tensorflow.org/alpha/guide/saved_model .
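The SavedModel export mentioned above might look roughly like this; a hedged sketch based on the linked guide, where the tiny model and paths are placeholders rather than Kashgari code:

```python
# Rough sketch of exporting a tf.keras model in SavedModel format,
# per the guide linked above. The model and paths are illustrative only.
import os
import tempfile

import tensorflow as tf

# A tiny stand-in model; Kashgari would export its real classifier here.
model = tf.keras.Sequential([tf.keras.layers.Dense(4, input_shape=(8,))])

export_dir = os.path.join(tempfile.mkdtemp(), "saved_model")
tf.saved_model.save(model, export_dir)      # writes saved_model.pb + variables/

reloaded = tf.saved_model.load(export_dir)  # ready for TF Serving or reuse
```

The same directory can be pointed at by a TensorFlow Serving container without any further conversion.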

alexwwang commented 5 years ago

Ok, I will check it later.


haoyuhu commented 5 years ago

https://github.com/HighCWu/keras-bert-tpu This project is a fork of CyberZHG/keras_bert which supports Keras BERT on TPU. I have tried it and found it's not perfectly compatible with ours.

BrikerMan commented 5 years ago

@HaoyuHu I have communicated with CyberZHG. He is planning to support tf.keras, so I think we just need to focus on how to rewrite and clean up our code. Here is the issue: https://github.com/CyberZHG/keras-ordered-neurons/issues/1

haoyuhu commented 5 years ago

Interesting. That is fine if he is going to support tf.keras. BTW: instead of keras.utils.multi_gpu_model(), can we consider using distribution strategies (such as MirroredStrategy), which are recommended in TF 2.0, to support multi-GPU training? https://github.com/tensorflow/tensorflow/issues/26245#issuecomment-474611589

BrikerMan commented 5 years ago

@HaoyuHu Yes, distribution strategies are a more elegant way to support multi-GPU/multi-device/TPU training. We definitely should do that.
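A minimal sketch of what the MirroredStrategy approach could look like, assuming plain tf.keras; the toy model and data are placeholders, not Kashgari code:

```python
# Hedged sketch: multi-GPU training with tf.distribute.MirroredStrategy.
# With no GPUs visible it silently falls back to a single device.
import numpy as np
import tensorflow as tf

strategy = tf.distribute.MirroredStrategy()

# Build and compile the model inside the strategy scope so that
# variables are mirrored across replicas.
with strategy.scope():
    model = tf.keras.Sequential([
        tf.keras.layers.Dense(16, activation="relu", input_shape=(32,)),
        tf.keras.layers.Dense(2, activation="softmax"),
    ])
    model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")

# Toy data, just to show that fit() needs no special handling.
x = np.random.random((64, 32)).astype("float32")
y = np.random.randint(0, 2, size=(64,))
model.fit(x, y, epochs=1, batch_size=16, verbose=0)
```

The appeal of this design is that user-facing code stays ordinary Keras; only model construction moves inside `strategy.scope()`.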

BrikerMan commented 5 years ago

I have tried tf.keras for a classifier model: with TF 2.0, training took 9s per epoch, while TF 1.13 needed 68s. Prediction is 10x faster with TF 2.0, too.

haoyuhu commented 5 years ago

Exciting! For my use case, it's about 8 times faster.

alexwwang commented 5 years ago

Seems awesome!


BrikerMan commented 5 years ago

Added a new branch tf.keras-version for the tf.keras version. Planning to start from the beginning.

haoyuhu commented 5 years ago

Good to go!

alexwwang commented 5 years ago

Roll on.

@BrikerMan Do you have any ideas on structural optimization for the new edition? Or should we just focus on improving the save/load process and some other refactoring?

BrikerMan commented 5 years ago
  1. Clean up our code and add documentation with Sphinx.
  2. Add a preprocessor class to build dictionaries for tokens and labels, which will be able to dump the index and numerized data to a pickle file for reuse. This will be very handy when we need to compare several models' performance.
  3. Set sequence length to None to allow variable-length input.
  4. Save the model in tf.saved_model format.
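Item 2 above (dumping the token index and numerized data for reuse) could look roughly like this; a hedged sketch with made-up data, not the actual preprocessor API:

```python
# Hedged sketch of caching a token dictionary plus numerized data with
# pickle so several models can reuse the same preprocessing.
import os
import pickle
import tempfile

token2idx = {"<PAD>": 0, "hello": 1, "world": 2}   # toy dictionary
numerized = [[1, 2], [2, 1, 1]]                    # toy numerized corpus

cache_path = os.path.join(tempfile.gettempdir(), "kashgari_cache.pkl")
with open(cache_path, "wb") as f:
    pickle.dump({"token2idx": token2idx, "data": numerized}, f)

# Later, when comparing models, load the cache instead of re-tokenizing.
with open(cache_path, "rb") as f:
    cached = pickle.load(f)
```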
BrikerMan commented 5 years ago

I think the training process for a professional user should look like this:

x, y = Corpus.load_data()

embedding = XXEmbedding()
# Save numerized input/output data (without padding) and dict to cache_path.
embedding.build_on_corpus(x, y, cache_path='./cached_data')

classifier = CNNClassifier(embedding)
classifier.build_model(cache_path='./cached_data')
# Access the underlying keras model
keras_model = classifier.keras_model

classifier.fit()

And for beginners:

x, y = Corpus.load_data()

embedding = XXEmbedding()

classifier = CNNClassifier(embedding)
classifier.fit(x, y)
BrikerMan commented 5 years ago

Added documentation to readthedocs.io; here is the link: https://kashgari.readthedocs.io/en/tf.keras-version

BrikerMan commented 5 years ago

I have added a pre-processor class, documentation, and test cases. Feel free to check it out and tell me if anything needs to change. Let's make Kashgari better.

alexwwang commented 5 years ago

Two thoughts so far:

  • For benchmarking models, since we've decided to build Kashgari on tf.keras, maybe we could take advantage of TensorFlow Serving for model benchmarking in the preprocessor class, or make it a helper/utility. How does that sound?
  • Regarding sequence length, my opinion is that it would be safer to set the default to the 90th percentile of the raw sequence lengths. As long as we throw a warning when the default is used and document it completely, users will be aware of this rule. What do you think?

BrikerMan commented 5 years ago

Two thoughts so far:

  • For benchmarking models, since we've decided to build Kashgari on tf.keras, maybe we could take advantage of TensorFlow Serving for model benchmarking in the preprocessor class, or make it a helper/utility. How does that sound?
  • Regarding sequence length, my opinion is that it would be safer to set the default to the 90th percentile of the raw sequence lengths. As long as we throw a warning when the default is used and document it completely, users will be aware of this rule. What do you think?

Yeah, I agree with both ideas. As for sequence length as None, maybe we could make it an option for pro users.
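The 90th-percentile default discussed above could be computed like this; a hedged sketch where the helper name is hypothetical, not part of Kashgari:

```python
# Hedged sketch: choose a default sequence length that covers a given
# fraction of the raw sequence lengths, as suggested above.
import math

def percentile_seq_len(corpus, pct=0.90):
    """Return the smallest length that covers `pct` of the samples."""
    lengths = sorted(len(tokens) for tokens in corpus)
    idx = max(0, math.ceil(pct * len(lengths)) - 1)
    return lengths[idx]

# Toy corpus: ten sequences of varying length.
corpus = [["tok"] * n for n in (5, 7, 8, 9, 10, 12, 14, 15, 18, 40)]
percentile_seq_len(corpus)  # → 18, covering 90% of the samples
```

A percentile default like this avoids padding everything to the length of rare outliers (the length-40 sequence above) while still covering most of the data.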

BrikerMan commented 5 years ago

One more thing: I think we should consider adding the text-generating task as our third task, supporting classification, labeling, and generating. Any suggestions about this idea? @alexwwang @HaoyuHu

BrikerMan commented 5 years ago

Just added a demo labeling model, check it out: https://github.com/BrikerMan/Kashgari/blob/8c560551a9bc9e8841138adac80c59c53e73ad7a/kashgari/tasks/labeling/models.py#L122

Remember to check the document too. Link for demo document

haoyuhu commented 5 years ago

I'm not familiar with text-generating task, what's your opinion @alexwwang ?

alexwwang commented 5 years ago

Unfortunately, me neither. AFAIK, the text-generating task relates to GANs and uses models somewhat different from classification or labeling. I should read articles on GANs and the related corpora before diving into this, but I know few of the details so far.

BrikerMan commented 5 years ago

@alexwwang I have added multi-input support for all the embedding layers in the tf.keras version. Users can use the multi-input function without changing any code. I am also going to add a multi-output example so that users can easily change a model to support their data.

alexwwang commented 5 years ago

@BrikerMan Great to hear that! Did you implement it by modifying the data generator method? I can't wait to see how you did it.

BrikerMan commented 5 years ago

@BrikerMan Great to hear that! Did you implement it by modifying the data generator method? I can't wait to see how you did it.

I have added a new class called Processor. Each model has its own embedding, and each embedding contains a processor. We can replace the processor without changing any code.
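The decoupling described above, where each embedding holds a swappable processor, might look roughly like this; class and method names are illustrative, not Kashgari's actual API:

```python
# Hedged sketch of the Embedding/Processor decoupling: the embedding
# delegates all numerization to a processor that can be replaced.
class Processor:
    """Builds a token dictionary and numerizes raw token sequences."""
    def __init__(self):
        self.token2idx = {"<PAD>": 0, "<UNK>": 1}

    def numerize(self, tokens):
        # Grow the dictionary on the fly, then map tokens to indices.
        for t in tokens:
            self.token2idx.setdefault(t, len(self.token2idx))
        return [self.token2idx[t] for t in tokens]

class Embedding:
    """Each embedding owns a processor; swap it without touching models."""
    def __init__(self, processor=None):
        self.processor = processor or Processor()

    def embed_input(self, tokens):
        return self.processor.numerize(tokens)

emb = Embedding()
emb.embed_input(["hello", "world"])  # → [2, 3]
```

Because models only ever talk to the embedding, a multi-input processor can be dropped in without any model-side changes, which is the point of the design.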

alexwwang commented 5 years ago

I see, decoupling. Wait for your push!


BrikerMan commented 5 years ago

@alexwwang Pushed, check it out.

BrikerMan commented 5 years ago

Now we can finalize the tf.keras version's release.

Todo list:

Which parts do you guys want to work on? @alexwwang @HaoyuHu

BrikerMan commented 5 years ago

One more thing: should we simplify our basic models by decreasing the neuron count?

alexwwang commented 5 years ago

I'll try my best to do the parts I'm familiar with. 😄

alexwwang commented 5 years ago

I think we could learn from the keras-bert project and build a middle layer to hold all backend dependencies.

BrikerMan commented 5 years ago

I'll try my best to do the parts I'm familiar with. 😄

How about adding classification models, tests, and docs?

Here is the labeling tutorial doc: https://kashgari.readthedocs.io/en/tf.keras-version/tutorial/sequence_labeling_model.html

alexwwang commented 5 years ago

I could dive in from here.


haoyuhu commented 5 years ago

@BrikerMan Sorry for the late reply. I'm interested in the advanced use cases. At the same time, I can participate in some of the other documentation work.

BrikerMan commented 5 years ago

@HaoyuHu Here are several advanced use-case examples; maybe you could start from there. Feel free to update the documents for these use cases.

BrikerMan commented 5 years ago

@alexwwang @HaoyuHu I am thinking about renaming all the models for readability. Any ideas?

Here are several styles I found. For example, renaming BLSTMCRFModel:

Bi_LSTM_CRF_Model
BiLSTM_CRF_Model
Bi_LSTM_CRF                  # I prefer this one
BiLSTM_CRF
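Whichever style wins, the rename could keep the old name importable during a deprecation period; a hedged sketch, not actual Kashgari code:

```python
# Hedged sketch: rename the class but keep a backward-compatible alias
# so existing user code does not break immediately.
class BiLSTM_CRF_Model:
    """Renamed from BLSTMCRFModel (model body omitted in this sketch)."""
    pass

# Old name kept as an alias during the deprecation period.
BLSTMCRFModel = BiLSTM_CRF_Model
```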
haoyuhu commented 5 years ago

@HaoyuHu Here are several advanced use-case examples; maybe you could start from there. Feel free to update the documents for these use cases.

I will delve into these. :smile:

haoyuhu commented 5 years ago

@alexwwang @HaoyuHu I am thinking about renaming all the models for readability. Any ideas?

Here are several styles I found. For example, renaming BLSTMCRFModel:

Bi_LSTM_CRF_Model
BiLSTM_CRF_Model
Bi_LSTM_CRF                  # I prefer this one
BiLSTM_CRF

I prefer the last two.

alexwwang commented 5 years ago

I think the second and the last are the most concise. The question is whether the _Model suffix is necessary.

BrikerMan commented 5 years ago

I think the second and the last are the most concise. The question is whether the _Model suffix is necessary.

I think you are right, let's keep the _Model suffix and use the second one: camel case plus underscores between the uppercase names.

alexwwang commented 5 years ago

Camel case for the structural name of each main unit in the model, underscores to join the units a model uses, and a final _Model to indicate that it's a model, since a similar naming rule is used for the layer part but without the _Model suffix.

It's my thought. Any better idea? @BrikerMan @HaoyuHu

BrikerMan commented 5 years ago

Camel case for the structural name of each main unit in the model, underscores to join the units a model uses, and a final _Model to indicate that it's a model, since a similar naming rule is used for the layer part but without the _Model suffix.

It's my thought. Any better idea? @BrikerMan @HaoyuHu

Could you give a few examples?