asyml / texar-pytorch

Integrating the Best of TF into PyTorch, for Machine Learning, Natural Language Processing, and Text Generation. This is part of the CASL project: http://casl-project.ai/
https://asyml.io
Apache License 2.0

NNI hypertuning for Bert classification examples #324

Closed ZeyaWang closed 3 years ago

ZeyaWang commented 3 years ago

Description

This adds an NNI hyperparameter-tuning example for BERT classification, using the same configuration as the original hyperopt examples. Currently the example is built on the adl service platform, so an integration with AdaptDL should be provided later.

Test

It has been tested on EKS.
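For readers unfamiliar with NNI, here is a minimal sketch of how a trial script typically receives hyperparameters from the tuner and reports results back. It only uses NNI's standard trial API; the hyperparameter names, default values, and the random stand-in metric are illustrative placeholders, not the actual BERT classifier code in this PR.

```python
import random

import nni


def main():
    # Defaults mirroring a plain (non-tuned) run; names and values are illustrative.
    params = {"lr": 2e-5, "batch_size": 32, "max_epochs": 3}
    # Merge in the hyperparameters proposed by the NNI tuner for this trial.
    params.update(nni.get_next_parameter())

    # In the real example this is where the BERT classifier would be built and
    # trained with Texar-PyTorch; a random value stands in for evaluation here.
    accuracy = 0.0
    for _ in range(params["max_epochs"]):
        accuracy = random.random()
        # Report per-epoch results so the NNI web UI can track progress.
        nni.report_intermediate_result(accuracy)

    # Report the metric the tuner optimizes across trials.
    nni.report_final_result(accuracy)


if __name__ == "__main__":
    main()
```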

codecov[bot] commented 3 years ago

Codecov Report

Merging #324 into master will not change coverage. The diff coverage is n/a.


@@           Coverage Diff           @@
##           master     #324   +/-   ##
=======================================
  Coverage   80.14%   80.14%           
=======================================
  Files         134      134           
  Lines       11195    11195           
=======================================
  Hits         8972     8972           
  Misses       2223     2223           

Continue to review full report at Codecov.

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data. Last update a9e79f8...137f6d1.

hunterhector commented 3 years ago

Thanks a lot for the PR. Let me know if my understanding is correct:

  1. We will not integrate NNI inside the executor for now and will wait for the AdaptDL - Texar integration to happen first, is that correct?
  2. Can we do some minimal testing of this example on Travis? I understand this version requires clusters for testing but can we create a local version just for basic testing?

Further, a few things would make this more appropriate for open source; essentially, users who download Texar-PyTorch must be able to run these examples:

  1. Maybe we should remove some Petuum-specific settings.
  2. Some instructions/readmes on how to set up their own cluster, images, and other required environments.

I think the final outcome of this example should be that I am able to run it (with or without Petuum-specific environments).

ZeyaWang commented 3 years ago
  1. Correct: we will not integrate NNI inside the executor until the AdaptDL - Texar integration happens.
  2. I can modify the code to make it work with just NNI as a local service, without using the adl service. However, given that NNI is launched from the command line with a configuration YAML file, I need to think about how to write a testing script for this case. For your other points:
  3. I am not sure whether you mean the image name or the adl service. I could remove them, but some might be required later when the AdaptDL-Texar integration is finished.
  4. I can write a README for the local setup. The cluster setup instructions should be written once the AdaptDL-Texar integration is done. For now, I can make the example run successfully locally.
hunterhector commented 3 years ago
  1. Correct: we will not integrate NNI inside the executor until the AdaptDL - Texar integration happens.

Cool, I will leave this example as it is then.

  2. I can modify the code to make it work with just NNI as a local service, without using the adl service. However, given that NNI is launched from the command line with a configuration YAML file, I need to think about how to write a testing script for this case.

Hopefully, we can adapt the test cases from AdaptDL in some way? Eventually, we want some simple form of testing here.

  3. I am not sure whether you mean the image name or the adl service. I could remove them, but some might be required later when the AdaptDL-Texar integration is finished.

I am trying to see how a normal user can run the examples if they cannot access the resources within Petuum. For example, how can a user obtain our image? If they need to download it somewhere or build one themselves, then we can add some instructions/pointers for them to do so.

  4. I can write a README for the local setup. The cluster setup instructions should be written once the AdaptDL-Texar integration is done. For now, I can make the example run successfully locally.

OK, let's test the local version first.

ZeyaWang commented 3 years ago

I have removed all the resources that are internal to Petuum and based the current example on a local, generic version of NNI. The included example is simple to run. Instructions on how to install and run it are included in the README.
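The example itself is launched from the command line with nnictl and a configuration YAML file, as the README describes. Purely as an illustration of what a local (non-cluster) setup involves, below is a rough sketch using NNI's Python experiment API (available in NNI 2.x). The experiment name, trial command, search space, and tuner settings are assumptions for illustration, not the values shipped with this PR.

```python
from pathlib import Path

from nni.experiment import Experiment

# Illustrative search space; the actual ranges live in the example's config files.
search_space = {
    "lr": {"_type": "loguniform", "_value": [1e-5, 1e-3]},
    "batch_size": {"_type": "choice", "_value": [16, 32]},
}

experiment = Experiment("local")  # run trials on the local machine, no cluster needed
experiment.config.experiment_name = "bert_classifier_nni"            # hypothetical name
experiment.config.trial_command = "python bert_classifier_main.py"   # hypothetical entry point
experiment.config.trial_code_directory = str(Path(".").resolve())
experiment.config.search_space = search_space
experiment.config.tuner.name = "TPE"
experiment.config.tuner.class_args = {"optimize_mode": "maximize"}
experiment.config.max_trial_number = 10
experiment.config.trial_concurrency = 1

# Starts the NNI manager and web UI on port 8080 and blocks until the trials finish.
experiment.run(8080)
```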