CI/CD - base decision to train or not on fingerprint logic of `rasa train` itself

ArjaanBuijk commented 3 years ago

The CI/CD pipeline checks whether it needs to train the model. It does this by checking whether any of the files changed in the most recent commit require re-training.
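A minimal sketch of this kind of change-based check, assuming the pipeline diffs the last commit with `git diff --name-only HEAD~1 HEAD` and treats a change under a hypothetical set of training-relevant paths (`data/`, `config.yml`, `domain.yml`) as requiring re-training. The real financial-demo workflow may use different paths or tooling; `TRAINING_PATHS`, `changed_files` and `requires_retraining` are illustrative names only.

```python
from __future__ import annotations

import subprocess

# Hypothetical paths whose changes trigger re-training.
# Entries ending in "/" are directories; the rest are exact file paths.
TRAINING_PATHS = ("data/", "config.yml", "domain.yml")


def changed_files(base: str = "HEAD~1", head: str = "HEAD") -> list[str]:
    """Return the files changed between two commits, as reported by git."""
    out = subprocess.run(
        ["git", "diff", "--name-only", base, head],
        check=True, capture_output=True, text=True,
    ).stdout
    return [line for line in out.splitlines() if line]


def requires_retraining(paths: list[str]) -> bool:
    """True if any changed path is a training file or lives in a training directory."""
    return any(
        p == t or (t.endswith("/") and p.startswith(t))
        for p in paths
        for t in TRAINING_PATHS
    )
```

Because only the latest diff is inspected, a commit that touches just `README.md` returns `False` even when the previous commit's training run failed.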

This is not good for two reasons:

- It is not fail-safe. The current commit may contain no model changes while the previous commit did. If training failed for that previous commit, the model stored on S3 is stale, and the current run will not retrain it.
- It duplicates logic. Rasa already makes this decision using fingerprinting: the `model.tar.gz` of a trained model includes all the information that the `rasa train` command needs to decide whether to retrain.
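To illustrate why a fingerprint is more robust than a commit diff, here is a simplified sketch: hash the training inputs and compare the result with the fingerprint stored alongside the last successfully trained model. This is not Rasa's actual fingerprint format (Rasa stores its own, more detailed fingerprint inside `model.tar.gz`); `fingerprint` and `should_retrain` are hypothetical names and the single-hash scheme is an assumption. The property that matters is that the comparison is against the stored model's state rather than the last commit, so an earlier failed training run is still detected.

```python
from __future__ import annotations

import hashlib
from pathlib import Path


def fingerprint(paths: list[Path]) -> str:
    """Hypothetical fingerprint: SHA-256 over the sorted paths and their contents."""
    digest = hashlib.sha256()
    for path in sorted(paths):
        name = str(path).encode()
        data = path.read_bytes()
        # Length-prefix each field so different name/content splits cannot collide.
        for field in (name, data):
            digest.update(len(field).to_bytes(8, "big"))
            digest.update(field)
    return digest.hexdigest()


def should_retrain(current_inputs: list[Path], stored_fingerprint: str | None) -> bool:
    """Retrain if no model is stored or its fingerprint differs from the current inputs."""
    return stored_fingerprint is None or fingerprint(current_inputs) != stored_fingerprint
```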

I propose trying this approach:

1. When the CI/CD pipeline reaches the training step, it first downloads the current model from S3.
2. It then runs `rasa train` and relies on its built-in logic to decide whether to train.
3. Finally, it uploads the model back to S3, but only if it has changed.
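The three proposed steps could be sketched as follows. This assumes, hypothetically, that `download_model` and `upload_model` wrap the S3 transfer (for example with boto3) and that `rasa train` writes a new `.tar.gz` into the models directory only when it actually retrains; "changed" is detected by comparing the directory contents before and after training. `train_and_sync` and both helpers are illustrative names, not code from the financial-demo repository.

```python
from __future__ import annotations

import subprocess
from pathlib import Path
from typing import Callable


def train_and_sync(
    models_dir: Path,
    download_model: Callable[[Path], None],  # hypothetical: fetch current model from S3
    upload_model: Callable[[Path], None],    # hypothetical: push an archive back to S3
    train_cmd: list[str] | None = None,
) -> Path | None:
    """Download, let `rasa train` decide, and upload only a newly written archive.

    Returns the uploaded archive, or None if training produced no new model.
    """
    models_dir.mkdir(parents=True, exist_ok=True)
    download_model(models_dir)
    before = set(models_dir.glob("*.tar.gz"))

    cmd = train_cmd or ["rasa", "train", "--out", str(models_dir)]
    subprocess.run(cmd, check=True)  # raises if training fails, so nothing is uploaded

    new_models = set(models_dir.glob("*.tar.gz")) - before
    if not new_models:
        return None  # no new archive: the model on S3 is already current
    newest = max(new_models, key=lambda p: p.stat().st_mtime)
    upload_model(newest)
    return newest
```

Because a failed training run raises before the upload, S3 keeps the last good model and the next pipeline run compares against it again.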

ArjaanBuijk commented 3 years ago

@b-quachtran, what do you think about this proposal to improve the logic in the CI/CD pipeline?

b-quachtran commented 3 years ago

@ArjaanBuijk I think the above approach makes sense. I know of a customer doing something very similar in their CI pipeline to determine whether model training needs to run.

One thing that I think would work well is to use the `rasa train --dry-run` flag as a conditional check that determines whether the train job should run.
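One way to wire `rasa train --dry-run` in as that gate, sketched in Python: run the dry run, treat a zero return code as "nothing to retrain" and any non-zero code as "run the train job". The `command` parameter exists only so the functions can be exercised without Rasa installed; `dry_run_code` and `should_run_train_job` are hypothetical names, and the gating logic is an assumption about how the pipeline might use the flag.

```python
from __future__ import annotations

import subprocess


def dry_run_code(command: list[str] | None = None) -> int:
    """Run `rasa train --dry-run` (by default) and return its exit code."""
    cmd = command or ["rasa", "train", "--dry-run"]
    return subprocess.run(cmd).returncode


def should_run_train_job(command: list[str] | None = None) -> bool:
    # 0 means Rasa found nothing to retrain. Any other code, including an
    # error exit, is treated as "train", which errs on the safe side.
    return dry_run_code(command) != 0
```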

b-quachtran commented 3 years ago

The return code from `rasa train --dry-run` indicates whether model re-training is needed:

  --dry-run             If enabled, no actual training will be performed.
                        Instead, it will be determined whether a model should
                        be re-trained and this information will be printed as
                        the output. The return code is a 4-bit bitmask that
                        can also be used to determine what exactly needs to be
                        retrained: - 1 means Core needs to be retrained - 2
                        means NLU needs to be retrained - 4 means responses in
                        the domain should be updated - 8 means the training
                        was forced (--force argument is specified) (default:
                        False)
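The bitmask in the help text above can be decoded with Python's `enum.IntFlag`. The bit values come straight from the help output; the names `RetrainFlag` and `decode` are hypothetical.

```python
from enum import IntFlag


class RetrainFlag(IntFlag):
    """Bits of the `rasa train --dry-run` return code, as listed in the help text."""
    CORE = 1       # Core needs to be retrained
    NLU = 2        # NLU needs to be retrained
    RESPONSES = 4  # responses in the domain should be updated
    FORCED = 8     # training was forced (--force was specified)


def decode(return_code: int) -> list[str]:
    """Return the names of the bits set in a dry-run return code."""
    return [flag.name for flag in RetrainFlag if flag & return_code]
```

For example, `decode(3)` returns `['CORE', 'NLU']` and `decode(0)` returns an empty list.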

ArjaanBuijk commented 3 years ago

I tried it out, but in the CI/CD pipeline, downloading the model first and then running `rasa train` does not work: Rasa always retrains the model.

For now, I am leaving the existing logic in place.

RasaHQ / financial-demo, issue #122