This scenario shows how to deploy a Frequently Asked Questions (FAQ) matching model as a web service to provide predictions for user questions. For this scenario, “Input Data” in the architecture diagram refers to text strings containing the user questions to match with a list of FAQs. The scenario is designed for the Scikit-Learn machine learning library for Python but can be generalized to any scenario that uses Python models to make real-time predictions.
The scenario uses a subset of Stack Overflow question data which includes original questions tagged as JavaScript, their duplicate questions, and their answers. It trains a Scikit-Learn pipeline to predict the match probability of a duplicate question with each of the original questions. These predictions are made in real time using a REST API endpoint. The application flow for this architecture is as follows:
An example app that consumes the results is included with the scenario.
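The input and output contract of the scoring service can be pictured with a toy stand-in. The sketch below is not the scenario's trained Scikit-Learn pipeline: it swaps in a simple token-overlap (Jaccard) score purely to illustrate the shape of the exchange, a question string in and a ranked list of FAQ matches out as JSON. The names `FAQS`, `tokenize`, `jaccard`, and `score_question`, the sample questions, and the JSON field names are all hypothetical.

```python
import json
import re

# Hypothetical FAQ list; the real scenario uses original Stack Overflow
# questions tagged as JavaScript.
FAQS = [
    "How do I remove a property from a JavaScript object?",
    "How do I check whether a JavaScript array includes a value?",
    "What is the difference between let and var?",
]


def tokenize(text):
    """Lowercase the text and return its set of alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a, b):
    """Token-overlap score in [0, 1]; a stand-in for a model's match probability."""
    ta, tb = tokenize(a), tokenize(b)
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def score_question(question, faqs=FAQS):
    """Return a JSON string ranking every FAQ against the user question."""
    ranked = sorted(
        ({"faq": faq, "score": round(jaccard(question, faq), 3)} for faq in faqs),
        key=lambda item: item["score"],
        reverse=True,
    )
    return json.dumps({"question": question, "matches": ranked})


if __name__ == "__main__":
    print(score_question("How to delete a property of an object in JavaScript?"))
```

In the deployed scenario the scoring function would call the trained pipeline instead of `jaccard`, and the JSON would be returned by the REST endpoint rather than printed.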
The tutorial was developed on an Azure Ubuntu DSVM, which addresses the first three prerequisites.
To set up your environment to run these notebooks, follow these steps. They configure the notebooks to use Docker and Azure seamlessly.
Add your login to the docker group:
sudo usermod -a -G docker <login>
You can check that you are in the docker group by listing your account's groups with the command:
groups
You may need to log out and log back in for this change to take effect.
docker login
git clone https://github.com/Azure/MLAKSDeployment.git
conda env create -f environment.yml
source activate MLAKSDeployment
az login
az account set --subscription <Your Azure Subscription>
jupyter notebook
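Before launching the notebooks, the setup steps above can be sanity-checked with a short script. This is a minimal sketch, not part of the repository: the helper names `current_group_names` and `missing_tools` and the `REQUIRED_TOOLS` list are hypothetical, and the script assumes a Linux system such as the Ubuntu DSVM. It reports whether the current session is in the docker group and whether the command-line tools used above are on the PATH.

```python
import grp
import os
import shutil

# Command-line tools invoked by the setup steps above.
REQUIRED_TOOLS = ["docker", "git", "conda", "az", "jupyter"]


def current_group_names():
    """Return the names of the groups the current process belongs to."""
    names = set()
    for gid in set(os.getgroups()) | {os.getegid()}:
        try:
            names.add(grp.getgrgid(gid).gr_name)
        except KeyError:
            # Some group IDs have no name entry (common inside containers).
            continue
    return names


def missing_tools(tools=REQUIRED_TOOLS):
    """Return the tools from `tools` that are not found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    if "docker" not in current_group_names():
        print("Not in the docker group yet; log out and back in after usermod.")
    for tool in missing_tools():
        print(f"Missing from PATH: {tool}")
```

Because the check reads the groups of the running process, it reflects the same behavior as the `groups` command: a new group membership only shows up after you log out and back in.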
After following the setup instructions above, run the Jupyter notebooks in order, starting with the Data Prep Notebook.
To remove the conda environment created on your Linux system, see here. The last Jupyter notebook also gives details on deleting the Azure resources associated with this repository.
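As a rough illustration of the conda cleanup, the sketch below builds the `conda env remove` command for the environment. It assumes the environment is named `MLAKSDeployment`, as in the `source activate` step; the helper names are hypothetical. The script only prints the command by default so that nothing is deleted by accident.

```python
import subprocess

ENV_NAME = "MLAKSDeployment"  # assumed from the `source activate` step


def conda_remove_command(env_name=ENV_NAME):
    """Build the `conda env remove` command for the given environment name."""
    return ["conda", "env", "remove", "--name", env_name]


def remove_conda_env(env_name=ENV_NAME):
    """Run the removal; deactivate the environment in your shell first."""
    return subprocess.run(conda_remove_command(env_name), check=True)


if __name__ == "__main__":
    # Print the command rather than running it.
    print(" ".join(conda_remove_command()))
```

Call `remove_conda_env()` explicitly only once the environment is no longer active in your shell.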
This project welcomes contributions and suggestions. Most contributions require you to agree to a Contributor License Agreement (CLA) declaring that you have the right to, and actually do, grant us the rights to use your contribution. For details, visit https://cla.microsoft.com.
When you submit a pull request, a CLA-bot will automatically determine whether you need to provide a CLA and decorate the PR appropriately (e.g., label, comment). Simply follow the instructions provided by the bot. You will only need to do this once across all repositories using our CLA.
This project has adopted the Microsoft Open Source Code of Conduct. For more information see the Code of Conduct FAQ or contact opencode@microsoft.com with any additional questions or comments.