Azure-Samples / ms-identity-python-webapp

A Python web application calling Microsoft Graph that is secured using the Microsoft identity platform
MIT License
291 stars 138 forks

getAtoken 502 Bad Gateway error after updating python flask app #34

Closed rjouhann closed 4 years ago

rjouhann commented 4 years ago

Hello,

I have developed a web app using Python Flask and integrated it with Microsoft Azure identity for authentication using the examples. Works great!

However, sometimes when I make an update to my web app (running in a container in a k8s cluster), my users get a 502 Bad Gateway error. When they clear the browser cache, the issue is resolved and things work fine.

Does anyone have an idea on how to prevent such error?

Thanks for your help, Romain

rayluo commented 4 years ago

This is the first time we have heard this kind of report. We currently have more questions than answers.

Was it reproducible? Was "running in a container in a k8s cluster" a factor (or the only factor)?

What exactly did you mean by "when I make an update to my web app"? Were you changing any part of this sample?

When the error occurred, did the entire Flask web app become unresponsive (thus the 502 Bad Gateway error), or did it happen only when the end user was trying to sign in?
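
One way to answer the last question is to probe a few routes while the error is happening and see which of them return 502. The following is a minimal sketch using only the standard library, not something from the sample: `fetch_status`, `probe`, and `summarize` are hypothetical helper names, and the paths you pass in are assumptions about your app. Because the problem reportedly disappears once the browser cache is cleared, a probe without the affected browser's cookies may not reproduce it, so `fetch_status` accepts an optional `cookie` string.

```python
import urllib.error
import urllib.request


class NoRedirect(urllib.request.HTTPRedirectHandler):
    """Report 3xx responses as-is instead of following them."""

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        return None


def fetch_status(url, cookie=None, timeout=10):
    """Return the HTTP status code for url, or None if no HTTP response arrived."""
    headers = {"Cookie": cookie} if cookie else {}
    request = urllib.request.Request(url, headers=headers)
    opener = urllib.request.build_opener(NoRedirect)
    try:
        with opener.open(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        return err.code  # 3xx, 4xx and 5xx (including 502) end up here
    except urllib.error.URLError:
        return None  # e.g. DNS failure or refused connection


def probe(base_url, paths, fetch=fetch_status):
    """Map each path to the status code returned for base_url + path."""
    return {path: fetch(base_url.rstrip("/") + path) for path in paths}


def summarize(statuses):
    """Say whether 502 shows up nowhere, everywhere, or only on some paths."""
    failing = sorted(path for path, status in statuses.items() if status == 502)
    if not failing:
        return "no 502 observed"
    if len(failing) == len(statuses):
        return "502 on every probed path"
    return "502 only on: " + ", ".join(failing)
```

For example, `summarize(probe("https://your-app.example", ["/", "/login", "/getAtoken"]))` would report whether only the redirect path is affected; the host name here is a placeholder. To replay an affected browser's cookies, wrap the fetcher with `functools.partial(fetch_status, cookie="...")` and pass it as `fetch`.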

rjouhann commented 4 years ago

Thanks @rayluo for the quick reply!

Yes, this is reproducible every time I push an update to my web app.

I don't think the fact that the app was running in a container is a factor, but I wanted to mention it.

My web app is written in Python Flask. When I said every time I was changing the app, I meant changing part of the web app to add a feature or fix a bug. I actually never really changed the sample I used to do the authentication part with Microsoft Azure identity.

The error occurs after the user logs in, so after the user gets authenticated (on /getAtoken). After clearing the browser cache and re-authenticating, things work fine.

Here is a sample of my web app:

from flask import Flask, render_template, Response, session, request, redirect, url_for
from flask_session import Session  # https://pythonhosted.org/Flask-Session
from os import path
from collections.abc import Iterable
from functools import wraps
import requests, json, time, requests_cache
import uuid
import msal
import sys
import os.path
import re
import socket
# used to send email
import smtplib
from email.mime.text import MIMEText
from email.mime.multipart import MIMEMultipart
# used to remove InsecureRequestWarning
import urllib3
urllib3.disable_warnings(urllib3.exceptions.InsecureRequestWarning)
import app_config

app = Flask(__name__)
app.config.from_object(app_config)
Session(app)

headers = {
    'Accept': 'application/json',
    'Content-Type': 'application/json',
}

requests_cache.install_cache('jira_cache', backend='sqlite', expire_after=app_config.expire_after)
# HOME PAGE
@app.route("/")
def index():
    # Azure SSO
    if not session.get("user"):
        return redirect(url_for("login"))

    in_progress_json, backlog_json = fetch_data_from_api_epics()
    # ...
    # ...
    return render_template("index.html", user=session["user"], content=current + current_tab + backlog + backlog_tab, version=version)

# ...
# ...
# Other @app.route handlers are also defined, leading to different pages.
# ...

# AZURE SSO
@app.route("/login")
def login():
    session["state"] = str(uuid.uuid4())
    # Technically we could use empty list [] as scopes to do just sign in,
    # here we choose to also collect end user consent upfront
    auth_url = _build_auth_url(scopes=app_config.SCOPE, state=session["state"])
    return render_template("login.html", auth_url=auth_url, version=version)

@app.route("/logout")
def logout():
    session.clear()  # Wipe out user and its token cache from session
    return redirect(  # Also logout from your tenant's web session
        app_config.AUTHORITY + "/oauth2/v2.0/logout" +
        "?post_logout_redirect_uri=" + url_for("index", _external=True))

def _load_cache():
    cache = msal.SerializableTokenCache()
    if session.get("token_cache"):
        cache.deserialize(session["token_cache"])
    return cache

def _save_cache(cache):
    if cache.has_state_changed:
        session["token_cache"] = cache.serialize()

def _build_msal_app(cache=None, authority=None):
    return msal.ConfidentialClientApplication(
        app_config.CLIENT_ID, authority=authority or app_config.AUTHORITY,
        client_credential=app_config.CLIENT_SECRET, token_cache=cache)

def _build_auth_url(authority=None, scopes=None, state=None):
    return _build_msal_app(authority=authority).get_authorization_request_url(
        scopes or [],
        state=state or str(uuid.uuid4()),
        redirect_uri=url_for("authorized", _external=True))

def _get_token_from_cache(scope=None):
    cache = _load_cache()  # This web app maintains one cache per session
    cca = _build_msal_app(cache=cache)
    accounts = cca.get_accounts()
    if accounts:  # So all account(s) belong to the current signed-in user
        result = cca.acquire_token_silent(scope, account=accounts[0])
        _save_cache(cache)
        return result

@app.route(app_config.REDIRECT_PATH)  # Its absolute URL must match your app's redirect_uri set in AAD
def authorized():
    if request.args.get('state') != session.get("state"):
        return redirect(url_for("index"))  # No-OP. Goes back to Index page
    if "error" in request.args:  # Authentication/Authorization failure
        return render_template("auth_error.html", result=request.args)
    if request.args.get('code'):
        cache = _load_cache()
        result = _build_msal_app(cache=cache).acquire_token_by_authorization_code(
            request.args['code'],
            scopes=app_config.SCOPE,  # Misspelled scope would cause an HTTP 400 error here
            redirect_uri=url_for("authorized", _external=True))
        if "error" in result:
            return render_template("auth_error.html", result=result)
        session["user"] = result.get("id_token_claims")
        _save_cache(cache)
    return redirect(url_for("index"))

app.jinja_env.globals.update(_build_auth_url=_build_auth_url)  # Used in template

if __name__ == "__main__":
    # Only for debugging while developing
    app.run(host='0.0.0.0', debug=True, port=8080)
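
To reason about what the redirect handler does when the server-side session no longer matches what the browser sends back (one possible effect of a redeploy, stated here as an assumption and not a confirmed diagnosis), the branch order of `authorized()` above can be restated as a pure function over plain dicts. This is only a sketch: `authorized_branch` and the labels it returns are hypothetical, and the real handler works on Flask's `request.args` and `session`.

```python
def authorized_branch(query_args, session):
    """Return a label for the branch authorized() would take.

    query_args stands in for request.args and session for the Flask
    session; both are plain dicts here so the logic can run without Flask.
    """
    if query_args.get("state") != session.get("state"):
        return "state mismatch: redirect to index"
    if "error" in query_args:
        return "auth error page"
    if query_args.get("code"):
        return "redeem code for token"
    return "redirect to index"


# The session lost its "state" but the identity provider sends one back.
print(authorized_branch({"state": "abc", "code": "xyz"}, {}))
# The session still holds the same "state" that comes back.
print(authorized_branch({"state": "abc", "code": "xyz"}, {"state": "abc"}))
```

In this restatement, a missing session `state` leads to a redirect and not to an exception, so the handler's own branches would not by themselves produce an error response in that case.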

rayluo commented 4 years ago

> Yes, this is reproducible every time I push an update to my web app.

> when I said every time I was changing the app, I meant changing part of the web app to add a feature or fix a bug. I actually never really changed the sample I used to do the authentication part with Microsoft Azure identity.

Understood. But then, for the sake of narrowing down the possible reasons, would you mind using our off-the-shelf sample (perhaps changing ONLY its version string to "whatever", then changing it back and forth) while attempting to reproduce this issue?

In fact, I tried that myself just now, and it worked for me (TM). :-)

So I can only speculate.

rjouhann commented 4 years ago

It's a "production" app and I can't seem to repro the issue locally; it only happens when the app is pushed to prod and running in the container. Next time I need to update the app, let me first try turning on Flask debug mode so that maybe I can capture more logging when the issue happens. Regarding the version string, do you mean changing it to something like version=whatever or version=msal.whatever? What will this do to help narrow down the possible reasons?
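
Flask's debug mode enables the interactive Werkzeug debugger, which is not meant for production use. As an alternative way to capture more logging, here is a minimal sketch of a WSGI middleware that logs each request's method, path, response status, and the size of the incoming Cookie header to stdout, where container logs can pick it up. It uses only the standard library; `StatusLoggingMiddleware` and the logger name `request_log` are hypothetical names, and logging the cookie size is just one data point chosen because clearing browser data makes the problem go away.

```python
import logging
import sys

# Send log records to stdout so the container runtime can collect them.
logging.basicConfig(
    stream=sys.stdout,
    level=logging.INFO,
    format="%(asctime)s %(levelname)s %(message)s",
)
log = logging.getLogger("request_log")


class StatusLoggingMiddleware:
    """Wrap a WSGI app and log the status line of every response."""

    def __init__(self, wsgi_app):
        self.wsgi_app = wsgi_app

    def __call__(self, environ, start_response):
        method = environ.get("REQUEST_METHOD", "-")
        path = environ.get("PATH_INFO", "-")
        cookie_bytes = len(environ.get("HTTP_COOKIE", ""))

        def logging_start_response(status, headers, exc_info=None):
            log.info(
                "%s %s -> %s (cookie header: %d bytes)",
                method, path, status, cookie_bytes,
            )
            return start_response(status, headers, exc_info)

        try:
            return self.wsgi_app(environ, logging_start_response)
        except Exception:
            # Log the traceback, then let the server handle the error as usual.
            log.exception("%s %s raised an unhandled exception", method, path)
            raise
```

With the `app` object from the sample, it could be installed with `app.wsgi_app = StatusLoggingMiddleware(app.wsgi_app)`. A 502 is normally produced by a gateway or proxy in front of the app, so if the app's own log shows a normal status (or no request at all) for the failing `/getAtoken` call, that would suggest looking at the layer between the browser and the container.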

rayluo commented 4 years ago

> Regarding the version string, do you mean changing it to something like version=whatever or version=msal.whatever?

I mean changing its current value from:

    return render_template("login.html", auth_url=auth_url, version=msal.__version__)

to:

    return render_template("login.html", auth_url=auth_url, version="hello world")

and then

    return render_template("login.html", auth_url=auth_url, version="whatever string to be displayed at bottom right")

> What will this do to help narrow down the possible reasons?

Not really. I was just trying to mimic your behavior of "updating this off-the-shelf sample" while making sure that we do NOT actually change any of its existing behavior. :-P

rayluo commented 4 years ago

@rjouhann Did you find something that we can help with?

rjouhann commented 4 years ago

Hello @rayluo, I did not have a chance to do the test using the off-the-shelf sample. It's a production app, so I need to do it outside business hours. Let me close this thread for now and come back to you if I reproduce it with the off-the-shelf sample. Thanks for your help!! Much appreciated. Best Regards, Roman