artefactory / one-click-mlflow

A tool to deploy a mostly serverless MLflow tracking server on a GCP project with one command
GNU Lesser General Public License v3.0

Problem with get_token() #75

Closed ucsky closed 3 years ago

ucsky commented 3 years ago

Describe the bug

I get the following error when I try to test one-click-mlflow:

(venv) ucsky@machine:~/try/one-click-mlflow$ python examples/track_experiment.py 
Enter your project ID: ofi-ai-try
Enter the name of your MLFlow experiment: test
Traceback (most recent call last):
  File "/home/ucsky/try/one-click-mlflow/examples/track_experiment.py", line 5, in <module>
    import mlflow_config
  File "/home/ucsky/try/one-click-mlflow/examples/mlflow_config.py", line 61, in <module>
    os.environ["MLFLOW_TRACKING_TOKEN"] = get_token()
  File "/home/ucsky/try/one-click-mlflow/examples/mlflow_config.py", line 17, in get_token
    token = _get_token()
  File "/home/ucsky/try/one-click-mlflow/examples/mlflow_config.py", line 35, in _get_token
    open_id_connect_token = id_token.fetch_id_token(Request(), client_id)
  File "/home/ucsky/try/one-click-mlflow/examples/venv/lib/python3.9/site-packages/google/oauth2/id_token.py", line 252, in fetch_id_token
    credentials = service_account.IDTokenCredentials.from_service_account_info(
  File "/home/ucsky/try/one-click-mlflow/examples/venv/lib/python3.9/site-packages/google/oauth2/service_account.py", line 528, in from_service_account_info
    signer = _service_account_info.from_dict(
  File "/home/ucsky/try/one-click-mlflow/examples/venv/lib/python3.9/site-packages/google/auth/_service_account_info.py", line 46, in from_dict
    missing = keys_needed.difference(six.iterkeys(data))
  File "/home/ucsky/try/one-click-mlflow/examples/venv/lib/python3.9/site-packages/six.py", line 599, in iterkeys
    return iter(d.keys(**kw))
AttributeError: 'NoneType' object has no attribute 'keys'

To Reproduce: install with make one-click-mlflow, then run:

cd examples
python3 -m venv venv 
source venv/bin/activate
pip install -r requirements.txt
python track_experiment.py

Expected behavior: the experiment is tracked in MLFlow.

Desktop (please complete the following information):

lsb_release -a
LSB Version:    core-11.1.0ubuntu2-noarch:security-11.1.0ubuntu2-noarch
Distributor ID: Pop
Description:    Pop!_OS 21.04
Release:    21.04
Codename:   hirsute
AlexisVLRT commented 3 years ago

Thanks for your feedback!

Did you make sure you entered the project ID and not the project name? Sometimes they differ. Both are shown in the "Project info" widget on your GCP project homepage.

If it is indeed the right project ID, could you tell me what the variables client_id and tracking_uri contain at line 34 of examples/mlflow_config.py?

Thank you!
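A local format check can catch the most common way of entering a project name instead of an ID. This is a minimal sketch, assuming only Google's documented project ID format (6 to 30 lowercase letters, digits or hyphens, starting with a letter, no trailing hyphen); looks_like_project_id is a hypothetical helper, not part of one-click-mlflow.

```python
import re

# Documented GCP project ID format: 6-30 characters, lowercase letters,
# digits and hyphens, must start with a letter, must not end with a hyphen.
_PROJECT_ID_RE = re.compile(r"^[a-z][a-z0-9-]{4,28}[a-z0-9]$")


def looks_like_project_id(value: str) -> bool:
    """Return True if `value` matches the project ID format.

    A project *name* with spaces or capitals fails this check, but a name that
    happens to be ID-shaped passes, so this is a heuristic, not a lookup.
    """
    return bool(_PROJECT_ID_RE.match(value.strip()))
```

For example, "ofi-ai-try" passes while "My Project" does not. It cannot tell whether an ID-shaped string is the right project, so the "Project info" widget remains the reference.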

ucsky commented 3 years ago

Hello, thanks for your help. I checked under Project info: the project name and the project ID are the same. The tracking_url is https://mlflow-dot-ofi-ai-try.ew.r.appspot.com and the client_id is 239066521639-jqg97ih5tcjksd6mholqc4sfjat4pcie.apps.googleusercontent.com.

AlexisVLRT commented 3 years ago

Just to make sure, are you sure it is in this order? https://mlflow-dot-ofi-ai-try.ew.r.appspot.com looks more like a tracking_url to me.

Are you able to reach the MLFlow frontend in your browser using this tracking URI?

@pol-defont-reaulx is looking into the issue.
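A swapped tracking_uri and client_id is easy to detect mechanically, because the two values have distinct shapes. A minimal sketch, assuming the tracking URI is an https URL and the client ID is a Google OAuth client ID (these end in .apps.googleusercontent.com); check_mlflow_settings is a hypothetical helper, not part of mlflow_config.py.

```python
from typing import List
from urllib.parse import urlparse

# Suffix carried by Google OAuth 2.0 client IDs.
OAUTH_CLIENT_SUFFIX = ".apps.googleusercontent.com"


def check_mlflow_settings(tracking_uri: str, client_id: str) -> List[str]:
    """Return a list of problems; an empty list means both values look plausible."""
    problems = []
    parsed = urlparse(tracking_uri)
    # The tracking URI should be an absolute https URL (here, the App Engine URL).
    if parsed.scheme != "https" or not parsed.netloc:
        problems.append(f"tracking_uri is not an https URL: {tracking_uri!r}")
    # The client ID should be an OAuth client ID, not a URL.
    if not client_id.endswith(OAUTH_CLIENT_SUFFIX):
        problems.append(f"client_id does not end with {OAUTH_CLIENT_SUFFIX!r}: {client_id!r}")
    # The specific mix-up discussed in this thread: the two values swapped.
    if tracking_uri.endswith(OAUTH_CLIENT_SUFFIX) and client_id.startswith("https://"):
        problems.append("tracking_uri and client_id appear to be swapped")
    return problems
```

With the values from this thread in the correct order, the function returns an empty list; passing them in reverse order reports all three problems.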

ucsky commented 3 years ago

Yes, you are right about the inversion between client_id and tracking_url; I made an edit to fix it.

Yes, I'm able to reach the MLFlow frontend in the browser and everything looks fine.

pol-defont-reaulx commented 3 years ago

Hi @ucsky, I'm currently looking at your problem, but I couldn't reproduce it. What seems strange is that the script didn't ask for a new key. Did you get the key manually beforehand and add it to your environment variables, or something else?

pol-defont-reaulx commented 3 years ago

I can reproduce your problem by giving an empty JSON file as the SA key. Could you check the SA key you're using and tell me whether that's the problem? I checked, and this problem (if it's this one) has been "corrected" with a better error message in newer versions of google-auth.
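Because an empty or malformed key file can surface as a confusing error deep inside google-auth, checking the file's contents first can rule it out. A minimal sketch, assuming a usable key is a JSON object with at least type, client_email, private_key and token_uri (an assumed minimum; keys downloaded from the console contain more fields); key_problems and key_file_problems are hypothetical helpers, not part of one-click-mlflow or google-auth.

```python
import json
from pathlib import Path
from typing import List

# Assumed minimum fields for a service account key; real keys contain more.
REQUIRED_FIELDS = ("type", "client_email", "private_key", "token_uri")


def key_problems(text: str) -> List[str]:
    """Return problems found in the text of a service account key file."""
    if not text.strip():
        return ["file is empty"]
    try:
        info = json.loads(text)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc}"]
    # json.loads("null") returns None, which is not a usable key.
    if not isinstance(info, dict):
        return [f"expected a JSON object, got {type(info).__name__}"]
    problems = [f"missing field: {name}" for name in REQUIRED_FIELDS if name not in info]
    if "type" in info and info["type"] != "service_account":
        problems.append(f"type is {info['type']!r}, not 'service_account'")
    return problems


def key_file_problems(path: str) -> List[str]:
    """Read a key file (e.g. the one GOOGLE_APPLICATION_CREDENTIALS points to) and check it."""
    return key_problems(Path(path).read_text(encoding="utf-8"))
```

An empty file, a file containing only {} and a file whose type is not service_account are each reported with a specific message instead of an AttributeError.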

ucsky commented 3 years ago

Hi @pol-defont-reaulx, thanks for your help. I'm not sure what you mean by "Did you get the key manually ...". I set up gcloud a long time ago, but I never did anything specific when trying one-click-mlflow. In the dashboard, under the project's service accounts, I can see a key for mlflow-log-pusher@ofi-ai-try.iam.gserviceaccount.com. How can I check which SA key I'm using? Is there a gcloud command for that? I will pull the repo and try again.

pol-defont-reaulx commented 3 years ago

You can check the env variable GOOGLE_APPLICATION_CREDENTIALS, which is the path to the SA key you're using. Normally, the script track_experiment.py tells you if the variable is empty or not pointing to a file, and asks whether you want to pull a new key (which will be stored inside examples/). That's why I asked whether you got the key manually by setting the env variable to a path outside of the script.
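The check described in the comment above can be sketched as a small classifier. This is not the actual code of track_experiment.py; credentials_status and its return labels are hypothetical, and the last case assumes that a path to an existing file is used as-is without further validation.

```python
import os
from typing import Mapping

VAR = "GOOGLE_APPLICATION_CREDENTIALS"


def credentials_status(env: Mapping[str, str]) -> str:
    """Classify the credentials variable: 'empty', 'not-a-file' or 'file'."""
    path = env.get(VAR, "")
    if not path:
        # Unset or empty: the example script would offer to pull a new key.
        return "empty"
    if not os.path.isfile(path):
        # Set, but not pointing to a file: same prompt.
        return "not-a-file"
    # Points to an existing file: presumably used as-is, whatever project it belongs to.
    return "file"


if __name__ == "__main__":
    print(f"{VAR}={os.environ.get(VAR, '')!r} -> {credentials_status(os.environ)}")
```

The "file" case is the one where no prompt appears, which is why a variable left over from another setup can go unnoticed.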

ucsky commented 3 years ago

I checked GOOGLE_APPLICATION_CREDENTIALS. It is not empty, but it points to credentials for another GCP project I'm using ("quota_project_id": "immo-datascience"). This is an old legacy project that I plan to deprecate because too many things are broken in it. I tried make one-click-mlflow on it, but the installation did not complete and gave Error: Error creating Client: googleapi: Error 400: Precondition check failed.
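To see which project a credentials file refers to, reading its project fields is enough. A minimal sketch, assuming service account keys carry a "project_id" field while other credential files may only carry "quota_project_id" (as in the file reported above); credentials_project is a hypothetical helper.

```python
import json
from typing import Optional, Tuple


def credentials_project(text: str) -> Tuple[Optional[str], Optional[str]]:
    """Return (type, project) from the text of a credentials JSON file.

    Prefers "project_id" and falls back to "quota_project_id"; returns
    (None, None) if the document is not a JSON object. Invalid JSON raises.
    """
    info = json.loads(text)
    if not isinstance(info, dict):
        return None, None
    project = info.get("project_id") or info.get("quota_project_id")
    return info.get("type"), project
```

Comparing the returned project with the project ID entered in track_experiment.py shows whether the two disagree, as they did here (immo-datascience vs ofi-ai-try).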

pol-defont-reaulx commented 3 years ago

I think the problem comes from here. You should run GOOGLE_APPLICATION_CREDENTIALS= to empty the env variable, then run track_experiment.py again to see whether this was indeed the cause of the problem.
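The same workaround can be applied from Python for a single run, leaving the parent environment untouched. A minimal sketch; env_with_empty_credentials and rerun_example are hypothetical helpers, and rerun_example assumes it is called from inside examples/ where track_experiment.py lives.

```python
import os
import subprocess
import sys
from typing import Dict, Mapping


def env_with_empty_credentials(base: Mapping[str, str]) -> Dict[str, str]:
    """Copy `base` with GOOGLE_APPLICATION_CREDENTIALS set to an empty string."""
    env = dict(base)  # copy, so the caller's environment is left untouched
    env["GOOGLE_APPLICATION_CREDENTIALS"] = ""
    return env


def rerun_example(script: str = "track_experiment.py") -> int:
    """Run the example script in a child process with the variable emptied."""
    completed = subprocess.run(
        [sys.executable, script], env=env_with_empty_credentials(os.environ)
    )
    return completed.returncode
```

Setting the variable to an empty string mirrors what was suggested here; in a shell, prefixing the command with GOOGLE_APPLICATION_CREDENTIALS= has the same effect for one run.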

Did you try make one-click-mlflow on a new project? At what point does it fail?

pol-defont-reaulx commented 3 years ago

Could you please test if your problem is resolved using the branch fix-service-account-key?

ucsky commented 3 years ago

Hello @pol-defont-reaulx,

Thanks! By setting GOOGLE_APPLICATION_CREDENTIALS="" I was able to run the example.

To answer your question: with my old legacy project, the error is the following.

Welcome to the GCP Mlflow deployment helper!
If everything goes according to plan, you should have an up and running secure MLFlow install on your project in about 30 minutes

What project do you want to deploy MLFlow on?

[3] immo-datascience

Please enter your numeric choice: 3

Updated property [core/project].
Setting up your GCP project...
Done

=> A dummy app engine with the name "default" will be created

What network do you want to attach to?

[1] Create new network (recommended)
[2] default
Please enter your numeric choice: 1

What contact email address should be displayed when a user trying to log in is not authorized? The address should be yours or a Cloud Identity group managed by you.

Support email address (probably yours): ucsky@yahoo.fr

Who do you want to give access to MLFlow?

[1] Add a user like jane@example.com
[2] Add a group like people@example.com
[3] Add a domain like example.com
[4] Done
Please enter your numeric choice: 1
Enter the user's address: ucsky@yahoo.fr
["user:ucsky@yahoo.fr"]

[1] Add a user like jane@example.com
[2] Add a group like people@example.com
[3] Add a domain like example.com
[4] Done
Please enter your numeric choice: 4

Remotely building mlflow server docker image
Creating temporary tarball archive of 3 file(s) totalling 2.5 KiB before compression.
Uploading tarball of [./tracking_server] to [gs://immo-datascience_cloudbuild/source/16336452550.214517-1d607343cd054cb0ae4a99b3454797d.tgz]
Created [https://cloudbuild.googleapis.com/v1/projects/immo-datascience/locations/global/builds/a0a4449a-7de9-4d49-a978-bd9dc00d450a].
Logs are available at [https://console.cloud.google.com/cloud-build/builds/a0a598344-7de9-4d49-a978-bd9dc00d5b0a?project=4542535435].
Done

Initializing Terraform...
Done

=> A consent screen (brand) has already been configured on this project. It will be used as-is
=> No oauth client exists on this project. A new one will be created
Importing app engine service
module.mlflow.module.server.google_app_engine_application.app: Importing from ID "immo-datascience"...
module.mlflow.module.server.google_app_engine_application.app: Import prepared!
  Prepared google_app_engine_application for import
module.mlflow.module.server.google_app_engine_application.app: Refreshing state... [id=immo-datascience]

Import successful!

The resources that were imported are shown above. These resources are now in
your Terraform state and will henceforth be managed by Terraform.

Deploying infrastructure...
This should take about 20 minutes, don't forget to stretch and hydrate ☕️
╷
│ Error: Error creating Client: googleapi: Error 400: Precondition check failed.
│ 
│   with module.mlflow.module.server.google_iap_client.project_client[0],
│   on modules/mlflow/server/main.tf line 177, in resource "google_iap_client" "project_client":
│  177: resource "google_iap_client" "project_client" {
│ 
╵

To test the modification, I ran make destroy and then make one-click-mlflow, but now I get this error:

Setting up your GCP project...
╷
│ Error: googleapi: Error 409: You already own this bucket. Please select another name., conflict
│ 
│   with module.bucket_backend.google_storage_bucket.this,
│   on ../modules/mlflow/artifacts/main.tf line 18, in resource "google_storage_bucket" "this":
│   18: resource "google_storage_bucket" "this" {
pol-defont-reaulx commented 3 years ago

@ucsky thanks. I'm closing this issue; I opened a new one, #79, for this problem so that each issue covers only one problem.