datarevenue-berlin / OpenMLOps

MIT License
697 stars, 101 forks

Problem following basic usage of jupyter mlflow and prefect tutorial #67

Open Pfriasf opened 2 years ago

Pfriasf commented 2 years ago

Good afternoon,

In the Prefect configuration step, I get the following error:

---------------------------------------------------------------------------
KeyError Traceback (most recent call last)
/tmp/ipykernel_164/2133135275.py in <module>
     42 flow_run_id = prefect_client.create_flow_run(flow_id=training_flow_id, run_name=f"run {prefect_project_name}")
     43 
---> 44 create_prefect_flow()

/tmp/ipykernel_164/2133135275.py in create_prefect_flow()
     30 storage = S3(s3_bucket)
     31 
---> 32 session_token = get_prefect_token()
     33 prefect_client = Client(api_server=prefect_url, api_token=session_token)
     34 schedule = IntervalSchedule(interval=timedelta(minutes=2))

/tmp/ipykernel_164/2133135275.py in get_prefect_token()
     14 r = requests.get(auth_url)
     15 jsn = r.json()
---> 16 action_url = jsn["methods"]["methods"]["password"]["config"]["action"]
     17 data = {"identifier": username, "password": password}
     18 headers = {"Accept": "application/json", "Content-Type": "application/json"}

KeyError: 'methods'

The response does not contain the key "methods".

Example response:

{
    "id": "bad96217-aac0-4456-8ae7-54467b4c3813",
    "type": "api",
    "expires_at": "2021-08-23T14:00:44.432803488Z",
    "issued_at": "2021-08-23T13:50:44.432803488Z",
    "request_url": "http://mlops.mydomain.com/self-service/login/api",
    "ui": {
        "action": "https://mlops.mydomain.com/.ory/kratos/public/self-service/login?flow=bad96217-aac0-4456-8ae4-54467b4c323e2",
        "method": "POST",
        "nodes": [
            {
                "type": "input",
                "group": "default",
                "attributes": {
                    "name": "csrf_token",
                    "type": "hidden",
                    "value": "",
                    "required": true,
                    "disabled": false
                },
                "messages": null,
                "meta": {}
            },
            {
                "type": "input",
                "group": "password",
                "attributes": {
                    "name": "password_identifier",
                    "type": "text",
                    "value": "",
                    "required": true,
                    "disabled": false
                },
                "messages": null,
                "meta": {
                    "label": {
                        "id": 1070004,
                        "text": "ID",
                        "type": "info"
                    }
                }
            },
            {
                "type": "input",
                "group": "password",
                "attributes": {
                    "name": "password",
                    "type": "password",
                    "required": true,
                    "disabled": false
                },
                "messages": null,
                "meta": {
                    "label": {
                        "id": 1070001,
                        "text": "Password",
                        "type": "info"
                    }
                }
            },
            {
                "type": "input",
                "group": "password",
                "attributes": {
                    "name": "method",
                    "type": "submit",
                    "value": "password",
                    "disabled": false
                },
                "messages": null,
                "meta": {
                    "label": {
                        "id": 1010001,
                        "text": "Sign in",
                        "type": "info",
                        "context": {}
                    }
                }
            }
        ]
    },
    "created_at": "2021-08-23T13:50:44.433949Z",
    "updated_at": "2021-08-23T13:50:44.433949Z",
    "forced": false
}
bernardolk commented 2 years ago

Thanks for pointing this out. We recently upgraded our Ory Kratos module version, which returns the info we need under a different key. We will need to update the tutorial notebook code to reflect that. In the meantime, you can use action_url = jsn["ui"]["action"]
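A minimal sketch of that change, assuming the login-flow JSON has the layout shown in the example response above. The helper name get_action_url is hypothetical (not part of the tutorial); it fails with a message listing the keys that are present, which makes the next layout change easier to spot.

```python
def get_action_url(flow: dict) -> str:
    """Return the URL that credentials must be POSTed to for a Kratos login flow."""
    try:
        # Newer Ory Kratos versions expose the form target under "ui" -> "action"
        return flow["ui"]["action"]
    except KeyError as exc:
        # Surface the keys that *are* present, which makes layout changes obvious
        raise KeyError(f"no ui.action in login flow; top-level keys: {sorted(flow)}") from exc
```

In the notebook this would be used as action_url = get_action_url(requests.get(auth_url).json()).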

jonpoveda commented 2 years ago

Good afternoon @bernardolk, I'm also following this tutorial, and applying the change you mentioned leads me to another error I can't figure out how to solve. I've tried several things (passing the flow as a parameter, creating cookies, forcing a refresh, passing the value that forcing a refresh returns) with no success. Here is the request result after getting the correct action_url:


{
   "id":"0fcc6629-2668-4af2-8824-c30396d7b173",
   "type":"api",
   "expires_at":"2021-08-23T16:00:06.279179Z",
   "issued_at":"2021-08-23T15:50:06.279179Z",
   "request_url":"http://mlops.mydomain.com/self-service/login/api",
   "ui":{
      "action":"https://mlops.mydomain.com/.ory/kratos/public/self-service/login?flow=0fcc6629-2668-4af2-8824-c30396d7b173",
      "method":"POST",
      "nodes":[
         {
            "type":"input",
            "group":"default",
            "attributes":{
               "name":"csrf_token",
               "type":"hidden",
               "value":"",
               "required":true,
               "disabled":false
            },
            "messages":"None",
            "meta":{

            }
         },
         {
            "type":"input",
            "group":"password",
            "attributes":{
               "name":"password_identifier",
               "type":"text",
               "value":"",
               "required":true,
               "disabled":false
            },
            "messages":"None",
            "meta":{
               "label":{
                  "id":1070004,
                  "text":"ID",
                  "type":"info"
               }
            }
         },
         {
            "type":"input",
            "group":"password",
            "attributes":{
               "name":"password",
               "type":"password",
               "required":true,
               "disabled":false
            },
            "messages":"None",
            "meta":{
               "label":{
                  "id":1070001,
                  "text":"Password",
                  "type":"info"
               }
            }
         },
         {
            "type":"input",
            "group":"password",
            "attributes":{
               "name":"method",
               "type":"submit",
               "value":"password",
               "disabled":false
            },
            "messages":"None",
            "meta":{
               "label":{
                  "id":1010001,
                  "text":"Sign in",
                  "type":"info",
                  "context":{

                  }
               }
            }
         }
      ],
      "messages":[
         {
            "id":4010002,
            "text":"Could not find a strategy to log you in with. Did you fill out the form correctly?",
            "type":"error"
         }
      ]
   },
   "created_at":"2021-08-23T15:50:06.281029Z",
   "updated_at":"2021-08-23T15:50:06.281029Z",
   "forced":false
}
bernardolk commented 2 years ago

Sorry you lost so much time on this, @jonpoveda! I know what your issue is. Ory also changed the way you need to send the credentials in their flow. You need to change this line: data = {"identifier": <username>, "password": <pwd>} to data = {"password_identifier": <username>, "password": <pwd>, "method": "password"}. I am pretty sure those are the only changes related to this issue, but if anything else comes up, let me know.
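Putting both fixes together, an updated login helper might look like the sketch below. It is not the tutorial's exact code: passing auth_url, username and password as parameters is an assumption (the notebook may read them from globals), build_login_payload is a hypothetical helper, and reading the token from "session_token" assumes the response format of a successful Kratos API-flow login. Without "method", Kratos cannot tell which strategy to use, which matches the "Could not find a strategy to log you in with" message above.

```python
def build_login_payload(username: str, password: str) -> dict:
    # The updated password strategy expects "password_identifier" instead of
    # "identifier", and the strategy must be named explicitly via "method".
    return {"password_identifier": username, "password": password, "method": "password"}


def get_prefect_token(auth_url: str, username: str, password: str) -> str:
    """Log in through a Kratos API flow and return the session token."""
    import requests  # imported here so build_login_payload can be used without it

    flow = requests.get(auth_url).json()
    action_url = flow["ui"]["action"]  # new location of the form target
    headers = {"Accept": "application/json", "Content-Type": "application/json"}
    r = requests.post(action_url, json=build_login_payload(username, password), headers=headers)
    r.raise_for_status()
    # Assumption: a successful API-flow login returns the token as "session_token"
    return r.json()["session_token"]
```

The returned token is what the notebook then passes to Client(api_server=prefect_url, api_token=session_token).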

Pfriasf commented 2 years ago

Do you know if anything has also changed in the Prefect API for creating the project? Apparently it expects a tenant ID:

---------------------------------------------------------------------------
HTTPError                                 Traceback (most recent call last)
/opt/conda/lib/python3.9/site-packages/prefect/client/client.py in _send_request(self, session, method, url, params, headers)
    374         try:
--> 375             response.raise_for_status()
    376         except requests.HTTPError as exc:

/opt/conda/lib/python3.9/site-packages/requests/models.py in raise_for_status(self)
    952         if http_error_msg:
--> 953             raise HTTPError(http_error_msg, response=self)
    954 

HTTPError: 400 Client Error: Bad Request for url: https://prefect.mlops.mydomain.com/graphql

The above exception was the direct cause of the following exception:

ClientError                               Traceback (most recent call last)
/tmp/ipykernel_166/3997178121.py in <module>
     18     flow_run_id = prefect_client.create_flow_run(flow_id=training_flow_id, run_name=f"run {prefect_project_name}")
     19 
---> 20 create_prefect_flow()

/tmp/ipykernel_166/3997178121.py in create_prefect_flow()
     14         data = fetch_data()
     15         train_model(data=data, mlflow_experiment_id=5, alpha=0.3, l1_ratio=0.3)
---> 16     prefect_client.create_project(project_name=prefect_project_name)
     17     training_flow_id = prefect_client.register(flow, project_name=prefect_project_name)
     18     flow_run_id = prefect_client.create_flow_run(flow_id=training_flow_id, run_name=f"run {prefect_project_name}")

/opt/conda/lib/python3.9/site-packages/prefect/client/client.py in create_project(self, project_name, project_description)
    969 
    970         try:
--> 971             res = self.graphql(
    972                 project_mutation,
    973                 variables=dict(

/opt/conda/lib/python3.9/site-packages/prefect/client/client.py in graphql(self, query, raise_on_error, headers, variables, token, retry_on_api_error)
    296             - ClientError if there are errors raised by the GraphQL mutation
    297         """
--> 298         result = self.post(
    299             path="",
    300             server=self.api_server,

/opt/conda/lib/python3.9/site-packages/prefect/client/client.py in post(self, path, server, headers, params, token, retry_on_api_error)
    211             - dict: Dictionary representation of the request made
    212         """
--> 213         response = self._request(
    214             method="POST",
    215             path=path,

/opt/conda/lib/python3.9/site-packages/prefect/client/client.py in _request(self, method, path, params, server, headers, token, retry_on_api_error)
    457         )
    458         session.mount("https://", requests.adapters.HTTPAdapter(max_retries=retries))
--> 459         response = self._send_request(
    460             session=session, method=method, url=url, params=params, headers=headers
    461         )

/opt/conda/lib/python3.9/site-packages/prefect/client/client.py in _send_request(self, session, method, url, params, headers)
    386                         "mutation but the response could not be parsed for more details"
    387                     )
--> 388                 raise ClientError(f"{exc}\n{graphql_msg}") from exc
    389 
    390             # Server-side and non-graphql errors will be raised without modification

ClientError: 400 Client Error: Bad Request for url: https://prefect.mlops.mydomain.com/graphql

The following error messages were provided by the GraphQL server:

    INTERNAL_SERVER_ERROR: Variable "$input" got invalid value null at
        "input.tenant_id"; Expected non-nullable type UUID! not to be null.

The GraphQL query was:

    mutation($input: create_project_input!) {
            create_project(input: $input) {
                id
        }
    }

The passed variables were:

    {"input": {"name": "wine-quality-project-test", "description": null, "tenant_id": null}}

We worked around this by updating Prefect: my version was 0.14.12, and after updating to 0.15.4, which accepts a tenant ID, I could move forward. But then we get another version-related error:

Failed to load and execute Flow's environment: StorageError("An error occurred while unpickling the flow:\n  TypeError('code() takes at most 15 arguments (16 given)')\nThis may be due to one of the following version mismatches between the flow build and execution environments:\n  - prefect: (flow built with '0.15.4', currently running with '0.14.6')\n  - python: (flow built with '3.9.5', currently running with '3.7.9')")
bernardolk commented 2 years ago

I have not seen the second error before. I recommend sticking with the previous version, since the Open MLOps module definitions use an image with that Prefect version. If you want to upgrade the version in the notebook, you also need to upgrade the image so the pods in your cluster run a matching version.
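A small pre-flight check can catch this kind of mismatch before a flow is registered. The "code() takes at most 15 arguments (16 given)" part of the error is consistent with a flow pickled on Python 3.8+ (whose code objects gained a posonlyargcount field) being loaded on Python 3.7, so the Python minor version matters as much as the Prefect version. This is a sketch: AGENT_VERSIONS is an assumption copied from the StorageError above and should be set to whatever your agent and flow-run image actually runs.

```python
import sys

# Cluster-side versions as reported by the StorageError above; adjust these to
# the image your agent and flow-run pods actually use.
AGENT_VERSIONS = {"prefect": "0.14.6", "python": "3.7"}


def local_versions() -> dict:
    """Collect the notebook's Prefect and Python (major.minor) versions."""
    try:
        import prefect
        prefect_version = prefect.__version__
    except ImportError:
        prefect_version = None
    # Compare Python at major.minor level: pickled code objects differ between minors
    python_version = f"{sys.version_info.major}.{sys.version_info.minor}"
    return {"prefect": prefect_version, "python": python_version}


def version_mismatches(local: dict, agent: dict) -> list:
    """Return one human-readable line per version that differs."""
    return [
        f"{key}: notebook has {local.get(key)!r}, agent runs {expected!r}"
        for key, expected in agent.items()
        if local.get(key) != expected
    ]
```

For example, for problem in version_mismatches(local_versions(), AGENT_VERSIONS): print(problem). Requiring an exact Prefect match is deliberately conservative.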

Then, you will need to fix the first error:

Bug: the prefect-server-agent pod is in "CrashLoopBackOff" with the error "Your Prefect Server instance has no tenants. Create a tenant with prefect server create-tenant." Deploying a flow to Prefect then raises a ClientError:

          The following error messages were provided by the GraphQL server:

            INTERNAL_SERVER_ERROR: Variable "$input" got invalid value null at
                 "input.tenant_id"; Expected non-nullable type UUID! not to be null.

Solution: This means the apollo pod didn't create a tenant. To create one yourself:

  1. Port-forward the apollo service: kubectl port-forward svc/prefect-server-apollo 4200
  2. Run: prefect backend server && prefect server create-tenant --name default --slug default
  3. Restart the agent pod (you can scale it down and back up to restart it)
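Steps 1 and 2 can also be scripted so that the port-forward runs in the background instead of occupying a terminal. This is a sketch that shells out to the same kubectl and prefect commands; the "prefect" namespace is an assumption taken from @Pfriasf's command below, the 3-second wait is a crude guess for the tunnel to come up, the prefect CLI must be on your local PATH, and restarting the agent pod (step 3) is left manual.

```python
import subprocess
import time


def tenant_setup_commands(namespace="prefect", name="default", slug="default"):
    """Return the argv lists for steps 1 and 2 above."""
    port_forward = ["kubectl", "-n", namespace, "port-forward", "svc/prefect-server-apollo", "4200"]
    prefect_cmds = [
        ["prefect", "backend", "server"],
        ["prefect", "server", "create-tenant", "--name", name, "--slug", slug],
    ]
    return port_forward, prefect_cmds


def create_default_tenant(namespace="prefect"):
    port_forward, prefect_cmds = tenant_setup_commands(namespace)
    # Run the port-forward as a background process instead of blocking a terminal
    forwarder = subprocess.Popen(port_forward)
    try:
        time.sleep(3)  # crude wait for the tunnel to come up
        for cmd in prefect_cmds:
            subprocess.run(cmd, check=True)  # raises CalledProcessError on failure
    finally:
        # Always tear the tunnel down, even if a prefect command fails
        forwarder.terminate()
        forwarder.wait()
```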
Pfriasf commented 2 years ago

I think a tenant exists: when I query GraphQL, I can see its name, slug and ID.

{
  tenant {
    slug
    id
    name
  }
}

output

{
  "data": {
    "tenant": [
      {
        "slug": "default",
        "id": "84f30d5f-7b41-36f9-a121-123456781b0928",
        "name": "default"
      }
    ]
  }
}
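The same tenant query can be run from Python, which is also a quick way to check whether a tenant has been created at all. This sketch assumes the apollo port-forward is running, so the GraphQL endpoint is reachable unauthenticated at http://localhost:4200/graphql; an empty list means no tenant exists yet.

```python
import json
import urllib.request

# Same query as above; an empty "tenant" list means no tenant exists yet
TENANT_QUERY = "{ tenant { id slug name } }"


def fetch_tenants(endpoint: str = "http://localhost:4200/graphql") -> dict:
    """POST the tenant query to Apollo (assumes the port-forward is running)."""
    body = json.dumps({"query": TENANT_QUERY}).encode()
    req = urllib.request.Request(endpoint, data=body, headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req, timeout=10) as resp:
        return json.load(resp)


def tenant_slugs(response: dict) -> list:
    """Return the tenant slugs in a GraphQL response, raising on GraphQL errors."""
    if response.get("errors"):
        raise RuntimeError(f"GraphQL errors: {response['errors']}")
    tenants = (response.get("data") or {}).get("tenant") or []
    return [t["slug"] for t in tenants]
```

Usage: print(tenant_slugs(fetch_tenants())).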

Still, I followed your instructions and ran the port-forward command:

kubectl -n prefect port-forward svc/prefect-server-apollo 4200

output

Forwarding from 127.0.0.1:4200 -> 4200
Forwarding from [::1]:4200 -> 4200

This is where I get stuck, maybe due to my lack of knowledge.

Where should I execute the command

prefect backend server && prefect server create-tenant --name default --slug default

I tried the same terminal, but of course I can't, because it's busy with the port-forward.

I also tried:

kubectl exec -n prefect svc/prefect-server-apollo -- prefect backend server && prefect server create-tenant --name default --slug default
bernardolk commented 2 years ago

@Pfriasf you should open a new terminal tab after port-forwarding, and make sure you have prefect installed locally. The port-forward exposes the Prefect server port from the cluster on your local machine, so with prefect installed locally (via pip install) you can reach the Prefect server in your cluster from there. So do what you did, then, in a new terminal tab, pip install prefect and run the commands. Let me know if that works.

Pfriasf commented 2 years ago

After running pip install prefect, I get the following error:

prefect: command not found

I'm using Ubuntu.

bernardolk commented 2 years ago

There seems to be a problem with your pip installation. Do you have python installed? This is the package, for reference: https://pypi.org/project/prefect/
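One possible cause worth ruling out (an assumption, not confirmed in this thread): pip install --user places console scripts such as prefect in ~/.local/bin on Ubuntu, and that directory is not always on PATH. The sketch below, run with the same Python that pip installed into, reports whether the command is on PATH or sitting in a scripts directory that PATH does not include.

```python
import os
import shutil
import sysconfig


def candidate_script_dirs() -> list:
    """Script directories pip may install into for this interpreter."""
    return [
        sysconfig.get_path("scripts"),                     # regular / virtualenv installs
        sysconfig.get_path("scripts", f"{os.name}_user"),  # pip install --user
    ]


def diagnose(cmd: str = "prefect") -> str:
    found = shutil.which(cmd)
    if found:
        return f"{cmd} found at {found}"
    path_dirs = os.environ.get("PATH", "").split(os.pathsep)
    for d in candidate_script_dirs():
        if os.path.exists(os.path.join(d, cmd)) and d not in path_dirs:
            return f"{cmd} is installed in {d}, which is not on PATH"
    return f"{cmd} not found; check that pip installed into this Python"
```

If the second message appears, adding that directory to PATH (or opening a new login shell) should make the prefect command available.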

NhatAnh commented 2 years ago

Hi, I got the same error:

ClientError: 400 Client Error: Bad Request for url: https://prefect.mlops.pixtavietnam.com/graphql

The following error messages were provided by the GraphQL server:

    INTERNAL_SERVER_ERROR: Variable "$input" got invalid value null at
        "input.tenant_id"; Expected non-nullable type UUID! not to be null.

The GraphQL query was:

    mutation($input: create_project_input!) {
            create_project(input: $input) {
                id
        }
    }

The passed variables were:

    {"input": {"name": "wine-quality-project", "description": null, "tenant_id": null}}

I followed the fix and it seems to work: the prefect-server-agent pod is running, and the Prefect dashboard shows 1 agent running. But when I run the notebook, I still get the same error. How do I debug this? Thanks

pedrocwb commented 2 years ago

@NhatAnh Can you share the request you made? Please make sure that the Python and Prefect versions running in your notebook instance are the same as those running in your Prefect agent.

NhatAnh commented 2 years ago

@pedrocwb I'm just doing the tutorial https://github.com/datarevenue-berlin/OpenMLOps/blob/master/tutorials/basic-usage-of-jupyter-mlflow-and-prefect.md

I used the OpenMLOps-AWS Terraform scripts to set it up, so I think the versions should match. I can give you the login to the Jupyter notebook in my setup, if that helps.

How do I check whether a tenant has been created?

NhatAnh commented 2 years ago

I tried creating the Prefect project in the Prefect dashboard and, in the Jupyter notebook, only registering the flow to that existing project, but now I get this error:

ImportError: cannot import name 'get_boto_client' from 'prefect.utilities.aws' (/opt/conda/lib/python3.9/site-packages/prefect/utilities/aws.py)
NhatAnh commented 2 years ago

How do I change the Python version of the Jupyter notebook? It seems Prefect does not work well with Python 3.9.

bernardolk commented 2 years ago

You would need to change the notebook image, which is specified in the singleuser option of the jupyterhub module. If you just want to test with a different Python version, though, it should be faster to create a virtual env locally on your machine with that Python version and install the Prefect client there. But I am not sure that's the issue; do you have any reference saying that Prefect doesn't play nice with Python 3.9?

NhatAnh commented 2 years ago

@bernardolk It says here https://docs.prefect.io/orchestration/getting-started/quick-start.html#basic-installation

omerfsen-gsnd commented 2 years ago

It seems the tenant is created, but Apollo can't connect to the GraphQL server.

When the Prefect Helm chart is installed, a job runs to create a tenant, as its logs show:

kubectl logs -n prefect prefect-server-create-tenant-job-xxxx (change pod name)
Tenant created with ID: 496e7a7e-7fad-44ca-9cb1xxxxxxxxxx

Also, it seems the default URL for /graphql/ is the internet-facing URL, so requests go through Ory Kratos, and since they are not authenticated they get a 401 error (auth failed). So we log in to prefect.domain/graphql/ using Kratos, but the UI itself is not authenticated...

vkocaman commented 2 years ago

(Quoting @omerfsen-gsnd's comment above: the tenant is created, but requests to the internet-facing /graphql/ URL go through Ory Kratos and fail with 401.)

@bernardolk can you shed some light on this issue?

omerfsen-gsnd commented 2 years ago

Another thing I found is that the module always pulls the latest version:

https://github.com/datarevenue-berlin/OpenMLOps/blob/master/modules/prefect-server/variables.tf#L19-L22

That setting is not for the prefect/server Docker image itself, but I saw that the image had an issue, so I manually switched the tenant-creation images (that is, prefectVersionTag) to version 0.15.4, as defined here:

https://github.com/PrefectHQ/server/blob/master/helm/prefect-server/values.yaml#L13-L17

Server tags are also versioned differently (https://github.com/PrefectHQ/server/tags) and are used together with the Helm chart version:

https://github.com/PrefectHQ/server/blob/master/helm/prefect-server/values.yaml#L3-L11

I also updated JupyterHub to use a new Docker image that installs version 0.15.4 of the Prefect Python packages, by updating

https://github.com/datarevenue-berlin/OpenMLOps/blob/master/docker/openmlops-notebook/Dockerfile#L4

but I still get the same error. Any update is greatly appreciated!

bernardolk commented 2 years ago

Alright, so @vkocaman, you are getting a 401 status code when Prefect tries to access the /graphql URL? Make sure you are doing the auth steps correctly: check that the get_prefect_token function returns a token and that you use it to instantiate the Prefect client. Can you check and get back to me? Also, I did not understand what you mean by "but UI itself is not authenticated....": which UI are you talking about?
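One way to run that check outside the Prefect client is to send a trivial query with the token yourself and look at the status code. This sketch assumes two things not confirmed in this thread: that graphql_url is the full endpoint the notebook uses as api_server (e.g. https://prefect.mlops.mydomain.com/graphql), and that the gateway accepts the Kratos session token as a Bearer token, which is how we believe the Prefect client sends api_token. A 401 here with a non-empty token points at the gateway rather than at Prefect.

```python
import json
import urllib.error
import urllib.request


def auth_headers(session_token: str) -> dict:
    """Build request headers, failing early if the login step produced no token."""
    if not session_token:
        raise ValueError("get_prefect_token() returned an empty token; check the Kratos login step")
    return {"Authorization": f"Bearer {session_token}", "Content-Type": "application/json"}


def probe_graphql(graphql_url: str, session_token: str) -> int:
    """Send a trivial query and return the HTTP status code (200, 401, ...)."""
    body = json.dumps({"query": "{ tenant { id } }"}).encode()
    req = urllib.request.Request(graphql_url, data=body, headers=auth_headers(session_token))
    try:
        with urllib.request.urlopen(req, timeout=10) as resp:
            return resp.status
    except urllib.error.HTTPError as exc:
        return exc.code  # e.g. 401 when the gateway rejects the token
```

For example, probe_graphql(prefect_url, get_prefect_token()) in the notebook, adapting the call to however get_prefect_token is defined there.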

@omerfsen-gsnd I will make sure to fix the Prefect version, thanks.