VertaAI / modeldb

Open Source ML Model Versioning, Metadata, and Experiment Management
Apache License 2.0

Minio support for ModelDB #870

Open Atharex opened 4 years ago

Atharex commented 4 years ago

Can the S3 storage adapter support a Minio backend?

conradoverta commented 4 years ago

Hi, @Atharex!

Currently the artifacts go directly to S3 via signed URLs. To my knowledge, Minio supports such calls, so it should work out of the box, but we have never tested against it. Are you getting some specific error? Maybe we can help figure out what's going on.

Atharex commented 4 years ago

Hi, @conradoverta!

Probably there are not many changes needed for it. Could be I'm missing something in the configuration or there is no capability yet to specify a custom endpoint in the S3 configuration (like a local Minio installation).

I've got the S3 artifact store type in my config.yaml configured like this:

artifactStoreConfig:
  artifactStoreType: S3
  S3:
    cloudAccessKey: {{ minio_access_key }}
    cloudSecretKey: {{ minio_secret_key }}
    cloudBucketName: {{ modeldb_minio_bucket }}

And I get the following error: error: The AWS Access Key Id you provided does not exist in our records. (Service: Amazon S3; Status Code: 403; Error Code: InvalidAccessKeyId; Request ID: 83474DE39F314335; S3 Extended Request ID: wq4SSxhJMqBpyR+TgtoK3TCRLXylajG+x7iuCuOoOS8RP6XJIU5UI1WzViU9u8WR06qb054PWn8=)

So it seems that ModelDB tries to use those credentials to save the data into AWS, instead of my local Minio installation. Is there a way to configure the endpoint for the S3 calls?
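The error above is consistent with an S3 client that falls back to the AWS endpoint when no custom endpoint is configured. Below is a minimal Python sketch of that decision. It is not ModelDB's actual Java code: `object_url`, its `endpoint` parameter, and the example Minio address are hypothetical and only illustrate the idea.

```python
from typing import Optional


def object_url(bucket: str, key: str, endpoint: Optional[str] = None,
               region: str = "us-east-1") -> str:
    """Return the URL an S3-style client would address for bucket/key."""
    if endpoint is None:
        # No custom endpoint configured: the request goes to AWS
        # (virtual-hosted style, bucket name in the hostname).
        return f"https://{bucket}.s3.{region}.amazonaws.com/{key}"
    # Custom endpoint (for example a local Minio): path-style addressing,
    # bucket name in the path.
    return f"{endpoint.rstrip('/')}/{bucket}/{key}"


# Without an endpoint the Minio credentials would be presented to AWS.
print(object_url("modeldb-bucket", "abc/model_api.json"))
# With a (hypothetical) endpoint the same request stays on the Minio host.
print(object_url("modeldb-bucket", "abc/model_api.json",
                 endpoint="http://minio.example.local:9000"))
```

With only the access key, secret key, and bucket name in the config, a client has nothing that would steer it into the second branch.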

conradoverta commented 4 years ago

Oh, that is a fair point. I don't think we have any configuration for the custom endpoint. It should be easy to add a configuration and pass it around, but we don't have a Minio setup currently to test.

Would you be willing to contribute a PR with that new configuration? We'd be happy to point you to useful information for this. Otherwise, I need to discuss with the team and put this in one of our coming sprints.

Atharex commented 4 years ago

OK, I guess I could give it a try :)

Send me the information you have and I'll see what I can do.

ravishetye commented 4 years ago

@Atharex: I believe modifying https://github.com/VertaAI/modeldb/blob/master/backend/src/main/java/ai/verta/modeldb/artifactStore/storageservice/S3Service.java#L34-L51 should get you unblocked. If it doesn't, it would help me if you could share a few more lines from the stack trace.

Atharex commented 4 years ago

@ravishetye @conradoverta

I started from where you pointed me and got a working example up and running for my Minio installation. I was able to log datasets into Minio successfully with it. I have now also opened a pull request (#889) with my proposed changes.

The changes add support for a minioEndpoint setting in the config:

      artifactStoreType: S3
      S3:
        cloudAccessKey: {{ minio_access }}
        cloudSecretKey: {{ minio_secret }}
        cloudBucketName: {{ modeldb_minio_bucket }}
        minioEndpoint: {{ minio_endpoint }}

conradoverta commented 4 years ago

Awesome! That was fast =) We'll take a look tomorrow.

ravishetye commented 4 years ago

Thanks, @Atharex, for the request and the fix. Could you close the ticket if things are functional for you?

Atharex commented 4 years ago

My pleasure @ravishetye :)

I would rather keep this ticket open, as the support is not yet 100% (changes are still needed in the DB artifact storage path). You can show me where the changes should be made, but I cannot guarantee I will have time for another pull request in the near future :/

Atharex commented 4 years ago

@ravishetye I got some time to take another look at this. Can someone on your side point me to the code that creates the frontend links?

Atharex commented 4 years ago

@ravishetye I see you guys are doing loads of refactoring on the codebase. I presume you are planning for a new release, where Minio support will already be completed by someone from your side?

conradoverta commented 4 years ago

Hi, @Atharex! Could you clarify what you mean by links? I might be missing something here.

Atharex commented 3 years ago

Might have been misled... I thought the DB stores direct links to the artifacts, which the frontend uses for downloads. I've tried a build directly from the master branch now to try and debug my problem.

I installed ModelDB with this config:

    artifactStoreConfig:
      artifactStoreType: S3
      S3:
        cloudAccessKey: [my-access-key]
        cloudSecretKey: [my-secret-key]
        cloudBucketName: modeldb-bucket
        minioEndpoint: http://minio-storage.minio.svc.cluster.local:9000

Then I followed this example: https://github.com/VertaAI/modeldb/blob/master/client/workflows/demos/census-end-to-end-local-data-example.ipynb

This is my postgres DB output when I tried your latest modeldb version (initially thought the column artifacts stores the full S3 signed URLs of the artifacts).

select * from artifact;

10 |             4 | ExperimentRunEntity | artifacts  | json               | model_api.json   |                                      | 0c212b8fcd36072a29fb2e91e34a28e17a6504f28ec7fb2e9f54a83656c196d6/model_api.json     | f         |      
         | 75f807db-c6fc-462d-9438-e39f0b0d7ee0 |            | s3://modeldb-bucket/0c212b8fcd36072a29fb2e91e34a28e17a6504f28ec7fb2e9f54a83656c196d6/model_api.json     | 7763c9d7-be7c-4b36-be09-3c1a40e68537 | t
  4 |             4 | ExperimentRunEntity | artifacts  | zip                | custom_modules   |                                      | 5f95561f29a9f81f637fa50237d3729542b45c76ac47018b56dbfb16b277b37c/custom_modules.zip | f         |      
         | c2d12f87-2529-45be-bb8b-84828b4f35d1 |            | s3://modeldb-bucket/5f95561f29a9f81f637fa50237d3729542b45c76ac47018b56dbfb16b277b37c/custom_modules.zip | 61a7315e-d608-4e42-aab2-a207954fdb6f | t
...

The URL request (seen in the network analyzer of the browser) when I click on the download artifact button in the ModelDB web UI seems correct: GET http://minio-storage.minio.svc.cluster.local:9000/modeldb-bucket/0c212b8fcd36072a29fb2e91e34a28e17a6504f28ec7fb2e9f54a83656c196d6/model_api.json?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Date=20201102T103803Z&X-Amz-SignedHeaders=host&X-Amz-Expires=299&X-Amz-Credential=[my-credential]/20201102/us-east-1/s3/aws4_request&X-Amz-Signature=8e1ce4a94757d3d9d4a40be37629cca4a791c882125e78a75483bc0ce3224b33

When I look up my local Minio instance, I see the artifacts correctly stored there and I can download them directly: [my-minio-url]/modeldb-bucket/0c212b8fcd36072a29fb2e91e34a28e17a6504f28ec7fb2e9f54a83656c196d6/model_api.json

Even "docker exec-ing" into the backend container and fetching the artifact links from there works. But somehow when I try to download that same file from the web UI I get an error message:

b1edc8f80de6c050e00debb2e3b401f15bec77650351f433923f61a85490a34c/custom_modules.zip
Error in downloading file: Something went wrong!

The webapp log seems fine...

/api/v1/modeldb/experiment-run/getUrlForArtifact
Requesting /api/v1/modeldb/experiment-run/getUrlForArtifact
Returning 200 OK; 433b sent

Also the modeldb-backend logs don't look suspicious

{"thread":"grpc-default-executor-6","level":"INFO","loggerName":"ai.verta.modeldb.ModelDBAuthInterceptor","message":"methodName: ai.verta.modeldb.ExperimentRunService/getUrlForArtifact","endOfBatch":false,"loggerFqcn":"org.apache.logging.log4j.spi.AbstractLogger","instant":{"epochSecond":1604316775,"nanoOfSecond":195000000},"threadId":455,"threadPriority":5,"hostName":"modeldb-backend-0","kubernetes.podIP":""}
{"thread":"grpc-default-executor-6","level":"DEBUG","loggerName":"ai.verta.modeldb.experimentRun.ExperimentRunDAORdbImpl","message":"Got ProjectId by ExperimentRunId ","endOfBatch":false,"loggerFqcn":"org.apache.logging.log4j.spi.AbstractLogger","instant":{"epochSecond":1604316775,"nanoOfSecond":215000000},"threadId":455,"threadPriority":5,"hostName":"modeldb-backend-0","kubernetes.podIP":""}

But now I'm out of ideas on how to investigate further... Where can I get more debug information? Why would only the frontend have problems downloading the artifact, when all other approaches work?
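One low-effort debugging aid is to take the presigned download URL from the browser's network analyzer apart and check which host it points at and when it expires. The sketch below uses only Python's standard library. `describe_presigned_url` is a hypothetical helper, and `EXAMPLE_URL` is a shortened stand-in with a placeholder credential and signature rather than the real URL from the report.

```python
from datetime import datetime, timedelta, timezone
from urllib.parse import parse_qs, urlsplit

# Shortened, illustrative URL in the same shape as the one reported above.
EXAMPLE_URL = (
    "http://minio-storage.minio.svc.cluster.local:9000/modeldb-bucket/abc/model_api.json"
    "?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Date=20201102T103803Z"
    "&X-Amz-SignedHeaders=host&X-Amz-Expires=299"
    "&X-Amz-Credential=EXAMPLEKEY/20201102/us-east-1/s3/aws4_request"
    "&X-Amz-Signature=0000"
)


def describe_presigned_url(url: str) -> dict:
    """Extract the host, signed headers, and validity window of a SigV4 presigned URL."""
    parts = urlsplit(url)
    query = {name: values[0] for name, values in parse_qs(parts.query).items()}
    signed_at = datetime.strptime(query["X-Amz-Date"], "%Y%m%dT%H%M%SZ")
    signed_at = signed_at.replace(tzinfo=timezone.utc)
    return {
        "host": parts.netloc,  # the name the browser must be able to resolve
        "path": parts.path,
        "signed_headers": query["X-Amz-SignedHeaders"].split(";"),
        "signed_at": signed_at,
        "expires_at": signed_at + timedelta(seconds=int(query["X-Amz-Expires"])),
    }


info = describe_presigned_url(EXAMPLE_URL)
print(info["host"], info["signed_headers"], info["expires_at"].isoformat())
```

The `host` value is the name the machine running the browser has to resolve and reach, which is not necessarily the same set of names that work inside the backend container.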

conradoverta commented 3 years ago

Is http://minio-storage.minio.svc.cluster.local:9000 the same as [my-minio-url]?

My current suspicion is that you have different DNS resolution for things running in the cluster than when you access from your other machine. What happens is that the webapp tries to fetch the URL http://minio-storage.minio.svc.cluster.local:9000/... since that's the URL that ModelDB is aware of.

Could you verify if you can resolve that hostname? You can usually do dig minio-storage.minio.svc.cluster.local or ping minio-storage.minio.svc.cluster.local, depending on your setup.
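If dig or ping are not available on one of the machines, the same check can be sketched in a few lines of Python using only the standard library. `can_resolve` is a hypothetical helper name. It only tests name resolution, not whether the Minio port is reachable.

```python
import socket


def can_resolve(hostname: str) -> bool:
    """Return True if the local resolver can turn hostname into an address."""
    try:
        socket.getaddrinfo(hostname, None)
    except socket.gaierror:
        # Resolution failed: unknown name, or no resolver that knows it.
        return False
    return True


# Example: can_resolve("minio-storage.minio.svc.cluster.local")
```

Running it once inside the cluster (for example in the backend container) and once on the machine with the browser would show whether the two environments disagree about the hostname.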

Atharex commented 3 years ago

No, [my-minio-url] is not http://minio-storage.minio.svc.cluster.local:9000. It is the URL of the web UI of my Minio instance, which is reachable from outside my Kubernetes cluster.

That external URL should not be used by ModelDB at all, though, since all of its traffic happens inside the Kubernetes cluster, where it has access to the http://minio-storage.minio.svc.cluster.local:9000 service (I presume this installation-time config is used by both the backend and frontend services). Also, as I mentioned, if I go into the modeldb-backend container and download the generated URL of the artifact, it works fine, and DNS resolution inside that container with nslookup minio-storage.minio.svc.cluster.local also works correctly.

conradoverta commented 3 years ago

The problem here seems to be that ModelDB and your browser are seeing different hostnames for the same system. So when ModelDB asks Minio for the link to the artifact, the link comes back with ModelDB's hostname perspective. When the backend sends it to the webapp, the webapp tries to make the request and it fails because it's a different name.

Would you mind configuring ModelDB to use the same hostname you use internally?
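Some background on why the link carries ModelDB's view of the hostname: with AWS Signature Version 4, a presigned URL is computed locally from the configured endpoint and credentials, and the Host header is part of what gets signed. The following is a minimal Python sketch of that computation for a path-style GET. It is not ModelDB's code (the backend is Java and presumably relies on the AWS SDK). `presign_get`, the example keys, and the external hostname `minio.example.net` are hypothetical.

```python
import hashlib
import hmac
from urllib.parse import quote, urlsplit


def _hmac_sha256(key: bytes, message: str) -> bytes:
    return hmac.new(key, message.encode("utf-8"), hashlib.sha256).digest()


def presign_get(endpoint, bucket, key, access_key, secret_key, amz_date,
                expires=299, region="us-east-1"):
    """Build a path-style SigV4 presigned GET URL (amz_date like 20201102T103803Z)."""
    host = urlsplit(endpoint).netloc  # keeps ":9000" when a port is configured
    scope = f"{amz_date[:8]}/{region}/s3/aws4_request"
    path = quote(f"/{bucket}/{key}", safe="/-_.~")
    params = {
        "X-Amz-Algorithm": "AWS4-HMAC-SHA256",
        "X-Amz-Credential": f"{access_key}/{scope}",
        "X-Amz-Date": amz_date,
        "X-Amz-Expires": str(expires),
        "X-Amz-SignedHeaders": "host",
    }
    query = "&".join(
        f"{quote(name, safe='-_.~')}={quote(value, safe='-_.~')}"
        for name, value in sorted(params.items())
    )
    # The canonical request includes the Host header, so the hostname is signed.
    canonical_request = "\n".join(
        ["GET", path, query, f"host:{host}\n", "host", "UNSIGNED-PAYLOAD"]
    )
    string_to_sign = "\n".join([
        "AWS4-HMAC-SHA256", amz_date, scope,
        hashlib.sha256(canonical_request.encode("utf-8")).hexdigest(),
    ])
    # Derive the signing key: date -> region -> service -> "aws4_request".
    signing_key = f"AWS4{secret_key}".encode("utf-8")
    for part in (amz_date[:8], region, "s3", "aws4_request"):
        signing_key = _hmac_sha256(signing_key, part)
    signature = hmac.new(signing_key, string_to_sign.encode("utf-8"),
                         hashlib.sha256).hexdigest()
    return f"{endpoint.rstrip('/')}{path}?{query}&X-Amz-Signature={signature}"


common = ("modeldb-bucket", "abc/model_api.json", "EXAMPLEKEY", "EXAMPLESECRET",
          "20201102T103803Z")
internal = presign_get("http://minio-storage.minio.svc.cluster.local:9000", *common)
external = presign_get("https://minio.example.net", *common)
print(internal)
print(external)
```

The two URLs differ in their signatures as well as in their hostnames, which suggests that whichever hostname ModelDB is configured with is the only one the resulting link is valid for.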

Atharex commented 3 years ago

Aha, I see your point!

I thought that the GET request I see in the traffic analyzer happens on the web app side (the web app transfers the file from the artifact storage and then lets me download that cached copy), but it actually gives me a direct link to the storage using its internally resolved DNS address http://minio-storage.minio.svc.cluster.local:9000

where on the user side I want the externally defined DNS address: https://minio.my-own-domain.net

I got confused because deleting an artifact did not throw an error (I later realized that is because the webapp invokes its REST API to perform the step, e.g. /api/v1/modeldb/experiment-run/deleteArtifact {"id":"8c248b70-f001-452e-8ed0-9d3616eb4e81","key":"model_api.json"}).

With this it deletes the entry from ModelDB, but leaves the artifact in MinIO intact (guess that is so by design also with other artifact stores? Or should the delete also happen inside the store?)

I guess some URL rewriting would need to take place to correctly resolve address handling on the web UI for this particular use-case (an external storage service, which has both an internal (cluster) and external (ingress) DNS name). Maybe an optional "AlternativeStoreURL" parameter supplied in the ModelDB configuration file to rewrite the generated links on the webapp side?

Just a thought... Not sure how other projects handle similar situations. Configuring ModelDB to the external name might not be easy, as there is a port in the internal service name and I would not be able to CNAME an external entry onto an internal address with a port, if I reconfigured my internal kubernetes DNS resolver.
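The rewriting idea could look roughly like the Python sketch below, which swaps the scheme and host of a generated link for an alternative base and keeps the path and query. `rewrite_base` is a hypothetical helper and no such option exists in ModelDB at this point in the thread.

```python
from urllib.parse import urlsplit, urlunsplit


def rewrite_base(url: str, alternative_base: str) -> str:
    """Replace scheme and host[:port] of url with those of alternative_base."""
    original = urlsplit(url)
    base = urlsplit(alternative_base)
    # Path, query string, and fragment are carried over unchanged.
    return urlunsplit(
        (base.scheme, base.netloc, original.path, original.query, original.fragment)
    )


print(rewrite_base(
    "http://minio-storage.minio.svc.cluster.local:9000/modeldb-bucket/abc/model_api.json?X-Amz-Expires=299",
    "https://minio.my-own-domain.net",
))
```

One caveat: the download URL quoted earlier in the thread lists host in X-Amz-SignedHeaders, so the signature covers the hostname. Swapping the host after signing would likely make the storage server reject the request, so a real implementation would probably need to apply the alternative base before the URL is signed rather than on the webapp side.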

conradoverta commented 3 years ago

We use the direct link because it's usually much faster (since their services are built for big downloads and uploads). Adding an alternative base makes sense to me to simplify the process. Usually we handle this by adding the CNAME entries in the right place, but that might be a high barrier to entry.

If we pointed you to the right places for the change, would you be willing to contribute a PR with support for this feature? It would be greatly appreciated!

Atharex commented 3 years ago

Sure, I'd go for it! This feature would help me out nicely.

conradoverta commented 3 years ago

Great!

@ad-47 @ravishetye could you share some pointers on how we could add a config field AlternativeStoreURL that would replace the base url for artifacts? The context is that the user browser and ModelDB need to see different hostnames for the minio endpoint.

ravishetye commented 3 years ago

@Atharex Would setting the minio endpoint to https://minio.my-own-domain.net work and not require more code change?

Atharex commented 3 years ago

Sadly, no. There is a port in my service name and I cannot get DNS to resolve https://minio.my-own-domain.net to the internal address http://minio-storage.minio.svc.cluster.local:9000. The ingress controller also does not let me rewrite response URLs (only request URLs), so having this as an optional configuration step would be the easiest way to solve the problem.

conradoverta commented 3 years ago

The challenge that Ravi correctly pointed out when I discussed this with him is that MDB would always use that alternative URL, even if the client was running inside the cluster. Would that be an issue for you?

samru-rai commented 3 years ago

Would be cool if ModelDB team created an example for Minio so future users can just refer to the example