arquillian / arquillian-organization

Arquillian Project Umbrella - used to gather all issues at one place by using ZenHub
http://arquillian.org

Ability to mask request content in simulation #10

Open bartoszmajsak opened 6 years ago

bartoszmajsak commented 6 years ago

Some of the information stored in the service virtualization files should not be exposed in plain text. Investigate how Hoverfly can mask it for tests while still letting us share those simulation files, e.g. on GitHub.

Hint: might be possible by using middleware.

For example, a request in the simulation can contain an actual GH token if we want to virtualize their API. This should not be shared on GH (even though, when you do, GH is smart enough to detect the token and revoke it, so it is no longer valid).

{
  "data" : {
    "pairs" : [ {
      "request" : {
        "path" : {
          "exactMatch" : "/repos/bartoszmajsak-test/my-test-repo"
        },
        "method" : {
          "exactMatch" : "GET"
        },
        "destination" : {
          "exactMatch" : "api.github.com"
        },
        "scheme" : {
          "exactMatch" : "https"
        },
        "query" : {
          "exactMatch" : ""
        },
        "body" : {
          "exactMatch" : ""
        },
        "headers" : {
          "Authorization" : [ "token 2123123adaczxcasdaq2231231223" ]
        }
      },
      "response" : {
        "status" : 200,
        "body" : "sthsth",
        "encodedBody" : false,
        "templated" : false
      }
    }],
    "globalActions" : {
      "delays" : [ ]
    }
  },
  "meta" : {
    "schemaVersion" : "v3"
  }
}
lordofthejars commented 6 years ago

My idea to fix this would be:

As an example:

#1234=2123123adaczxcasdaq2231231223
"headers" : {
          "Authorization" : [ "token #1234" ]
        }

And then run hoverctl middleware --binary python --script secrets.py, with the key to decrypt the properties file set in an environment variable (if we think the file should be encrypted).
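To make the idea concrete, here is a rough sketch of what such a secrets.py could look like. It assumes Hoverfly hands the middleware one JSON request/response pair on stdin and reads the modified pair back from stdout, and that the properties file is already decrypted. The function names and the "#key=value" file layout are only illustrative, taken from the example above, not an existing implementation.

```python
"""Hypothetical secrets.py: swap "#<key>" placeholders in request headers for real values."""
import json
import re

# Matches placeholders such as "#1234" inside a header value.
PLACEHOLDER = re.compile(r"#(\w+)")


def load_properties(text):
    """Parse lines like "#1234=value" into {"1234": "value"}."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("#") and "=" in line:
            key, value = line[1:].split("=", 1)
            props[key] = value
    return props


def unmask(pair, props):
    """Replace every known placeholder in the request headers of one pair."""
    headers = pair.get("request", {}).get("headers") or {}
    for name, values in headers.items():
        # Unknown placeholders are left untouched instead of being dropped.
        headers[name] = [
            PLACEHOLDER.sub(lambda m: props.get(m.group(1), m.group(0)), value)
            for value in values
        ]
    return pair


def run(stdin, stdout, props):
    """Read one pair from stdin, unmask it, and write it to stdout."""
    json.dump(unmask(json.load(stdin), props), stdout)
```

A real script would finish by loading the properties and calling run(sys.stdin, sys.stdout, props). Decrypting the properties file with the key from the environment variable is left out on purpose.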

bartoszmajsak commented 6 years ago

Thanks for your input @lordofthejars. Does it mean that for each captured interaction I would need to go and replace the corresponding value with the proper placeholder by hand? Maybe we can make it a bit easier?

Do you have a PoC middleware which we can play around with?

lordofthejars commented 6 years ago

No PoC yet.

If we know beforehand which fields we want to encrypt, we can automate it. For our current purpose, I think I'll restrict it to the GitHub token header.

That way, when we run in capture mode, we can autogenerate everything.

In the next version, we can let the user choose which fields to encrypt by using a configuration file.

bartoszmajsak commented 6 years ago

I was thinking about a slightly simpler approach. We could use jsonpath to define which elements we would like to encrypt and a secret to be used for this encryption. For example:

File defining what to mask (simple list):

headers.authorization

After applying encryption (with the key loaded from an env variable or passed as a flag through the middleware CLI, if possible):

"headers" : {
    "Authorization" : [ "{öÆÀêáDwŒæt÷hºï"vsvçÿfóº‹3ÓEyѯæ;·³÷}Â2JŽGi/VAý" ]
}
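A minimal sketch of this approach, under a few assumptions: the paths are plain dotted names rather than full JSONPath, header names are matched case-insensitively (so headers.authorization finds "Authorization"), and real encryption is swapped for a keyed HMAC digest from the standard library, which is one-way and therefore hides the value but cannot restore it. The function names are hypothetical.

```python
import base64
import hashlib
import hmac


def mask_value(value, key):
    """Return a keyed, one-way digest of value (a stand-in for real encryption)."""
    digest = hmac.new(key, value.encode("utf-8"), hashlib.sha256).digest()
    return base64.urlsafe_b64encode(digest).decode("ascii")


def find_key(mapping, name):
    """Return the key of mapping that equals name ignoring case, or None."""
    for key in mapping:
        if key.lower() == name.lower():
            return key
    return None


def mask_request(request, paths, key):
    """Mask the strings found at each dotted path; unknown paths are skipped."""
    for path in paths:
        parts = path.split(".")
        node = request
        # Walk down to the parent of the last path segment.
        for part in parts[:-1]:
            actual = find_key(node, part) if isinstance(node, dict) else None
            node = node[actual] if actual is not None else None
        leaf = find_key(node, parts[-1]) if isinstance(node, dict) else None
        if leaf is None:
            continue
        value = node[leaf]
        if isinstance(value, list):
            node[leaf] = [mask_value(item, key) for item in value]
        elif isinstance(value, str):
            node[leaf] = mask_value(value, key)
    return request
```

The key would come from the environment, for example os.environ["MASK_KEY"].encode() with a variable name of our choosing. If the original value must be recoverable, mask_value would need to be replaced with a reversible cipher.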
lordofthejars commented 6 years ago

Working on this approach then.

bartoszmajsak commented 6 years ago

@lordofthejars can you give an update on our latest conclusion?

lordofthejars commented 6 years ago

The problem is that Hoverfly middleware does not store the simulation in modify mode. We need modify mode to be able to mask/unmask requests and responses, and we also need to store simulations so that we don't have to be online all the time. The solution proposed by the Hoverfly guys is to create two Hoverfly proxies, one in modify mode and another in simulate mode, and redirect all traffic between them. I don't much like this solution, since it complicates the test a lot for a simple use case. The Hoverfly guys then told me they are going to work on this, but it is not yet known when.

MatousJobanek commented 6 years ago

I'm asking myself why we need the token in the simulation config. Correct me if I'm wrong: the request in the simulation config is just for matching the request-response pair, to know which response should be returned. If I'm correct, then why can't we have something like:

"headers" : {
    "Authorization" : ".+"
}

which means that the value can contain some regex (or maybe just "*" or "*****") saying that something should be set for authorization purposes, but not saying what exactly. This should work in simulation mode. For capture mode, I agree with Bartosz that it could be configured (using JSONPath) which parameters should be hidden and not stored. But maybe I'm missing something here...
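To illustrate what I mean by matching without storing the token, here is a small Python sketch. It is not how Hoverfly matches internally; it only shows that a regex such as ".+" or a glob such as "token *" accepts any non-empty Authorization value, so the simulation file never needs the real one. The helper name is made up.

```python
import re
from fnmatch import fnmatchcase


def header_matches(pattern, values, use_glob=False):
    """True when the header is present and every value matches the pattern."""
    if not values:
        # A missing header must not match: something has to be set.
        return False
    if use_glob:
        return all(fnmatchcase(value, pattern) for value in values)
    return all(re.fullmatch(pattern, value) is not None for value in values)


# The recorded pattern stays the same whatever token the test sends.
any_token = header_matches(".+", ["token 2123123adaczxcasdaq2231231223"])
glob_token = header_matches("token *", ["token something-else"], use_glob=True)
no_header = header_matches(".+", [])
```

With this, any_token and glob_token are True and no_header is False, so the check still asserts that some authorization value was sent.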

bartoszmajsak commented 6 years ago

Thanks for the research in this area, let's park it for now as we found a way of using globMatch to hide tokens (but not to re-use at this point)

@MatousJobanek I think this is not that much of a killer feature to have. If you use the original request to call the live service anyway, and the stored request is only used to resolve the stored response (e.g. to do the delta on it), I don't think we gain much from using it. It would only be a precaution against accidentally leaking recorded things.

I would opt for parking it.

JohnFDavenport commented 6 years ago

Hi, Hoverfly peeps here. We are looking at enabling lifecycle hooks which might enable what you want without us having to implement data masking in Hoverfly. However, it seems like you don't have this as a high priority and we don't have other use-cases at the moment.

lordofthejars commented 6 years ago

@JohnFDavenport yes, it is parked for now, but we might need it in the future. Thanks for the update.