airyhq / airy

πŸ’¬ Open Source App Framework to build streaming apps with real-time data - πŸ’Ž Build real-time data pipelines and make real-time data universally accessible - πŸ€– Join historical and real-time data in the stream to create smarter ML and AI applications. - ⚑ Standardize complex data ingestion and stream data to apps with pre-built connectors
https://airy.co/docs/core
Apache License 2.0
369 stars 44 forks source link

Parse (but don't map) API message response content #934

Closed chrismatix closed 3 years ago

chrismatix commented 3 years ago

Current state

We deliver in-/outbound message content on the APIs as a string, which is the way we store it internally. I.e. a text message looks like this:

{ //...
    "content": "{\"text\":\"on the road again\"}",
}

Desired state

We still do not want to maintain a schema for the source content. However, it would be useful to clients if we at least parse the message content to an untyped JSON object to make it easier to consume. The same message would look like this:

{ //...
    "content": {"text":"on the road again"},
}

This works for all sources that support JSON. For Twilio, this is not the case as we get messages in a url-encoded string. Therefore we can keep representing message content that we cannot parse to JSON as a string like so:

{
  "content": "ApiVersion=2010-04-01&Body=hello%20how%20are%20you%0A"
}

TODO:

bitboxer commented 3 years ago

It would be really nice if you could parse the string and extract a json object out of it πŸ™

chrismatix commented 3 years ago

@bitboxer Yes, that's the spirit of the ticket.

lucapette commented 3 years ago

What do we do with twilio?

AitorAlgorta commented 3 years ago

I think we need a special renderer for Twilio

lucapette commented 3 years ago

that's not my point. My point is that the API looses consistency:

That's really strange.

What's the advantage of this not being always a string? Which, just to be sure, it's by far the simplest approach and makes the most consistent api

chrismatix commented 3 years ago

Yes, Twilio is "funny" because inbound is URL encoded while outbound is json. So the rendered already has to make a distinction.

@lucapette makes a good point, which is why I initially went for a String in the payload. But I also get cc @steffh point that having a JSON for most cases makes the content look a bit more natural for heavy users of sources that support JSON.

paulodiniz commented 3 years ago

I think I get what @lucapette is saying. It's a principle of least surprise.

Meaning: it's a surprise that the API returns one type and then returns another type depending on the source. I agree with that. We could transform the twilio data to a json before delivering it, but that would break the whole principle of it (not change content)

bitboxer commented 3 years ago

So we have to put the parsing somewhere. Either into the frontends or into our backend. I would vote for our backend to make the API easier to consume. The target of this airy thing is devs and having a clean api to the outside is more important to me than having a cleaner code inside.

lucapette commented 3 years ago

the problem is that either you leave it intact or you don't. As soon as you open the door, we'll have opinions about the content. And we shouldn't. That's the proposed design and I buy into that. If we parse, we do have an opinion about it (and twilio is the perfect case in point)

AitorAlgorta commented 3 years ago

Yeah I understand the principle in the the backend, and I wouldn't break it. If the principle is "You send whatever you want, you will get it back in the other side" we have to be ready (and I think it is implicit in the principle) to render any type of response.

steffh commented 3 years ago

maybe i get this wrong, but if you stringify a perfect JSON object you get from a source and make it a string, in order to support the restrictions of the other weakest source you have, don't you modify it by doing so and start to have an opinion about the content not being an object but a string?

i actually think people that consume the API will not understand why we deliver an object as a string and ask the API user to parse it - given they don't care about twilio and their shortcomings (also since twilio currently makes up for 0.1% of our traffic, most people won't even have that issue).

chrismatix commented 3 years ago

Some good points made so far, I'd like to recap the reasons I see for doing the above change:

  1. We do get JSON objects from the sources, but we stringify it so that’s a mutation on the source content when we serve them like that on the content field
  2. If content is untyped then both string and object are valid return types

Therefore optimistically trying to parse them to JSON would be an effort to preserve the source content, while currently, the stringification is an implication of us storing this on an Avro field.

lucapette commented 3 years ago

maybe i get this wrong, but if you stringify a perfect JSON object you get from a source and make it a string, in order to support the restrictions of the other weakest source you have, don't you modify it by doing so and start to have an opinion about the content not being an object but a string?

yes that's a valid argument.

I had just discussed this with @chrismatix offline and if we want to stay true to our own design principle then we should return json objects when the source sent us json objects and strings when the source sent us that. What put me off is the accent on parsing. It's not about parsing, it's about being truly transparent. So I agree we should be transparent, after all the past weeks were all about supporting this design principle. To recap, I'm suggesting:

iven they don't care about twilio and their shortcomings (also since twilio currently makes up for 0.1% of our traffic, most people won't even have that issue).

In the context of the open source project, this is irrelevant.

lucapette commented 3 years ago

Therefore optimistically trying to parse them to JSON would be an effort to preserve the source content, while currently, the stringification is an implication of us storing this on an Avro field.

That's the core argument!

chrismatix commented 3 years ago

So since I am gathering from Aitor and Bodo's comments that the frontend is also okay with this change I am removing the "needs discussion" label.

paulodiniz commented 3 years ago

Following the principle of least astonishment (https://en.wikipedia.org/wiki/Principle_of_least_astonishment) , I propose an alternative:

outcome for a twilio message:

{
  "content": {
    "text":"ApiVersion=2010-04-0SmsSid=SMbc31b6419de618d65076200c546764SmsStatus=receivedSmsMessageSid="
   }
}

Advantages of this

What do you think?

chrismatix commented 3 years ago

@paulodiniz But the frontend still has to parse the text field, right? So I don't really see how this reduces astonishment.

paulodiniz commented 3 years ago

@chrismatix Because it always returns a JSON

chrismatix commented 3 years ago

But it doesn't, right? πŸ˜„ The URL encoded payload is simply moved down one level to text.

paulodiniz commented 3 years ago

@chrismatix I understand what you are saying. We still have to parse the response

What I'm saying is that the API contract is more consistent by always returning a JSON. That's it

chrismatix commented 3 years ago

I get it: Your point is that the content field would always have an object value. {"content": "twilio source string"} is also JSON (which is the second point mentioned here https://github.com/airyhq/airy/issues/934#issuecomment-776560254)

So imho moving string values inside a labeled field would add more astonishment for API consumers who want to transparently access source content, wouldn't you agree?

paulodiniz commented 3 years ago

@chrismatix I don't agree with that. Same endpoint returning two different data types depending on the source is still a break of contract in my opinion.

What do you think @lucapette ?

chrismatix commented 3 years ago

It is not a breach of contract if you say that the contract is that the field is either string or object (which is also true for all fields within the content object)

lucapette commented 3 years ago

I think there's no solution that works for all cases. This actually reflects very well on the core problem of this design: we're dealing with data we don't own and you either have an opinion about the data or you don't. I think we now know not having an opinion is an healthier trade-off and so far the fact we "break the contract" seems to be the only problem we encounter.

In theory, I agree with the point made by @paulodiniz as I also don't feel very good that we return data types that are dependent on the source. But then, again, as soon as I read the sentence I just wrote I also realize that's "by design". If we truly want to stay away from having an opinion about the content, then I would argue it must happen that different sources have different data types for the content. After all that's true anyway even if you have all JSON... they're different JSON structures so while there's an argument for "they're aal JSON objects" the argument is not so strong at closer examination as unless you know what's the source you're dealing with, you're still left with no idea of what's the data you're dealing with. And from that perspective, different JSON objects or "JSON object in one case, string to decode in some funny way in an other" leads to the same exact responsibility on the client code. To be fair, (using the practical example we're dealing with here), by changing twilio responses we'd have the worst of both words (in terms of general design): we'd have an opinion on the content for some sources and we wouldn't for others. As the consumers of the API would have to know the contract either way, you can indeed argue that by always returning JSON we're doing more work than we should and so would the consumers of the API.

tl;dr while it's counterintuitive to return different data types depending on the sources, that's in line with our philosophy of not having an opinion on the content