getindata / flink-http-connector

HTTP connector for Apache Flink. Provides sources and sinks for the DataStream, Table and SQL APIs.
Apache License 2.0

Adding Support for "x-www-form-urlencoded" Format in Connector #59

Open aluzet opened 1 year ago

aluzet commented 1 year ago

Hello everyone,

Thank you very much for this connector. I have started some tests with it, and it works really well. For a project, I need to extend the capabilities of this connector to support a new format. Specifically, I would like to incorporate the ability to send POST requests (via the http-sink) in the "x-www-form-urlencoded" format. While I have started examining the code, I must admit that I'm uncertain about the best approach to proceed. Could you please indicate where in the code I should make modifications or additions?

Additionally, if I succeed in making these changes and if you are interested, I would submit a pull request here.

Thank you in advance for your help.

kristoffSC commented 1 year ago

Hi @aluzet, thank you for your kind words. I'm really happy that you find this connector useful, and I will be happy to see your contribution. Before we start discussing implementation details, could you describe the format you need in a little more detail?

For example, given the body below, as sent by the current implementation, how would it look in the x-www-form-urlencoded format?

{
    "id": 1,
    "first_name": "Ninette",
    "last_name": "Clee",
    "gender": "Female",
    "stock": "CDZI",
    "currency": "RUB",
    "tx_date": "2021-08-24 15:22:59"
}

Cheers.

aluzet commented 1 year ago

Hi @kristoffSC, thank you for your quick response.

The request will look like this:

POST /myURL?topic=test&data=something&id=1 HTTP/1.1
Host: example.com
Content-Type: application/x-www-form-urlencoded
Authorization: Bearer Token

All values will be sent inside the URL.

kristoffSC commented 1 year ago

And what would be the content of the POST body in this case?

I'm about to merge https://github.com/getindata/flink-http-connector/pull/58, which adds batch request support for HttpSink. With this improvement, one POST/PUT request will contain a JSON array in its body with many events. Until now, HttpSink created a new HTTP request for every event.

I was wondering how a "batch" could be expressed here.

ggekos commented 1 year ago

Hi, I am working with @aluzet, so I can answer your question. The body of the message will be a payload of key/value pairs.

POST /.myUrl HTTP/1.1
Host: example.com
Content-Type: application/x-www-form-urlencoded
Authorization: Bearer Token

topic=https://example.com/foo&data=the%20content
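
For illustration, here is a minimal sketch of how such a key/value body could be produced in plain Java. The class and method names (`FormUrlEncoder`, `encode`) are hypothetical and not part of the connector; the sketch only uses `java.net.URLEncoder` from the JDK (the `Charset` overload requires Java 10 or later).

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical helper, not part of flink-http-connector.
final class FormUrlEncoder {

    // Encodes key/value pairs as an application/x-www-form-urlencoded body.
    // Keys and values are percent-encoded; pairs are joined with '&'.
    static String encode(Map<String, String> fields) {
        return fields.entrySet().stream()
            .map(e -> URLEncoder.encode(e.getKey(), StandardCharsets.UTF_8)
                + "="
                + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8))
            .collect(Collectors.joining("&"));
    }

    public static void main(String[] args) {
        // LinkedHashMap keeps insertion order, so the output is deterministic.
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("topic", "https://example.com/foo");
        fields.put("data", "the content");
        // prints: topic=https%3A%2F%2Fexample.com%2Ffoo&data=the+content
        System.out.println(encode(fields));
    }
}
```

Note that `URLEncoder` writes a space as `+` rather than `%20` and also escapes `:` and `/`; a server decoding the body as form data should treat both spellings of the space the same way.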

You "can" also have an array, like this:

POST /.myUrl HTTP/1.1
Host: example.com
Content-Type: application/x-www-form-urlencoded
Authorization: Bearer Token

message[0]=content&
message[1]=content&
message[2]=content
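
As a rough sketch of how a batch of events could be flattened into indexed keys like the ones above: the names `FormBatchEncoder` and `encodeBatch` are hypothetical, and the key is assumed to be a plain ASCII identifier that needs no escaping.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.StringJoiner;

// Hypothetical helper, not part of flink-http-connector.
final class FormBatchEncoder {

    // Emits key[0]=v0&key[1]=v1&... for a batch of events.
    // Assumption: 'key' is a simple ASCII name, so it is written as-is.
    static String encodeBatch(String key, List<String> events) {
        StringJoiner body = new StringJoiner("&");
        for (int i = 0; i < events.size(); i++) {
            // Brackets are kept literal to mirror the example in the thread;
            // a strict encoder would write them as %5B and %5D.
            body.add(key + "[" + i + "]="
                + URLEncoder.encode(events.get(i), StandardCharsets.UTF_8));
        }
        return body.toString();
    }

    public static void main(String[] args) {
        // prints: message[0]=content&message[1]=content&message[2]=content
        System.out.println(encodeBatch("message", List.of("content", "content", "content")));
    }
}
```

The `name[index]` convention is not defined by the form encoding itself; whether the receiving server interprets it as an array depends on that server's framework.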

aluzet commented 11 months ago

Hi @kristoffSC ,

Thanks for your feedback. I just got back from vacation, and I'm getting back into this topic.

Do you have any additional information about this? We need this functionality fairly quickly. If you give me the instructions, I'd be happy to implement it.

If I come up with a good result, I'll suggest pushing the changes here, and you can review and approve them if it suits you.

Thanks in advance.

kristoffSC commented 9 months ago

Hi guys, I'm sorry for keeping you waiting.

@aluzet, feel free to submit a PR. I will be more than happy to review and merge it. This looks like a nice feature.

From my end, it's still hard to grasp how this can or should work in the context of the connector. Even with the examples from you and @ggekos, I still don't fully understand how such a request should be expressed.

For example, in your example the arguments seem to be encoded in the URL, but in @ggekos's example the body contains the arguments. Is @ggekos's example only for batch requests? Also, is the body in JSON format, or some other format?

I can start working on it but I need to understand it better.

Also, this feature seems to be strongly tied to the Content-Type header. Currently, headers are a fully independent entity and have no impact on the rest of the processing. For example, setting the Content-Type header to application/json does not force the use of the JSON format.
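
To make the coupling concrete, here is one possible sketch in which the body format is chosen from the Content-Type header. This is purely an assumption about how it could work, not how the connector behaves today; `ContentTypeBodySelector`, `buildBody` and the naive JSON writer (string values only) are hypothetical.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical sketch, not part of flink-http-connector.
final class ContentTypeBodySelector {

    static final String FORM = "application/x-www-form-urlencoded";

    // Picks the body encoding from the Content-Type header value.
    // Anything other than the form media type falls back to JSON.
    static String buildBody(String contentType, Map<String, String> fields) {
        // Drop parameters such as "; charset=UTF-8" and normalize case.
        String mediaType = contentType == null
            ? ""
            : contentType.split(";", 2)[0].trim().toLowerCase(Locale.ROOT);
        return FORM.equals(mediaType) ? toForm(fields) : toJson(fields);
    }

    static String toForm(Map<String, String> fields) {
        return fields.entrySet().stream()
            .map(e -> URLEncoder.encode(e.getKey(), StandardCharsets.UTF_8)
                + "=" + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8))
            .collect(Collectors.joining("&"));
    }

    // Naive JSON writer for illustration: string values only,
    // escaping just backslashes and double quotes.
    static String toJson(Map<String, String> fields) {
        return fields.entrySet().stream()
            .map(e -> "\"" + escape(e.getKey()) + "\":\"" + escape(e.getValue()) + "\"")
            .collect(Collectors.joining(",", "{", "}"));
    }

    private static String escape(String s) {
        return s.replace("\\", "\\\\").replace("\"", "\\\"");
    }

    public static void main(String[] args) {
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("id", "1");
        fields.put("first_name", "Ninette");
        // prints: id=1&first_name=Ninette
        System.out.println(buildBody("application/x-www-form-urlencoded; charset=UTF-8", fields));
        // prints: {"id":"1","first_name":"Ninette"}
        System.out.println(buildBody("application/json", fields));
    }
}
```

An explicit format option on the sink, independent of the headers, would be an alternative design that keeps headers free of side effects.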