Different behaviour for HTTP client for the same url

hasithaa commented 1 year ago

Consider following example.

import ballerina/http;
import ballerina/io;

public function main() returns error? {
    http:Client cl1 = check new ("https://random-data-api.com/api/v2/banks");
    string page1 = check cl1->/;
    io:println(page1);

    http:Client cl2 = check new ("https://random-data-api.com/api/v2/");
    string page2 = check cl2->/banks;
    io:println(page2);

    http:Client cl3 = check new ("https://www.w3schools.com/html");
    string page3 = check cl3->/;
    io:println(page3);

    http:Client cl4 = check new ("https://www.w3schools.com/");
    string page4 = check cl4->/html;
    io:println(page4);
}

Here, cl1 and cl2 give the same result, but cl3 and cl4 give different results. I also tested this after removing the leading / and with different websites. Observed this in Update 5-RC1.

TharmiganK commented 1 year ago

First of all, the URLs are not exactly the same. (We do not collapse consecutive slashes, since an empty path segment is valid according to the HTTP spec.)

In the above cases, this is how the URLs are resolved:

public function main() returns error? {
    http:Client cl1 = check new ("https://random-data-api.com/api/v2/banks");
    string page1 = check cl1->/; // https://random-data-api.com/api/v2/banks/

    http:Client cl2 = check new ("https://random-data-api.com/api/v2/");
    string page2 = check cl2->/banks; // https://random-data-api.com/api/v2//banks

    http:Client cl3 = check new ("https://www.w3schools.com/html");
    string page3 = check cl3->/; // https://www.w3schools.com/html/

    http:Client cl4 = check new ("https://www.w3schools.com/");
    string page4 = check cl4->/html; // https://www.w3schools.com//html
}

I also checked the same URLs using cURL and got the same responses as our Ballerina client:

$ curl -v https://www.w3schools.com/html/

> GET /html/ HTTP/2
> Host: www.w3schools.com
> user-agent: curl/7.87.0
> accept: */*
> 
< HTTP/2 200 
< age: 8614
< cache-control: Public,public
< content-security-policy: frame-ancestors 'self' https://mycourses.w3schools.com;
< content-type: text/html
< date: Fri, 07 Apr 2023 05:16:13 GMT
< expires: Fri, 07 Apr 2023 09:16:13 GMT
< last-modified: Fri, 07 Apr 2023 02:52:39 GMT
< server: ECS (sgb/C6A4)
< vary: Accept-Encoding
< x-cache: HIT
< x-content-security-policy: frame-ancestors 'self' https://mycourses.w3schools.com;
< x-powered-by: ASP.NET
< content-length: 106976
< 

<HTML document - removed for brevity>

$ curl -v https://www.w3schools.com//html

> GET //html HTTP/2
> Host: www.w3schools.com
> user-agent: curl/7.87.0
> accept: */*
> 
< HTTP/2 301 
< cache-control: public
< content-security-policy: frame-ancestors 'self' https://mycourses.w3schools.com;
< content-type: text/html; charset=UTF-8
< date: Fri, 07 Apr 2023 05:16:46 GMT
< location: http://www.w3schools.com/html/
< server: Microsoft-IIS/10.0
< x-content-security-policy: frame-ancestors 'self' https://mycourses.w3schools.com;
< x-powered-by: ASP.NET
< content-length: 153
< 
<head><title>Document Moved</title></head>
* Connection #0 to host www.w3schools.com left intact
<body><h1>Object Moved</h1>This document may be found <a HREF="http://www.w3schools.com/html/">here</a></body>%

So I don't think this is a bug; this is the expected behaviour. (Postman behaves similarly when automatic redirect following is disabled.)
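To see the redirect from the Ballerina side rather than through cURL, one option is to bind the result to http:Response instead of string, which exposes the status code and headers. The following is only a rough sketch: it assumes the default client configuration (redirects not followed) and that the server still answers /html with a 301, which may change over time.

```ballerina
import ballerina/http;
import ballerina/io;

public function main() returns error? {
    // Default configuration: redirects are not followed automatically.
    http:Client cl = check new ("https://www.w3schools.com");

    // Binding to http:Response (instead of string) keeps the status line and headers.
    http:Response res = check cl->/html;
    io:println("status: ", res.statusCode);

    // getHeader returns an error when the header is absent,
    // for example when the server answers 200 directly.
    string|error location = res.getHeader("location");
    if location is string {
        io:println("redirects to: ", location);
    }
}
```

If the status printed is a 3xx and a location header is present, the difference between the two requests comes from the server's redirect, not from the client.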

TharmiganK commented 1 year ago

Closing this issue, as this is the expected behaviour.

hasithaa commented 1 year ago

Please try removing the leading /. It gives the same result. (I mentioned this in the original issue.)

import ballerina/http;
import ballerina/io;

public function main() returns error? {
    http:Client cl1 = check new ("https://random-data-api.com/api/v2/banks");
    string page1 = check cl1->/;
    io:println(page1);

    http:Client cl2 = check new ("https://random-data-api.com/api/v2");
    string page2 = check cl2->/banks;
    io:println(page2);

    http:Client cl3 = check new ("https://www.w3schools.com/html");
    string page3 = check cl3->/;
    io:println(page3);

    http:Client cl4 = check new ("https://www.w3schools.com");
    string page4 = check cl4->/html;
    io:println(page4);
}

TharmiganK commented 1 year ago

Even after removing the slashes, the resolved URLs are different: for page3 the requested URL is https://www.w3schools.com/html/ (with a trailing slash), while for page4 it is https://www.w3schools.com/html (without one).

The only difference is the trailing slash, so depending on the server you may get a 301 redirect response for page4 (mostly for websites that distinguish between resource directories and files). By default, the HTTP client does not follow redirects unless you enable this in the client config:

import ballerina/http;
import ballerina/io;

public function main() returns error? {
    ...

    http:Client cl4 = check new ("https://www.w3schools.com", followRedirects = {enabled: true});
    string page4 = check cl4->/html;
    io:println(page4);
}

If you enable redirects, you will get the same response for both requests.

hasithaa commented 1 year ago

Ok, I see.

ballerina-platform / ballerina-library

Different behaviour for HTTP client for the same url #4315