ballerina-platform / ballerina-library

The Ballerina Library
https://ballerina.io/learn/api-docs/ballerina/
Apache License 2.0

Type Cast error when load testing #1657

Closed shafreenAnfar closed 2 years ago

shafreenAnfar commented 3 years ago

Description: I get the error shown below when load testing the following service.

import ballerina/http;

service / on new http:Listener(8090) {

    final http:Client covid19Client;
    final http:Client worldBankClient;

    function init() {
        self.covid19Client = checkpanic new ("https://disease.sh");
        self.worldBankClient = checkpanic new ("http://api.worldbank.org/v2");
    }

    resource function get stats/[string country]() returns json|error? {

        // http:Client covid19Client = check new ("https://disease.sh");
        string path1 = string `/v3/covid-19/countries/${country}`;
        CovidCountry statusByCountry = check self.covid19Client->get(path1);

        // http:Client worldBankClient = check new ("http://api.worldbank.org/v2");
        string path2 = string `/country/${country}/indicator/SP.POP.TOTL?date=2010&format=json`;
        json[] payloadArr = check self.worldBankClient->get(path2);
        CountryPopulation[] populationByCountry = check payloadArr[1].cloneWithType(); 

        decimal totalCases = statusByCountry?.cases ?: 0d;
        decimal population = <decimal>(populationByCountry[0]?.value ?: 0) / 1000000d;
        decimal totalCasesPerMillion = totalCases / population;

        json payload = {country : country, totalCasesPerMillion : totalCasesPerMillion};
        return payload;
    }
}

# Covid-19 status of the given country
public type CovidCountry record {
    # Last updated timestamp
    decimal updated?;
    # Country name
    string country?;
    # Country information
    record {
        # Country Id
        decimal _id?;
        # Country ISO2 code
        string iso2?;
        # Country ISO3 code
        string iso3?;
        # Latitude
        decimal lat?;
        # Longitude
        decimal long?;
        # URL for the country flag
        string flag?;
    } countryInfo?;
    # Total cases
    decimal cases?;
    # Today cases
    decimal todayCases?;
    # Total deaths
    decimal deaths?;
    # Today deaths
    decimal todayDeaths?;
    # Total recovered
    decimal recovered?;
    # Today recovered
    decimal todayRecovered?;
    # Active cases
    decimal active?;
    # Critical cases
    decimal critical?;
    # Cases per one million
    decimal casesPerOneMillion?;
    # Deaths per one million
    decimal deathsPerOneMillion?;
    # Total number of Covid-19 tests administered
    decimal tests?;
    # Covid-19 tests for one million
    decimal testsPerOneMillion?;
    # Total population
    decimal population?;
    # Continent name
    string continent?;
    # One case per people
    decimal oneCasePerPeople?;
    # One death per people
    decimal oneDeathPerPeople?;
    # One test per people
    decimal oneTestPerPeople?;
    # Active cases per one million
    decimal activePerOneMillion?;
    # Recovered cases per one million
    decimal recoveredPerOneMillion?;
    # Critical cases per one million
    decimal criticalPerOneMillion?;
};

public type CountryPopulation record {
    # World bank indicator
    Indicator indicator?;
    # Country
    Country country?;
    # Date-range by year, month or quarter that scopes the result-set.
    string date?;
    # Country population
    int? value?;
};

# Data indicator
public type Indicator record {
    # Id of the indicator
    string id?;
    # Value represented by the indicator
    string value?;
};

# Represents a country
public type Country record {
    # Country code
    string id?;
    # Country name
    string value?;
};

docker command

docker run -p 8090:8090 --cpus=0.5 --memory=350m wb-cache

client command

echo "GET http://127.0.0.1:8090/stats/LK" | vegeta attack -duration=10m -rate=40 | tee results.bin | vegeta report

error

error: client method invocation failed: {ballerina}TypeCastError
cause: {ballerina}TypeCastError

shafreenAnfar commented 3 years ago

I don't see this when running without Docker, so it could be caused by the limited resources.

ayeshLK commented 2 years ago

We ran a load test on this code sample with the memory limited to 350m. We got the following stack traces in addition to the issue mentioned above.

error: client method invocation failed: Cache entry from the given key: GET /v3/covid-19/countries/LK, is not available.
cause: Cache entry from the given key: GET /v3/covid-19/countries/LK, is not available.
    at ballerina.cache.3:prepareError(cache_errors.bal:25)
       ballerina:get(cache.bal:168)
       ballerina.http.2.HttpCache:get(caching_http_cache.bal:103)
       ballerina.http.2:getCachedResponse(caching_http_caching_client.bal:273)
       ballerina.http.2.HttpCachingClient:get(caching_http_caching_client.bal:163)
       ballerina.http.2.Client:processGet(http_client_endpoint.bal:196)
error: client method invocation failed: Cache entry from the given key: GET /country/LK/indicator/SP.POP.TOTL?date=2010&format=json, is not available.
cause: Cache entry from the given key: GET /country/LK/indicator/SP.POP.TOTL?date=2010&format=json, is not available.
    at ballerina.cache.3:prepareError(cache_errors.bal:25)
       ballerina:get(cache.bal:168)
       ballerina.http.2.HttpCache:get(caching_http_cache.bal:103)
       ballerina.http.2:getCachedResponse(caching_http_caching_client.bal:273)
       ballerina.http.2.HttpCachingClient:get(caching_http_caching_client.bal:163)
       ballerina.http.2.Client:processGet(http_client_endpoint.bal:196)
error: client method invocation failed: java.lang.NullPointerException
cause: java.lang.NullPointerException
    at ballerina.cache.3:externGet(cache.bal:270)
       ballerina:get(cache.bal:171)
       ballerina.http.2.HttpCache:get(caching_http_cache.bal:103)
       ballerina.http.2:getCachedResponse(caching_http_caching_client.bal:273)
       ballerina.http.2.HttpCachingClient:get(caching_http_caching_client.bal:163)
       ballerina.http.2.Client:processGet(http_client_endpoint.bal:196)

error: client method invocation failed: java.lang.NullPointerException
cause: java.lang.NullPointerException
    at ballerina.cache.3:externGet(cache.bal:270)
       ballerina:get(cache.bal:171)
       ballerina.http.2.HttpCache:get(caching_http_cache.bal:103)
       ballerina.http.2:getCachedResponse(caching_http_caching_client.bal:273)
       ballerina.http.2.HttpCachingClient:get(caching_http_caching_client.bal:163)
       ballerina.http.2.Client:processGet(http_client_endpoint.bal:196)
error: client method invocation failed: java.lang.NullPointerException
cause: java.lang.NullPointerException
    at ballerina.cache.3:externGet(cache.bal:270)
       ballerina:get(cache.bal:171)
       ballerina.http.2.HttpCache:get(caching_http_cache.bal:103)
       ballerina.http.2:getCachedResponse(caching_http_caching_client.bal:273)
       ballerina.http.2.HttpCachingClient:get(caching_http_caching_client.bal:163)
       ballerina.http.2.Client:processGet(http_client_endpoint.bal:196)

ayeshLK commented 2 years ago

We carried out a load test with caching disabled in http:Client and found no errors. We suspect that this could be an issue with the Ballerina cache, so we will move this issue to the cache module.

kalaiyarasiganeshalingam commented 2 years ago

I ran a load test of this sample with slbeta6. It worked with the following docker and load-test commands:

Docker commands:

docker run -d -p 9659:9659 testhttp:latest
docker run --cpus=1 --memory=350m -d -p 9659:9659 testhttp:latest
docker run --memory=350m -d -p 9659:9659 testhttp:latest

Load test command:

echo "GET http://localhost:9659/stats/LK" | vegeta attack -duration=10m -rate=40 | tee results.bin | vegeta report

Output:

Requests      [total, rate, throughput]         24000, 40.00, 40.00
Duration      [total, attack, wait]             10m0s, 10m0s, 5.115ms
Latencies     [min, mean, 50, 90, 95, 99, max]  1.925ms, 5.725ms, 4.949ms, 6.059ms, 6.715ms, 8.168ms, 1.009s
Bytes In      [total, mean]                     1824000, 76.00
Bytes Out     [total, mean]                     0, 0.00
Success       [ratio]                           100.00%
Status Codes  [code:count]                      200:24000  
Error Set:

Please note that I was able to reproduce the above error with the following command:

docker run --cpus=0.5 --memory=350m -d -p 9659:9659 testhttp:latest

Error:

error: client method invocation failed: {ballerina}TypeCastError
cause: {ballerina}TypeCastError
    at ballerina.http.2:externResponseGetHeader(http_response.bal:572)
       ballerina.http.2.Response:getHeader(http_response.bal:109)
       ballerina.http.2:getResponseAge(caching_response_age_calculation.bal:42)
       ballerina.http.2:isFreshResponse(caching_freshness_lifetime_calculation.bal:20)
       ballerina.http.2:getCachedResponse(caching_http_caching_client.bal:285)
       ballerina.http.2.HttpCachingClient:get(caching_http_caching_client.bal:163)
       ballerina.http.2.Client:processGet(http_client_endpoint.bal:196)

Output:

Requests      [total, rate, throughput]         20276, 40.00, 39.98
Duration      [total, attack, wait]             8m27s, 8m27s, 2.472ms
Latencies     [min, mean, 50, 90, 95, 99, max]  1.906ms, 43.55ms, 4.66ms, 51.152ms, 64.696ms, 1.61s, 2.692s
Bytes In      [total, mean]                     1540767, 75.99
Bytes Out     [total, mean]                     0, 0.00
Success       [ratio]                           99.95%
Status Codes  [code:count]                      200:20265  500:11  
Error Set:

Hence, it works with Docker in general, but it fails when the CPU allocation is very small (--cpus=0.5).

We will fix it ASAP.

kalaiyarasiganeshalingam commented 2 years ago

@ayeshLK When I disabled caching in both of the above HTTP clients and ran the following commands, I got these errors:

Disable the cache

self.covid19Client = checkpanic new ("https://disease.sh", cache = {enabled: false});
self.worldBankClient = checkpanic new ("http://api.worldbank.org/v2", cache = {enabled: false});

Commands

bal build --cloud=docker
docker run --cpus=0.5 --memory=350m -d -p 9659:9659 testhttp:latest
echo "GET http://localhost:9659/stats/LK" | vegeta attack -duration=10m -rate=40 | tee results.bin | vegeta report

error:

java.lang.OutOfMemoryError: Java heap space
java.lang.OutOfMemoryError: Java heap space
java.lang.OutOfMemoryError: Java heap space
java.lang.OutOfMemoryError: Java heap space
java.lang.OutOfMemoryError: Java heap space
Exception in thread "jbal-strand-exec-0" java.lang.OutOfMemoryError: Java heap space
Exception in thread "Timer-0" java.lang.OutOfMemoryError: Java heap space
ballerina: Oh no, something really went wrong. Bad. Sad.

We appreciate it if you can report the code that broke Ballerina in
https://github.com/ballerina-platform/ballerina-lang/issues with the
log you get below and your sample code.

We thank you for helping make us better.

Exception in thread "main" java.lang.OutOfMemoryError: Java heap space

Output:

Requests      [total, rate, throughput]         219, 25.54, 0.68
Duration      [total, attack, wait]             33.747s, 8.576s, 25.171s
Latencies     [min, mean, 50, 90, 95, 99, max]  3.394s, 22.904s, 21.011s, 30.001s, 30.002s, 30.004s, 30.005s
Bytes In      [total, mean]                     7933, 36.22
Bytes Out     [total, mean]                     0, 0.00
Success       [ratio]                           10.50%
Status Codes  [code:count]                      0:93  200:23  500:103 

Can you check it?

daneshk commented 2 years ago

Reopening the issue, as we have fixed only a portion of it, in the cache module.

kalaiyarasiganeshalingam commented 2 years ago

@shafreenAnfar @ayeshLK The cache module had some issues that caused the NPE. We have now fixed them in the above PR.

I made small changes to the HTTP module to avoid the NPE in local testing. The cache is accessed concurrently. Here, we first check whether the cache has the key and, if it does, we then get the value. The key may sometimes be removed from the cache in the gap between those two function calls. Therefore, we can directly check the value returned by get() instead of checking for the key first.
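The check-then-get gap described above can be illustrated with a minimal sketch against the ballerina/cache module. This is not the actual HTTP module code: the function names getRacy and getChecked, the cache key, and the cached value are hypothetical and used only for illustration.

```ballerina
import ballerina/cache;
import ballerina/io;

// Hypothetical helper: racy check-then-act. Another strand may evict or
// invalidate the key between hasKey() and get().
function getRacy(cache:Cache c, string key) returns any|error {
    if c.hasKey(key) {
        return c.get(key); // can still fail if the entry vanished after hasKey()
    }
    return error("cache miss");
}

// Hypothetical helper: a single get() whose result is inspected, so there
// is no gap between the check and the read.
function getChecked(cache:Cache c, string key) returns any|error {
    any|cache:Error entry = c.get(key);
    if entry is cache:Error {
        return error("cache miss", entry);
    }
    return entry;
}

public function main() returns error? {
    cache:Cache c = new (capacity = 10, evictionFactor = 0.2);
    check c.put("GET /stats/LK", "cached-response");

    // Present key: the value is returned.
    io:println(check getChecked(c, "GET /stats/LK"));

    // Absent key: the miss surfaces as an error value, not a panic.
    any|error missing = getChecked(c, "GET /stats/XX");
    io:println(missing is error);
}
```

With the second form, a concurrent eviction simply looks like a cache miss to the caller, which can then fall back to the origin server.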

The following lines have to be modified:

Both use getHeader(), so I tried to fix this issue by making the following changes in the native code.

I couldn't fix the issue with the above changes.

So I applied a Ballerina lock to both of the above error points separately, which fixed the issue. It therefore seems that those functions have a concurrency issue.

lock {
    setAgeHeader(cachedResponse);
}

RequestCacheControl? reqCache = req.cacheControl;
ResponseCacheControl? resCache = cachedResponse.cacheControl;

lock {
    if (isFreshResponse(cachedResponse, isShared)) {
        // If the no-cache directive is not set, responses can be served straight from the cache, without
        // validating with the origin server.
        if (!isNoCacheSet(reqCache, resCache) && !req.hasHeader(PRAGMA)) {
            log:printDebug("Serving a cached fresh response without validating with the origin server");
            return cachedResponse;
        }

        log:printDebug("Serving a cached fresh response after validating with the origin server");
        return getValidationResponse(httpClient, req, cachedResponse, cache, currentT, path, httpMethod, true);
    }
}

kalaiyarasiganeshalingam commented 2 years ago

Closing the issue, as we fixed it in the cache and http modules.