Netflix / astyanax

Cassandra Java Client
Apache License 2.0

Issue when reading StringSerialiser when using them in callback for AllRowsReader #499

Open akjequinix opened 10 years ago

akjequinix commented 10 years ago

I am calling the AllRowsReader recipe as shown below.

boolean status = new AllRowsReader.Builder<K, C>(keyspace, colFam).withPageSize(100).withConcurrencyLevel(6).forEachRow(callback).build().call();

The following is my callback function.

static class Log4jRowFetchFunction implements Function<Row<String, String>, Boolean> {

    // Every row read is accumulated here, so this array grows with the data set.
    private JSONArray result;

    public Log4jRowFetchFunction(JSONArray result) {
        this.result = result;
    }

    @Override
    public Boolean apply(Row<String, String> row) {
        ColumnList<String> columns = row.getColumns();
        JSONObject json = new JSONObject();

        if (null != columns && !columns.isEmpty()) {
            for (Column<String> c : columns) {
                String name = c.getName();
                Object value;

                // "creation" is read as a date; every other column is read as a string.
                if ("creation".equalsIgnoreCase(name)) {
                    value = c.getDateValue();
                } else {
                    value = c.getStringValue();
                }
                try {
                    json.put(name, value);
                } catch (JSONException e) {
                    // TODO Auto-generated catch block
                    e.printStackTrace();
                }
            }
        }

        result.put(json);

        return true;
    }
}

After some time, I get an OutOfMemoryError when reading the string value (c.getStringValue()).

softwarebandit commented 10 years ago

Please provide more details about your environment and how to reproduce the problem so that it is easier for people to help you.

How large is the data set (how many rows are you reading)? What kind of data is inside the string? What values are you storing? How large is each string? What is your JVM heap size set to?

My guess is that you are reading everything into the result, which takes up all the memory, so when you read the next string value there is no memory left.

akjequinix commented 10 years ago

Hi, my environment is a DSE 3.0 dev cluster with 6 nodes, each with 8 GB RAM. I am trying to fetch this data in an application deployed on a Tomcat server. I have around 8 lakh (800,000) rows, each with around 10 columns. It's basically raw log4j data. One of the fields, message, may have large string values. I have set -Xms1024m -Xmx2048m for the heap size.

Please let me know your thoughts.

softwarebandit commented 10 years ago

You can try increasing the maximum heap size (-Xmx) and see if that resolves your problem.
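
To confirm that the -Xmx value actually reached the JVM running inside Tomcat, a small check along these lines can help. This is only a sketch: HeapCheck is an illustrative name, and it uses nothing beyond java.lang.Runtime.

```java
/** Illustrative helper (hypothetical name) that reports the heap limits of the running JVM. */
class HeapCheck {
    private static final long MIB = 1024L * 1024L;

    /** Maximum heap the JVM will attempt to use, in MiB (reflects -Xmx when it is set). */
    static long maxHeapMiB() {
        return Runtime.getRuntime().maxMemory() / MIB;
    }

    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        // totalMemory() is the heap currently reserved; freeMemory() is the unused part of it.
        long usedMiB = (rt.totalMemory() - rt.freeMemory()) / MIB;
        System.out.println("max heap (MiB):  " + maxHeapMiB());
        System.out.println("used heap (MiB): " + usedMiB);
    }
}
```

Logging the same two numbers from inside the web application at startup shows whether the Tomcat process picked up the setting.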

Your problem does not seem to be caused by an issue in the Astyanax library, so I recommend that you look at generic Java out-of-memory questions on Stack Overflow. Take a look at the following post: http://stackoverflow.com/questions/52353/in-java-what-is-the-best-way-to-determine-the-size-of-an-object

In general, you shouldn't keep large objects like your JSONArray result in memory (unless you have a good reason for it) but rather write out the data to a file as you read it.
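
A rough sketch of that approach, under some assumptions: StreamingRowWriter is a hypothetical name, a plain Map<String, String> stands in for the Astyanax Row<String, String>, and java.util.function.Function stands in for the Function type the recipe expects, so that the example runs without the library on the classpath. Each row is written as one JSON line as soon as it arrives, and nothing is retained afterwards. Because the reader is configured with withConcurrencyLevel(6), the callback may be invoked from several threads, so the sketch synchronizes apply.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical callback that writes each row to a file instead of collecting rows in memory. */
class StreamingRowWriter implements Function<Map<String, String>, Boolean>, AutoCloseable {
    private final BufferedWriter out;

    StreamingRowWriter(Path file) throws IOException {
        this.out = Files.newBufferedWriter(file, StandardCharsets.UTF_8);
    }

    /** Minimal JSON string quoting; a real JSON library could be used instead. */
    static String quote(String s) {
        if (s == null) return "null";
        StringBuilder sb = new StringBuilder("\"");
        for (int i = 0; i < s.length(); i++) {
            char ch = s.charAt(i);
            switch (ch) {
                case '"':  sb.append("\\\""); break;
                case '\\': sb.append("\\\\"); break;
                case '\n': sb.append("\\n"); break;
                case '\r': sb.append("\\r"); break;
                case '\t': sb.append("\\t"); break;
                default:
                    if (ch < 0x20) sb.append(String.format("\\u%04x", (int) ch));
                    else sb.append(ch);
            }
        }
        return sb.append('"').toString();
    }

    // synchronized: the callback may be called concurrently, and the writer is shared.
    @Override
    public synchronized Boolean apply(Map<String, String> row) {
        StringBuilder line = new StringBuilder("{");
        for (Map.Entry<String, String> e : row.entrySet()) {
            if (line.length() > 1) line.append(',');
            line.append(quote(e.getKey())).append(':').append(quote(e.getValue()));
        }
        line.append('}');
        try {
            out.write(line.toString());
            out.newLine(); // one JSON object per line
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
        return true; // the row is not kept after this point
    }

    @Override
    public synchronized void close() throws IOException {
        out.close();
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("rows", ".jsonl");
        try (StreamingRowWriter writer = new StreamingRowWriter(file)) {
            Map<String, String> row = new LinkedHashMap<>();
            row.put("level", "INFO");
            row.put("message", "service started");
            writer.apply(row);
        }
        System.out.println(Files.readAllLines(file, StandardCharsets.UTF_8));
    }
}
```

With this shape, memory use is bounded by the size of a single row rather than by the whole column family. In the real callback, the loop over the map would be replaced by the loop over row.getColumns().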