Appendium / flatpack

CSV/Tab Delimited and Fixed Length Parser and Writer
http://flatpack.sf.net
Apache License 2.0

Any performance benchmarks on flatpack ? #28

Closed. srijiths closed this issue 6 years ago.

srijiths commented 7 years ago

Hi,

I am using flatpack in a project and I see some performance degradation when I include it. Without flatpack I am able to process, say, 55,000 messages/sec; when I include flatpack in my pipeline, throughput drops to about 8,000 messages/sec.

I am using flatpack for delimited file parsing.
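
For context, a typical flatpack delimited parse looks like the sketch below; the factory choice, file name, delimiter, qualifier and column name are placeholders rather than my actual setup:

    import java.io.FileReader;
    import java.io.Reader;

    import net.sf.flatpack.DataSet;
    import net.sf.flatpack.DefaultParserFactory;
    import net.sf.flatpack.Parser;

    public class DelimitedParseSketch {
        public static void main(String[] args) throws Exception {
            // "messages.csv", the ',' delimiter and the '"' qualifier are placeholders.
            try (Reader reader = new FileReader("messages.csv")) {
                Parser parser = DefaultParserFactory.getInstance().newDelimitedParser(reader, ',', '"');
                DataSet ds = parser.parse();
                while (ds.next()) {
                    // With no column mapping supplied, names are expected to come from the header row.
                    String value = ds.getString("someColumn"); // placeholder column name
                }
            }
        }
    }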

Is this expected behavior, or am I doing something wrong?

Thanks,

benoitx commented 7 years ago

Hi, Flatpack is 'reasonable' in terms of speed but very, very flexible, and that obviously comes at a cost. I would be interested in knowing more about your tests.

Have you identified any bottleneck, or run it under a profiler?
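
If it would help to pin down where the time goes, a small JMH harness along these lines would isolate the parse cost from the rest of your pipeline; the class name and in-memory sample data here are illustrative only:

    import java.io.StringReader;

    import org.openjdk.jmh.annotations.Benchmark;
    import org.openjdk.jmh.annotations.Scope;
    import org.openjdk.jmh.annotations.State;
    import org.openjdk.jmh.infra.Blackhole;

    import net.sf.flatpack.DataSet;
    import net.sf.flatpack.DefaultParserFactory;

    @State(Scope.Benchmark)
    public class FlatpackParseBenchmark {

        // Tiny illustrative payload; a real benchmark should use representative data.
        private static final String CSV = "a;b;c\n1;2;3\n4;5;6\n";

        @Benchmark
        public void parseDelimited(Blackhole bh) {
            DataSet ds = DefaultParserFactory.getInstance()
                    .newDelimitedParser(new StringReader(CSV), ';', '"')
                    .parse();
            while (ds.next()) {
                // Consume a field so the JIT cannot eliminate the parse.
                bh.consume(ds.getString("a"));
            }
        }
    }

Running this against a representative file rather than the toy payload above would give numbers comparable to the messages/sec figures you quoted.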

Thanks Benoit

srijiths commented 7 years ago

Thanks @benoitx. I did not try with a profiler, but I can say that I am running the test on a dedicated 16 GB RAM quad-core machine.

I am testing whether I will be able to use flatpack in a low-latency processing pipeline, but the first results are unfortunately not promising. I agree that it is very flexible in terms of its parsing capabilities.

Thanks, Sreejith

martindiphoorn commented 6 years ago

I have just created a test case with a 2.1 GB CSV file. The file contains about 9 million lines with 22 columns separated by a semicolon, and it is parsed in 46 s 153 ms.
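That works out to roughly 195,000 records/sec (about 9 million lines in roughly 46 s).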

@benoitx we should probably close this.

    // Imports assumed for this test (JUnit 4 + Hamcrest):
    // import java.io.BufferedReader;
    // import java.io.InputStream;
    // import java.io.InputStreamReader;
    // import net.sf.flatpack.Parser;
    // import net.sf.flatpack.brparse.BuffReaderParseFactory;
    // import org.junit.Test;
    // import static org.hamcrest.MatcherAssert.assertThat;
    // import static org.hamcrest.Matchers.notNullValue;

    @Test
    public void testLargeCSV() {
        // "full.csv": ~9 million rows, 22 semicolon-delimited columns.
        InputStream inputStream = BulkTest.class.getClassLoader().getResourceAsStream("full.csv");
        assertThat(inputStream, notNullValue());

        BufferedReader reader = new BufferedReader(new InputStreamReader(inputStream));
        Parser parser = BuffReaderParseFactory.getInstance().newDelimitedParser(reader, ';', '"');

        // Note: these asserts only fire when the JVM is run with -ea.
        parser.parseAsStream().stream().forEach(record -> {
            assert (record.getColumns().length == 22);
            assert (record.getString("openbareruimte") != null);
        });
    }
benoitx commented 6 years ago

Thank you Martin.

benoitx commented 6 years ago

I've run a couple of sessions with YourKit and found that String.replace was taking a LONG time for a large CSV-type file. It took 70 sec on my old MBP.

Replacing String.replace with the implementation from Apache Commons, as per https://stackoverflow.com/questions/16228992/commons-lang-stringutils-replace-performance-vs-string-replace, seems to have a ~50% impact! We should try it.
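
For illustration, the change in question is swapping the JDK call for Commons Lang's StringUtils.replace, which avoids the internal regex machinery that String.replace(CharSequence, CharSequence) goes through on older JDKs. The helper below is a hypothetical stand-in, not the actual flatpack call site:

    import org.apache.commons.lang3.StringUtils;

    public class ReplaceSketch {

        // Hypothetical helper: strip the text qualifier from a raw field value.
        // The real flatpack code path differs; this only shows the substitution.
        static String stripQualifier(String rawField, String qualifier) {
            // return rawField.replace(qualifier, "");          // JDK String.replace
            return StringUtils.replace(rawField, qualifier, ""); // Commons Lang alternative
        }

        public static void main(String[] args) {
            System.out.println(stripQualifier("\"some value\"", "\"")); // -> some value
        }
    }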