Closed srijiths closed 6 years ago
Hi, Flatpack is 'reasonable' in terms of speed but very flexible, and that obviously comes at a cost. I would be interested in knowing more about your tests.
Have you identified any bottleneck or run it with a profiler?
Thanks, Benoit
Thanks @benoitx. I did not try with a profiler, but I can say that I am running the test on a dedicated quad-core machine with 16 GB of RAM.
I am testing whether I can use Flatpack in a low-latency processing pipeline, but the first results are unfortunately not promising. I agree that it is very flexible in terms of its parsing capabilities.
Thanks, Sreejith
I have just created a test case with a 2.1 GB CSV file. The file is parsed in 46 s 153 ms. It contains 22 columns separated by semicolons and about 9 million lines.
@benoitx we probably should close this.
@Test
public void testLargeCSV() {
    // Load the large test file from the classpath.
    InputStream inputStream = BulkTest.class.getClassLoader().getResourceAsStream("full.csv");
    assertThat(inputStream, notNullValue());
    BufferedReader reader = new BufferedReader(new InputStreamReader(inputStream));
    // Semicolon as delimiter, double quote as text qualifier.
    Parser parser = BuffReaderParseFactory.getInstance().newDelimitedParser(reader, ';', '"');
    // Iterate over the parsed records as a stream and sanity-check each one.
    parser.parseAsStream().stream().forEach(record -> {
        assert (record.getColumns().length == 22);
        assert (record.getString("openbareruimte") != null);
    });
}
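For scale, the figures quoted for this test work out to roughly 195,000 lines/sec and about 45.5 MB/sec. The sketch below only redoes that arithmetic; the class and method names are made up for illustration, the line count is the approximate "about 9 million" from the comment, and 1 GB is taken as 1000 MB. Whether one line corresponds to one "message" in the original pipeline is an assumption.

```java
// Hypothetical helper, not part of Flatpack: turns the reported totals into rates.
final class ThroughputSketch {

    /** Units processed per second, given a total count and elapsed milliseconds. */
    static double perSecond(double count, double elapsedMillis) {
        return count / (elapsedMillis / 1000.0);
    }

    public static void main(String[] args) {
        double elapsedMillis = 46_153;   // 46 s 153 ms, as reported
        double lines = 9_000_000;        // "about 9 million lines" (approximate)
        double megabytes = 2_100;        // 2.1 GB, assuming 1 GB = 1000 MB

        System.out.printf("%.0f lines/sec%n", perSecond(lines, elapsedMillis));
        System.out.printf("%.1f MB/sec%n", perSecond(megabytes, elapsedMillis));
    }
}
```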
Thank you Martin.
I've run a couple of sessions with YourKit and found that String.replace was taking a long time for a large CSV-type file. It took 70 sec on my old MBP.
Replacing String.replace with the implementation from Apache Commons, as per https://stackoverflow.com/questions/16228992/commons-lang-stringutils-replace-performance-vs-string-replace, seems to have a 50% impact! We should try.
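To make the idea concrete, here is a minimal sketch of a literal, indexOf-based replace of the kind the Stack Overflow thread discusses. It is not Flatpack's code and not the Apache Commons source; the class and method names are hypothetical. The assumption behind the speed-up is that on older JDKs (Java 8 and earlier, as far as I know) String.replace goes through a regex Pattern compiled with the LITERAL flag on every call, whereas a plain indexOf loop avoids that overhead and allocates nothing when there is no match.

```java
// Hypothetical sketch of a regex-free literal replace; names are illustrative only.
final class ReplaceSketch {

    /** Replaces every occurrence of search in text with replacement, without regex. */
    static String replaceLiteral(String text, String search, String replacement) {
        if (text == null || text.isEmpty()
                || search == null || search.isEmpty()
                || replacement == null) {
            return text;
        }
        int end = text.indexOf(search);
        if (end < 0) {
            return text; // fast path: no match, return the same instance
        }
        StringBuilder out = new StringBuilder(text.length() + 16);
        int start = 0;
        while (end >= 0) {
            // Copy the untouched segment, then the replacement.
            out.append(text, start, end).append(replacement);
            start = end + search.length();
            end = text.indexOf(search, start);
        }
        out.append(text, start, text.length()); // trailing segment after the last match
        return out.toString();
    }

    public static void main(String[] args) {
        // Example: collapse doubled qualifiers in a delimited line.
        String line = "a;\"\"quoted\"\";b";
        System.out.println(replaceLiteral(line, "\"\"", "\""));
    }
}
```

Any real gain should be confirmed with the profiler on the same file, since newer JDKs may already handle String.replace differently.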
Hi,
I am using Flatpack in a project and I see a performance degradation when I include it. Without Flatpack, I am able to process about 55,000 messages/sec. When I include Flatpack in my pipeline, throughput drops to 8,000 messages/sec.
I am using Flatpack for delimited file parsing.
Is this expected behavior, or am I doing something wrong?
Thanks,