It is difficult to figure out with so many possible causes. A trace of the insert would be a nice start to figure out the issue.
Hi @yzchang
- Inserting records without auto-commit mode (tried inserting all the records at once and in batches of 5000 as well)
- Ubuntu
- Version: MonetDB Database Server Toolkit v11.39.17 (Oct2020-SP5)
- Yes, it is the same machine where I'm running my code with MonetDB
@PedroTadim How exactly can we trace the insertion? I have read some documents and found that bulk insertion is much faster through CSV files. Our use case is that we need to insert approximately 40 million records into MonetDB on an hourly basis.
++ Our use case is that we need to insert approximately 40 million records into multiple tables on an hourly basis.
Prefix the statement with TRACE to get the execution trace.
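For reference, a minimal sketch of tracing a single insert over JDBC; the table name and values are placeholders, and it assumes the server returns the TRACE output as an ordinary result set (otherwise just run the TRACE-prefixed statement in mclient):

// Assumes java.sql.* imports and an open Connection conn.
try (Statement st = conn.createStatement()) {
    // "my_table" and the values are hypothetical; substitute your own insert statement.
    boolean hasResult = st.execute("TRACE INSERT INTO my_table VALUES (1, 'a', 'b')");
    if (hasResult) {
        try (ResultSet rs = st.getResultSet()) {
            ResultSetMetaData md = rs.getMetaData();
            while (rs.next()) {
                // Print every column of each trace row, whatever columns the server reports.
                for (int c = 1; c <= md.getColumnCount(); c++) {
                    System.out.print(md.getColumnLabel(c) + "=" + rs.getString(c) + "  ");
                }
                System.out.println();
            }
        }
    }
}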
@yzchang
- Already tried the TRACE command for a single row. I'm inserting 74500 records dynamically through my code, so I can't trace them all the same way.
- Timing cached reads: 35310 MB in 2.00 seconds = 17686.75 MB/sec
  Timing buffered disk reads: 2950 MB in 3.00 seconds = 982.86 MB/sec
- There are no concurrent queries.
- For larger data volumes, is the CSV approach recommended?
absolutely
@yzchang Is this slowness with the JDBC approach a known thing, or am I the only one facing it? I have tried to insert the same data into my PostgreSQL and it took only 0.9 seconds.
how long does it take if you do the same insert using mclient? => then we'll know if it's a JDBC problem or an INSERT INTO problem.
PS> if you don't want to use CSV loading, you can also consider binary COPY INTO. That's even (much) faster than CSV COPY INTO, but it requires more work on the user side.
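As an illustration, a minimal sketch of a CSV bulk load driven from JDBC, assuming a hypothetical table my_table and an absolute file path readable by the MonetDB server process (see the bulk-loading documentation linked later in this thread for all COPY INTO options):

// Assumes java.sql.* imports and an open Connection conn.
try (Statement st = conn.createStatement()) {
    // '/tmp/data.csv' is a placeholder path on the server; adjust the delimiters to match the file.
    st.execute("COPY INTO my_table FROM '/tmp/data.csv' USING DELIMITERS ',', '\\n', '\"'");
    conn.commit(); // explicit commit; this sketch assumes auto-commit is off, as in the thread
}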
how wide is the table you're inserting into (i.e. how many attributes does it have)?
@njnes Only 3 columns
74500 rows in 15 minutes is way too slow. Although we improved a lot on the to-be-released Jul branch, even the older Oct release should do 75k rows in way less time. Could you try with mclient instead of JDBC to see the bare performance? What are the types of these 3 columns (just strings and numbers)?
@njnes One is bigInt and the other 2 are text columns.
For the performance benchmark, how can I run the insertion of 74500 records with mclient? Is there any way to do it, as I'm inserting all the records dynamically through JDBC?
mclient wants a single (large) file (with start transaction/commit). One easy way to create this is to use msqldump -N -t tablename dbname. This will output the table including the create table and start transaction/commit statements. So just removing the create table would give the correct test file.
Then test with clean database and recreated empty table.
@njnes We have already tested the large-file (CSV) route with mclient: loading 74500 records through a CSV file takes less than 1 second, but with JDBC we are inserting the same records in batches and it takes 15 minutes.
In order to analyze what is going on in your Java program we need to have the source code which details how you do the inserts using JDBC methods. Can you please send us the relevant source code (including connection, statements, etc) for analysis? Can you also tell me if you turned off the autocommit mode and do explicit commit control e.g. after all rows have been inserted?
Also I need to know which version of the MonetDB JDBC driver you are using. The latest version (monetdb-jdbc-3.1.jre8.jar) is available from: https://www.monetdb.org/downloads/Java/
You may also look at https://www.monetdb.org/Documentation/ServerAdministration/LoadingBulkData It explains the different possibilities to insert data into MonetDB.
@mvdvm Yes, we are using the latest version of the MonetDB JDBC driver.
Configuration and source code as follows:
<bean id="dataSourceMonet" class="org.springframework.jdbc.datasource.DriverManagerDataSource">
<property name="url" value="jdbc:monetdb://localhost:54321/aashishTest?so_timeout=600000" />
<property name="username" value="monetdb" />
<property name="password" value="monetdb" />
</bean>
conn = (MonetConnection) jdbcTemplate.getDataSource().getConnection();
// Disable auto-commit so all rows go into a single transaction.
conn.setAutoCommit(false);
PreparedStatement ps = conn.prepareStatement(sql);
int batchCount = 0;
for (int i = 0; i < data.size(); i++) {
    Map<String, Object> dataMap = data.get(i);
    int param = 1;
    for (String colName : columnAndDataTypes.keySet()) {
        String originalColName = colName;
        if ("functions".equalsIgnoreCase(colName)) {
            colName = "Function";
        }
        // Bind the value using the JDBC type code derived from the column's declared data type.
        ps.setObject(param, dataMap.get(colName), ColumnDataTypes.getDataTypeCode(columnAndDataTypes.get(originalColName)));
        param++;
    }
    ps.addBatch();
    ++batchCount;
    // Flush every full batch, plus the final (possibly partial) batch.
    if (batchCount % batchSize == 0 || batchCount == data.size()) {
        ps.executeBatch();
        ps.clearBatch();
    }
}
ps.close();
conn.commit();
// Alternative attempt: the same inserts via Spring's batchUpdate helper.
jdbcTemplate.batchUpdate(sql, data, batchSize,
new ParameterizedPreparedStatementSetter<Map<String, Object>>() {
@Override
public void setValues(PreparedStatement ps, Map<String, Object> argument)
throws SQLException {
int j = 1;
for (String colName : columnAndDataTypes.keySet()) {
if ("functions".equalsIgnoreCase(colName)) {
colName = "Function";
}
ps.setObject(j, argument.get(colName));
j++;
}
}
});
Hi,
We have analysed why it is taking so much time and found there is a problem in the Oct2020 release server when used with prepared insert statements, as is the case in your use case.
Luckily we also have found a workaround for Oct2020 releases. Please execute the SQL statement SET sys.optimizer = 'minimal_pipe' before doing the insertions. So in Java:
Statement stmt = conn.createStatement();
stmt.execute("SET sys.optimizer = 'minimal_pipe'");
After all the insertions are done you can set the optimizer setting back to its original value via:
stmt.execute("SET sys.optimizer = 'default_pipe'");
See: https://www.monetdb.org/Documentation/SQLreference/SQLSyntaxOverview#SET_OPTIMIZER
This optimizer setting should improve the insert performance, especially for a large number of inserted rows. Once tested, let us know if this issue can be closed.
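Put together with the batch loop shown earlier, the workaround would look roughly like this (a sketch; the "..." stands for the existing prepared-statement batch code):

// Switch to the minimal optimizer pipeline before the bulk insert (Oct2020 workaround).
try (Statement stmt = conn.createStatement()) {
    stmt.execute("SET sys.optimizer = 'minimal_pipe'");
}
// ... run the prepared-statement batches and conn.commit() as in the snippet above ...
// Restore the default pipeline afterwards.
try (Statement stmt = conn.createStatement()) {
    stmt.execute("SET sys.optimizer = 'default_pipe'");
}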
FYI: The issue has already been corrected in the upcoming Jul2021 release (which is not yet released). The new Jul2021 release will also have significantly improved insert performance, especially when sys.optimizer = 'minimal_pipe' is set. So add it to your program also for future MonetDB releases.
Tip: If your table has table constraints (like a primary key and/or unique key and/or foreign key), it helps to remove those constraints before you insert large amounts of data into the table, and add the constraints back after the inserts are completed. Use SQL: ALTER TABLE ... DROP CONSTRAINT ... and ALTER TABLE ... ADD CONSTRAINT ... PRIMARY KEY (col1) for doing this. See: https://www.monetdb.org/Documentation/SQLreference/SQLSyntaxOverview#ALTER_TABLE_ADD_CONSTRAINT
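A hedged sketch of that pattern over JDBC; the table, constraint, and column names are placeholders:

// Assumes java.sql.* imports and an open Connection conn.
try (Statement st = conn.createStatement()) {
    // Drop the constraint before the bulk insert ("my_table", "my_table_pk", and "id" are hypothetical names).
    st.execute("ALTER TABLE my_table DROP CONSTRAINT my_table_pk");
    // ... perform the bulk inserts and commit ...
    // Re-create the constraint once the data is in.
    st.execute("ALTER TABLE my_table ADD CONSTRAINT my_table_pk PRIMARY KEY (id)");
}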
PS: when using monetdb-jdbc-3.1.jre8.jar (or monetdb-jdbc-3.0.jre8.jar) you no longer need to call: ps.clearBatch(); after each ps.executeBatch(); This issue has been corrected already in monetdb-jdbc-3.0.jre8.jar and newer versions.
@mvdvm Thanks for the update; we will try this workaround. Can you please let us know when the new Jul2021 release is coming?
Hi,
I'm inserting 74500 records through a batch (JDBC driver) and it is taking around 15 minutes. Why is the insertion so slow? We don't want to insert data through CSV.