Closed: richardtreier closed this issue 1 year ago
Now a second connector ran into the same problem. I expect that more users will face the issue in the future.
This is likely caused by the `Statement` `fetchSize` set at 5000:
https://github.com/eclipse-edc/Connector/blob/905ef20363fcb65b989299ef85e80f4951212ca3/extensions/common/sql/sql-core/src/main/java/org/eclipse/edc/sql/SqlQueryExecutor.java#L94
My proposal would be to make it configurable, perhaps raise the default value (but not by too much, since this is something that should be tuned deliberately), and improve the error message.
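A minimal sketch of what a configurable fetch size could look like. The property name `edc.sql.fetch.size`, the system-property lookup, and the fallback handling are all assumptions for illustration, not the actual EDC configuration API:

```java
// Hypothetical sketch only: property name and lookup mechanism are assumptions,
// not the actual EDC configuration API.
public class FetchSizeConfig {
    public static final String FETCH_SIZE_PROPERTY = "edc.sql.fetch.size"; // assumed key
    public static final int DEFAULT_FETCH_SIZE = 5000;                     // current hard-coded value

    // Resolves the fetch size from a system property, falling back to the
    // default when the property is absent or not a valid integer.
    public static int resolveFetchSize() {
        var raw = System.getProperty(FETCH_SIZE_PROPERTY);
        if (raw == null) {
            return DEFAULT_FETCH_SIZE;
        }
        try {
            return Integer.parseInt(raw);
        } catch (NumberFormatException e) {
            return DEFAULT_FETCH_SIZE;
        }
    }
}
```

An invalid value falling back to the default (rather than failing startup) is one possible design choice; failing fast with a clear error message would also fit the proposal above.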
Would you like to contribute to it?
I'll work on this
I'm not able to reproduce it; I tried with an integration test added to `PostgresTransferProcessStoreTest`:
```java
@Test
void query_6000_items() {
    range(0, 6000).forEach(i -> getTransferProcessStore().updateOrCreate(createTransferProcess("test-neg-" + i)));

    var query = QuerySpec.Builder.newInstance().limit(6000).build();

    var result = getTransferProcessStore().findAll(query);

    assertThat(result).hasSize(6000);
}
```
This works. I started Postgres with this command:

```shell
docker run --rm --name edc-postgres -e POSTGRES_PASSWORD=password -p 5432:5432 -d postgres
```
I scrolled through the JDBC/PostgreSQL documentation about the `fetchSize`: that was my first guess, but in fact it's a way to "tune" the query, not to put a limit on it.
So my suspicion falls on your Postgres instance: does it have specific configuration? What script did you use to create the schema? Unfortunately the internet does not say much about this specific issue, but my suggestion is to run a connector against a fresh Postgres instance and see if the issue appears again.
Nice to see some progress and tests!
Yes, the `fetchSize` query hint is supposed to be largely ignored by PostgreSQL and should thus not be set, or be set to -1 (https://www.phind.com/search?cache=6e83988d-b28e-4974-8a93-201367336839&init=true).
Another lead I had: most of my search results pointed to problems with either `Connection` reuse or `setAutoCommit`, and the integration test code mocks the creation of the connection in `PostgresqlStoreSetupExtension#beforeEach`.
So there are three possibilities as I see it (`Connection` instances, transaction management, autoCommit, etc.).

I would prefer looking at the official documentation over some AI guessing:
> Changing the code to use cursor mode is as simple as setting the fetch size of the Statement to the appropriate size. Setting the fetch size back to 0 will cause all rows to be cached (the default behaviour).
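For context, the PgJDBC documentation lists several conditions that must all hold for cursor-based fetching to actually be used. A small sketch encoding them; this is a paraphrase of the docs, so treat the exact condition set as an assumption:

```java
public class CursorModeCheck {
    // Paraphrase of the PgJDBC "Getting results based on a cursor" conditions:
    // autocommit must be off, the fetch size must be positive, and the
    // ResultSet must be TYPE_FORWARD_ONLY.
    public static boolean usesCursor(boolean autoCommit, int fetchSize, int resultSetType) {
        return !autoCommit
                && fetchSize > 0
                && resultSetType == java.sql.ResultSet.TYPE_FORWARD_ONLY;
    }
}
```

Notably, with autocommit on (the JDBC default) the driver silently falls back to caching the whole result set, which is one reason connection handling is a plausible suspect here.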
We need to use the cursor because that was the point of using streams; otherwise catalog creation could significantly affect the connector's performance (if not crash it). The same goes for `setAutoCommit(false)`.
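Since the point of the cursor is to back a lazy stream, here is a minimal, database-free sketch of the general pattern: wrapping an iterator (such as one reading a cursor-backed `ResultSet`) in a `Stream` so rows are pulled on demand. The `streamOf` helper is illustrative, not EDC's actual code:

```java
import java.util.Iterator;
import java.util.Spliterators;
import java.util.stream.Stream;
import java.util.stream.StreamSupport;

public class LazyRows {
    // Wraps an Iterator in a lazily evaluated, sequential Stream. With a
    // cursor-backed ResultSet iterator underneath, rows are fetched in
    // fetchSize-sized batches instead of being loaded into memory at once.
    public static <T> Stream<T> streamOf(Iterator<T> iterator) {
        return StreamSupport.stream(
                Spliterators.spliteratorUnknownSize(iterator, 0), false);
    }
}
```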
Making the `fetchSize` value configurable seems fair to me, with a reasonable default (such as 5000) to keep the current (and tested) behavior working as expected.
Bug Report
Describe the Bug
When querying transfer processes via the management API with about 6k rows in `edc_transfer_process`, `?limit=4999` works but `?limit=5000` fails with PostgreSQL errors such as `ERROR: portal "C_3448" does not exist`.
Expected Behavior

- `QuerySpec.max()` should work.
- `?limit=20000` should work.

Observed Behavior
Database Errors

Steps to Reproduce

- Have about 6k rows in `edc_transfer_process`, a size which has been reached in some productive connectors.
- `/transferprocess?limit=4999` should work.
- `/transferprocess?limit=5000` should not work.

Context Information
MS8, PostgreSQL 11
Detailed Description
An excerpt from our Grafana logs: