rossj-cargotel closed this issue 9 years ago
Hi,
I'm not sure what you're after. Do you mean that a SELECT with a specific WHERE condition is still slow? If so, could you please show the EXPLAIN output?
Please note that if you want to query all rows of a large table all the time, it might be better to have something materialized locally (e.g. a MATERIALIZED VIEW or similar). A foreign table doesn't keep a local copy; it fetches the rows from the remote server on every query.
Hi,
Yes, the speed of getting the result set is what I mean. Here's an explain analyze from the alt_info table above:
cargotel=# explain analyze verbose select * from metro_ft.alt_info where load_id > 1;
WARNING: opened informix connection with warnings
DETAIL: informix SQLSTATE 01I01: "Database has transactions "
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------
Foreign Scan on metro_ft.alt_info (cost=2924.00..3553.33 rows=62933 width=680) (actual time=26.142..231429.761 rows=92084 loops=1)
Output: id, load_id, field_id, value_obs, field_link, value, join_at
Informix costs: 2924.00
Informix query: SELECT *, rowid FROM alt_info WHERE load_id > 1
Planning time: 78.501 ms
Execution time: 231483.801 ms
(6 rows)
Time: 231615.592 ms
There is an index on the load_id column on the Informix side. I'm setting up iwatch on Informix in the hope that it shows me more of what's happening. That query certainly doesn't take 4 minutes when run in Informix directly.
I'd looked at and discarded using materialized views because of the cost of updating them but it might be possible to put triggers on the informix side and save changes in another table that could be unioned with the materialized views. Seems a little wonky though.
Thank you, Bernd, for your help and hard work on this FDW!
Okay, the predicate gets pushed down. However, according to the EXPLAIN there are still many rows to transmit. It's pretty likely that the index isn't used on the Informix side.
Currently, the Informix FDW fetches one row per call to ifxIterateForeignScan(), so the PostgreSQL executor performs the foreign scan one row at a time. With large result sets, network latency and all the additional overhead apply to a separate FETCH for each row.
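To make the effect of row-at-a-time fetching concrete, here is a rough back-of-the-envelope model in Python. It is only a sketch: the function name `scan_time_ms`, the round-trip times, and the batch size of 100 are made-up illustration values, and batched fetching is not something the FDW does today.

```python
import math


def scan_time_ms(rows, round_trip_ms, per_row_ms=0.0, rows_per_fetch=1):
    """Estimate scan wall time under a simple latency model (hypothetical helper).

    Every FETCH costs one network round trip; each row adds a fixed
    processing cost on top of that.
    """
    fetches = math.ceil(rows / rows_per_fetch)
    return fetches * round_trip_ms + rows * per_row_ms


rows = 92084  # row count reported by the EXPLAIN ANALYZE on metro_ft.alt_info

# Assumed round-trip times in milliseconds, not measured values.
for rtt in (0.05, 0.5, 2.5):
    one_by_one = scan_time_ms(rows, rtt)
    batched = scan_time_ms(rows, rtt, rows_per_fetch=100)
    print(f"rtt={rtt} ms: row-at-a-time {one_by_one / 1000:.1f} s, "
          f"100 rows per fetch {batched / 1000:.1f} s")
```

Under this model the total time grows linearly with the round-trip time when every row needs its own FETCH, which is why the same scan can be fast against a nearby server and very slow across a high-latency link.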
I've thought about using fetch arrays in the past, but disregarded them since they seem to have a benefit with large objects only, see
for details. It's also possible to do our own prefetching, but I'm not sure how this would work together with DML commands. This would require much more thought, I believe.
I've tested the performance with a remote Informix server running on a VM in the past, and I have to say that I get much better results:
bernd@localhost:bernd #= EXPLAIN ANALYZE SELECT * FROM osm_roads WHERE id > 1;
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------
Foreign Scan on osm_roads (cost=2925.00..3481.13 rows=55613 width=644) (actual time=1.273..2683.446 rows=55613 loops=1)
Informix costs: 2925.00
Informix query: SELECT *, rowid FROM osm_roads WHERE id > 1
Planning time: 1.910 ms
Execution time: 2689.411 ms
The table is smaller than yours, but it looks like you're dealing with expensive round trips somewhere. So something that saves you from transmitting the large dataset every time would certainly be an option in your case, like the materialized view mentioned earlier. That way, you'd only spend the time needed to transfer all your data during the refresh.
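For comparison, the per-row cost of the two scans can be worked out from the row counts and execution times in the two EXPLAIN ANALYZE outputs in this thread. This is plain arithmetic on those numbers, nothing measured separately:

```python
# (rows, execution time in ms) copied from the two EXPLAIN ANALYZE outputs
scans = {
    "metro_ft.alt_info": (92084, 231483.801),
    "osm_roads": (55613, 2689.411),
}

# Average wall time spent per returned row, in milliseconds.
per_row = {name: ms / rows for name, (rows, ms) in scans.items()}

for name, cost in per_row.items():
    print(f"{name}: {cost:.3f} ms per row")

ratio = per_row["metro_ft.alt_info"] / per_row["osm_roads"]
print(f"alt_info costs about {ratio:.0f}x more per row than osm_roads")
```

That is roughly 2.5 ms per row for alt_info against about 0.05 ms per row for osm_roads. A fixed cost of a few milliseconds per row is what you would expect if each row pays for a network round trip.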
We are moving to a server in the same hosting environment as the informix database so hopefully that will eliminate the roundtrip delay.
Closing this for now.
I think I've worked through all of the casting/blob issues but I'm seeing really slow result return times.
For example, querying a foreign table with 91,873 rows takes over 200,000 ms. It doesn't seem to matter what I query for or whether there's an index on the Informix side.
Here's the informix version of the alt_info table:
The postgres fdw version of this is:
I set log_min_messages to debug5 and ran this:
The debug5 logs seem to show an iteration over each row:
and so on, repeating.
I'm doing selects only at this point but need to do some fairly complex queries that I will map as views. I haven't tried disable_rowid because from the description it doesn't apply to selects. I also thought it might be related to the table option versus the query option but there doesn't seem to be a substantial difference between the two now that I have the casts/blobs worked out.
Any suggestions on how to speed this up?
Thanks, Bernd!