EnterpriseDB / mongo_fdw

PostgreSQL foreign data wrapper for MongoDB
GNU Lesser General Public License v3.0
326 stars 70 forks source link

Searching by object id #163

Closed MihaiRaduSandu closed 2 years ago

MihaiRaduSandu commented 2 years ago

Hello,

When I try to search by a specific _id in the foreign mongo table, it takes a very long time.

SELECT * FROM request_body r WHERE r._id = '99f....ac'

It does not seem to use the default index.

If I set the _id column as bytea, and do the same search but with \x in front, then it works instantly.

SELECT * FROM request_body r WHERE r._id = '\x99f....ac'

If I set the _id to some other data type different than NAME and BYTEA then the column is returned NULL or an error is returned.

Is there a reason for this behavior? I tried converting the string to hex and all sorts of combinations.

What I was actually aiming for was to search a range based on the timestamp in the object id, but I'm assuming it would be considered a nested operation, even if I use between?

I managed to search by timestamp in the object id if it is set as NAME, but again takes a very long time. In bytea format the comparison operator doesn't seem to work the same way and doesn't return results.

Any help would be highly appreciated.

The funny thing is there are no examples anywhere regarding the search by _id, only by warehouse_id :)

Thank you, Mihai

vaibhavdalvi93 commented 2 years ago

Thanks, @MihaiRaduSandu for reporting an issue.

Currently, _id can only be defined as NAME.

The slowness is because WHERE condition is NOT getting pushed down to remote server due to collation restriction.

We may work on supporting various types for _id column in the near future, but currently it is not prioritised. However, feel free to propose a patch for supporting various types for the _id column. We will be happy to review that and take it further.

For time being, to serve your purpose, you can add another separate column for timestamp data and use that in where clause.

Thank you, Vaibhav

MihaiRaduSandu commented 2 years ago

Thank you @vaibhavdalvi93 for your answer. The thing is I have so many questions and there is so little documentation on various situations or corner cases. Is there a place where I could ask these questions and receive an answer from the community? Also being new to this, how do I propose a patch?

I would further like to ask you: Are indexes used in any way on mongo or there is no pushdown on the WHERE clause? Or is it only if I mention one element in the WHERE clause?

I already have a column created but it was not indexed so it was still doing a full scan. Now I added the index and the query seems to finish in half the time, but I am not sure if it actually uses the index as on explain I don't see any mention of using indexes and I don't have admin access to the mongo server to view the exact query. I am actually having 3 conditions in WHERE.

QUERY PLAN Hash Left Join (cost=7455.68..92006.95 rows=513 width=426) (actual time=55.886..14393.001 rows=30208 loops=1) Hash Cond: (((r.body -> 0) ->> 'ID_furnizor'::text) = ((s.id)::character varying)::text) -> Foreign Scan on request_body r (cost=5.00..84476.71 rows=14 width=100) (actual time=25.742..9720.423 rows=30208 loops=1) Foreign Namespace: supplier-order-node.request_body -> Hash (cost=7359.00..7359.00 rows=7334 width=322) (actual time=30.010..30.011 rows=7290 loops=1) Buckets: 8192 Batches: 1 Memory Usage: 582kB -> Foreign Scan on suppliers s (cost=25.00..7359.00 rows=7334 width=322) (actual time=12.384..27.786 rows=7290 loops=1) Remote server startup cost: 25 Planning Time: 4.560 ms Execution Time: 14398.497 ms

Thank you, Mihai

vaibhavdalvi93 commented 2 years ago

Is there a place where I could ask these questions and receive an answer from the community?

You can list all your questions in single issue or multiple issues as per your convenience. We'll be happy to answer those questions.

Also being new to this, how do I propose a patch

You can add pull request. You can refer contribution guidelines here https://github.com/EnterpriseDB/mongo_fdw/blob/master/CONTRIBUTING.md

Are indexes used in any way on mongo or there is no pushdown on the WHERE clause? Or is it only if I mention one element in the WHERE clause?

We are supporting WHERE clause pushdown with some limitations. The pushdown of the WHERE clause is possible only when clauses include the operator expressions that have a column and a constant as arguments. The recursive expression is not supported for WHERE clause pushdown. You can have multiple elements in WHERE clause as shown below:

SELECT * FROM t1 WHERE age = 25 AND name = 'abc';

Where, age and name are the columns.

Thank you, Vaibhav

MihaiRaduSandu commented 2 years ago

UPDATE: I figured it out, if I remove the left join it will do a pushdown on all 4. Weird :)

Thank you again @vaibhavdalvi93 ,

One more question if I may, I figured out what was going on.

I have this query: SELECT some fields FROM mongo.table r LEFT JOIN mysql.table ON CAST(s.id AS VARCHAR) = r.body->0->>'field' WHERE r.created >= EXTRACT(EPOCH FROM '2022-07-01'::TIMESTAMP) AND r.created < EXTRACT(EPOCH FROM '2022-07-05'::TIMESTAMP) AND r.document_type = 'invoice' AND r.message_type = 'request';

So, I was trying to add 2 conditions on created, but when I tried to add the second one the query would take a very long time. And I thought it was because somehow 2 conditions on the same field would trigger a full scan. But after playing a bit, I realized that the problem only happens if I add 4 conditions instead of 3 conditions (in any combination). So it would seem that you do not pushdown the conditions if there are more than 3? Or is it another issue? I tried using IN () instead of =, put 2 conditions into parentheses, I don't seem to be able to convince it to do the pushdown on all 4.

If you also have any suggestions related to the syntax please don't mind to mention them.

Thank you, Mihai

vaibhavdalvi93 commented 2 years ago

Hi @MihaiRaduSandu,

So it would seem that you do not pushdown the conditions if there are more than 3? Or is it another issue?

There is no such restriction on number of conditions in WHERE clause. I think, the second condition which you're trying to add is not getting pushdown which is causing slowness. To analyse this further, can you please provide us EXPLAIN ANALYZE output of the query which has 4 conditions and causing slowness issue? To get EXPLAIN ANALYZE plan, you just need to add EXPLAIN ANALYZE at the starting of the query(E.g. EXPLAIN ANALYZE SELECT * FROM t1). This plan may help us to understand which condition is not getting pushdown. Also, it would really helpful if you provide us foreign table definition to get to know the column TYPE details.

Thank you, Vaibhav

MihaiRaduSandu commented 2 years ago

Hello, Thank you. As I mentioned in the previous comment, I managed to identify the issue, it was related to the fact that I was also doing a left join which when I was using 3 conditions in where was working ok, but when I was using 4 conditions it was changing the execution plan so I removed the left join and did it after extracting the data from mongo. I'm new to PostgreSQL, so not yet familiar in how to fully understand the explain but getting there. If I have other issues or questions I will open another post.

I wish you a great day, Mihai

Climax777 commented 10 months ago

Does this mean/imply that we cannot get WHERE clause pushdown on ObjectID (NAME) types queries? For example in the provided example in https://www.enterprisedb.com/docs/mongo_data_adapter/latest/08c_example_where_pushdown/ querying for a specific one causes a collection scan (this is an objectID generated in the insert): select * from emp where _id = '654ca5772cff3e18f66ce383'

Resulting explain analyze:

"QUERY PLAN"
"Foreign Scan on public.emp (actual time=4.406..4.416 rows=1 loops=1)"
"  Output: eid"
"  Filter: (emp._id = '654ca5772cff3e18f66ce383'::name)"
"  Rows Removed by Filter: 3"
"  Foreign Namespace: edb.emp"
"Planning Time: 0.103 ms"
"Execution Time: 5.383 ms"

Note 3 rows were returned from the foreign end, which means the OID query hasn't been pushed down. This to me seems like a big problem since there are many common scenarios where this will be required.

Climax777 commented 10 months ago

Should I rather open a new issue?