EnterpriseDB / mongo_fdw

PostgreSQL foreign data wrapper for MongoDB
GNU Lesser General Public License v3.0
330 stars 70 forks source link

No WHERE pushdown for ObjectID (NAME) based queries. #169

Open Climax777 opened 1 year ago

Climax777 commented 1 year ago

Closed issue #163 implies we cannot get WHERE clause pushdown on ObjectID (NAME) types queries.

For example, in the provided example in https://www.enterprisedb.com/docs/mongo_data_adapter/latest/08c_example_where_pushdown/ querying for a specific one causes a collection scan (this is an objectID generated in the insert): select * from emp where _id = '654ca5772cff3e18f66ce383'

Resulting explain analyze:

"QUERY PLAN"
"Foreign Scan on public.emp (actual time=4.406..4.416 rows=1 loops=1)"
"  Output: eid"
"  Filter: (emp._id = '654ca5772cff3e18f66ce383'::name)"
"  Rows Removed by Filter: 3"
"  Foreign Namespace: edb.emp"
"Planning Time: 0.103 ms"
"Execution Time: 5.383 ms"

Note 3 rows were returned from the foreign end, which means the OID query hasn't been pushed down. This, to me, is a big problem since there are many common scenarios where this is required to be filtered at the foreign end.

sheim-dev commented 1 year ago

This is in fact a big issue in several of my common use cases as well.

Climax777 commented 1 year ago

This is in fact a big issue in several of my common use cases as well.

I'm working on it, though I have a lot of learning to do. I'm not familiar with the PostgreSQL C-APIs. But it seems doable.

In my poking around I saw that some columns do in fact filter and others not, even though they are both NAME typed. And also _id never works

vaibhavdalvi93 commented 10 months ago

@Climax777 , thank you for working on PR #170. I am NOT in favour of pull request 170 because this looks unsafe to me. The postgres_fdw and other fdw don’t allow this. The postgres_fdw don’t allow because of following reason:

Note: The C and POSIX locales may behave differently depending on the database encoding.

On all platforms, the collations named default, C, and POSIX are available. Additional collations may be available depending on operating system support. The default collation selects the LC_COLLATE and LC_CTYPE values specified at database creation time. The C and POSIX collations both specify “traditional C” behavior, in which only the ASCII letters “A” through “Z” are treated as letters, and sorting is done strictly by character code byte values.

For more details please refer this.

For homogeneous database, if there is risk then for heterogeneous looks more unsafe to me. Also, we don’t consider LOCALE at the time of forming mongo query pipeline.

If we do proposed change then that will be applicable to all columns not only _id one. So, to achieve one thing, we can't take risk of non-reliable results.

If we want to push-down non-default collation then need to form the query pipeline with collation method/operator. Also, need to maintain the collation mapping table for Postgres and MongoDB, along with checking it’s availability. This is going to difficult from maintainability and there is risk of potential bugs.

Following documents can be referred to learn about collation in MongoDB.

Collation

db.collection.find()

If you're agree with the explanation, could you close this issue and PR request too.

Thanks, Vaibhav

Climax777 commented 10 months ago

I understand your concerns, but surely collation should not affect or be affected by objectIds? Should this then only be an exception case for this type perhaps?

On 04 Jan 2024, at 13:47, Vaibhav Dalvi @.***> wrote:

@Climax777 https://github.com/Climax777 , thank you for working on PR #170 https://github.com/EnterpriseDB/mongo_fdw/pull/170. I am NOT in favour of pull request 170 https://github.com/EnterpriseDB/mongo_fdw/pull/170 because this looks unsafe to me. The postgres_fdw and other fdw don’t allow this. The postgres_fdw don’t allow because of following reason:

Note: The C and POSIX locales may behave differently depending on the database encoding.

On all platforms, the collations named default, C, and POSIX are available. Additional collations may be available depending on operating system support. The default collation selects the LC_COLLATE and LC_CTYPE values specified at database creation time. The C and POSIX collations both specify “traditional C” behavior, in which only the ASCII letters “A” through “Z” are treated as letters, and sorting is done strictly by character code byte values.

For more details please refer this https://www.postgresql.org/docs/current/collation.html#COLLATION-MANAGING-STANDARD.

For homogeneous database, if there is risk then for heterogeneous looks more unsafe to me. Also, we don’t consider LOCALE at the time of forming mongo query pipeline.

If we do proposed change then that will be applicable to all columns not only _id one. So, to achieve one thing, we can't take risk of non-reliable results.

If we want to push-down non-default collation then need to form the query pipeline with collation method/operator. Also, need to maintain the collation mapping table for Postgres and MongoDB, along with checking it’s availability. This is going to difficult from maintainability and there is risk of potential bugs.

Following documents can be referred to learn about collation in MongoDB.

Collation https://www.mongodb.com/docs/manual/reference/collation/ db.collection.find() https://www.mongodb.com/docs/manual/reference/method/db.collection.find/#specify-collation If you're agree with the explanation, could you close this issue and PR request too.

Thanks, Vaibhav

— Reply to this email directly, view it on GitHub https://github.com/EnterpriseDB/mongo_fdw/issues/169#issuecomment-1876964830, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAF3XHLZNCWKCU7ZZRP2HD3YM2JD5AVCNFSM6AAAAAA7FWF4MWVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMYTQNZWHE3DIOBTGA. You are receiving this because you were mentioned.

vaibhavdalvi93 commented 10 months ago

@Climax777 , I can understand your concern.

surely collation should not affect or be affected by objectIds? Should this then only be an exception case for this type perhaps?

The _id field can be of any BSON data type i.e. not necessary the _id field data type is always of ObjectId type. For your reference following is the line from MongoDB office documentation: The _id field may contain values of any BSON data type, other than an array, regex, or undefined.

For more details on _id field please refer this document.

So, we can't make collation exception for _id field because it's NOT guarantee that it's always a ObjectId type. We could do so if _id field is of type ObjectId only.

Hope, it's clear to you. If not, feel free to revert back.

Thanks, Vaibhav