Closed ssimeonov closed 8 years ago
Thanks for the report, I will try to reproduce that on my end.
On Mon, Aug 24, 2015 at 8:41 AM, Simeon Simeonov notifications@github.com wrote:
Frequently, even when reading simple data such as a string field, the extension seg faults. The problem is often reproducible but not always. Consider the following psql session:
➜ ~ psql testdb
psql (9.4.4)
Type "help" for help.

-- Only 777 rows
select count(*) from advertiser_campaigns;
 count
   777
(1 row)

-- This works and shows all campaign names
select name from advertiser_campaigns limit 1024;

-- This fails, even though it should do exactly the same type & amount of processing
select name from advertiser_campaigns;
server closed the connection unexpectedly
    This probably means the server terminated abnormally
    before or while processing the request.
The connection to the server was lost. Attempting reset: Failed.

Logs show a seg fault:

LOG: server process (PID 50531) was terminated by signal 11: Segmentation fault
DETAIL: Failed process was running: select name from advertiser_campaigns;
LOG: terminating any other active server processes
WARNING: terminating connection because of crash of another server process
DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
HINT: In a moment you should be able to reconnect to the database and repeat your command.
LOG: all server processes terminated; reinitializing
LOG: database system was interrupted; last known up at 2015-08-23 23:31:34 EDT
LOG: database system was not properly shut down; automatic recovery in progress
LOG: record with zero length at 0/193E938
LOG: redo is not required
LOG: MultiXact member wraparound protections are now enabled
LOG: autovacuum launcher started
LOG: database system is ready to accept connections
I've observed different behavior over time for variations of this simple query:
- Working for select * ... limit 1, failing for select * ... limit 2 but working for select * ... offset 1 limit 1.
- Working or failing on select * from advertiser_campaigns; between different server runs.
— Reply to this email directly or view it on GitHub https://github.com/EnterpriseDB/mongo_fdw/issues/34.
Ibrar Ahmed
Which version of MongoDB are you using? Did you build the MongoDB FDW against the Mongo C driver or the legacy driver?
Issues like this sometimes depend on those factors.
Ahsan Hadi Snr Director Product Development EnterpriseDB Corporation The Enterprise Postgres Company
This particular test was against a local MongoDB 2.4. I built the extension using master.
I am using MongoDB 2.4.9 and PG 9.4 and am not able to reproduce the crash. I inserted text data into a foreign table through the mongo server, selected the text field from the foreign table, and it seems to work fine. Can you send us your complete test case so we can reproduce the crash?
Hi, can you please send us the test case, along with the stack trace from the crash in your environment, so we can move this issue to closure?
I'm also seeing this issue with MongoDB 2.6.x and Postgres 9.4.5. I'm using mongo-c-driver 1.2.1.
(gdb) bt
#0 0x00007f4591d542f4 in _bson_data (bson=0x7ffe139dbe00) at src/bson/bson.c:236
#1 0x00007f4591d5905d in bson_get_data (bson=0x7ffe139dbe00) at src/bson/bson.c:2195
#2 0x00007f4591d5b62c in bson_iter_init (iter=0x7ffe139dba80, bson=0x7ffe139dbe00) at src/bson/bson-iter.c:52
#3 0x00007f45921cfe95 in BsonIterInit (it=it@entry=0x7ffe139dba80, b=b@entry=0x7ffe139dbe00) at mongo_wrapper_meta.c:208
#4 0x00007f45921d1ee9 in FillTupleSlot (bsonDocument=bsonDocument@entry=0x7ffe139dbe00, bsonDocumentKey=bsonDocumentKey@entry=0x7f45905a709f "data",
columnMappingHash=columnMappingHash@entry=0x2bd9de0, columnValues=columnValues@entry=0x2bd6aa0, columnNulls=columnNulls@entry=0x2bd6ad0)
at mongo_fdw.c:1140
#5 0x00007f45921d218b in FillTupleSlot (bsonDocument=<optimized out>, bsonDocumentKey=bsonDocumentKey@entry=0x0,
columnMappingHash=columnMappingHash@entry=0x2bd9de0, columnValues=columnValues@entry=0x2bd6aa0, columnNulls=columnNulls@entry=0x2bd6ad0)
at mongo_fdw.c:1184
#6 0x00007f45921d2756 in MongoIterateForeignScan (scanState=<optimized out>) at mongo_fdw.c:544
#7 0x00000000005ae061 in ?? ()
#8 0x0000000000598f36 in ExecScan ()
#9 0x0000000000592008 in ExecProcNode ()
#10 0x00000000005aaef9 in ExecSort ()
#11 0x0000000000591fb8 in ExecProcNode ()
#12 0x00000000005ae71e in ?? ()
#13 0x00000000005b0be3 in ExecWindowAgg ()
#14 0x0000000000591f88 in ExecProcNode ()
#15 0x0000000000598f36 in ExecScan ()
#16 0x0000000000592058 in ExecProcNode ()
#17 0x000000000058f5a7 in standard_ExecutorRun ()
#18 0x00000000005462e0 in ExecCreateTableAs ()
#19 0x0000000000680f78 in ?? ()
#20 0x00000000006801f6 in standard_ProcessUtility ()
#21 0x000000000067da44 in ?? ()
#22 0x000000000067e5e5 in ?? ()
#23 0x000000000067f06c in PortalRun ()
#24 0x000000000067c103 in PostgresMain ()
#25 0x0000000000461572 in ?? ()
#26 0x000000000062b490 in PostmasterMain ()
#27 0x000000000046267f in main ()
The problem is that BsonIterSubObject is completely unimplemented for the mongo meta driver, so bson_iter_next ends up getting called with uninitialized memory. This is also the cause of #38.
The fix is completely trivial: implement BsonIterSubObject for the meta driver, e.g.:
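bool
BsonIterSubObject(BSON_ITERATOR *it, BSON *b)
{
	const uint8_t *buffer = NULL;
	uint32_t len = 0;

	/* Get a pointer to the embedded document the iterator currently points at. */
	bson_iter_document(it, &len, &buffer);

	/* buffer stays NULL if the current element is not a document. */
	if (buffer == NULL)
		return false;

	/* Wrap the buffer in b without copying; report failure on invalid data. */
	return bson_init_static(b, buffer, len);
}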
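To illustrate why a stub that never fills in its output document crashes further down the call chain: an encoded BSON document starts with a little-endian int32 holding its total length and ends with a zero byte, so the smallest valid document is 5 bytes. Below is a minimal, self-contained sketch of the kind of sanity check a zero-copy wrapper can make before an iterator touches the buffer. It does not use libbson, and `doc_view` and `doc_view_init` are hypothetical names made up for this illustration; they are not part of mongo_fdw or the C driver.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical zero-copy view over an encoded BSON document. */
typedef struct
{
	const uint8_t *data;
	uint32_t	len;
} doc_view;

/*
 * Initialize a view over buf only if it looks like a BSON document:
 * non-NULL, at least 5 bytes, length prefix equal to len, trailing 0 byte.
 * On failure the view is left empty rather than uninitialized.
 */
static bool
doc_view_init(doc_view *view, const uint8_t *buf, uint32_t len)
{
	uint32_t	declared;

	view->data = NULL;
	view->len = 0;

	if (buf == NULL || len < 5)
		return false;

	/* The length prefix is a little-endian int32. */
	declared = (uint32_t) buf[0] |
		((uint32_t) buf[1] << 8) |
		((uint32_t) buf[2] << 16) |
		((uint32_t) buf[3] << 24);

	if (declared != len || buf[len - 1] != 0)
		return false;

	view->data = buf;
	view->len = len;
	return true;
}
```

The point of the sketch is that the output structure is always written, either with a validated buffer or with an empty state, so a caller that ignores the return value reads well-defined fields instead of stack garbage.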