Open firebird-automations opened 14 years ago
Commented by: Neil Pickles (npickles)
The attached crashdump & Dr Watson log files illustrate the issue.
Commented by: @AlexPeshkoff
Neil, can you try latest snapshot of 2.1 branch?
Commented by: Neil Pickles (npickles)
Attached are the Dr Watson logs & Firebird logs from another site that has fallen over during the weekend.
This site was using the latest build of FB 2.1.4, v2.1.4.18314 but is still having a problem.
I'll post details of where the crashdump file can be downloaded from once I have it back from site, as it'll be around 100 MB and too large to upload to the tracker directly.
Commented by: Neil Pickles (npickles)
Here is another Dr Watson log file (leeds drwtsn32.log) from another site. This time there is nothing in the Firebird log to speak of, just this log file and a crashdump file. I'll get the crashdump file back from site and post details of where it can be downloaded from, as it'll be around 100 MB zipped up.
Again, this site was using the latest v2.1.4 build I could find last week, 18314.
Commented by: Neil Pickles (npickles)
The two crashdumps for the two sites can be downloaded from http://news.csy.co.uk/leedscrashdump3.7z & http://news.csy.co.uk/bentoncrashdump.7z
Any help would be very much appreciated as to what's going on here.
Commented by: @hvlad
About leedscrashdump3.
The call stack is :
> fbserver.exe!looper(Jrd::thread_db * tdbb=0x044ef988, Jrd::jrd_req * request=0x0538f6f8, Jrd::jrd_nod * in_node=0x00d39d34) Line 1863 C++
fbserver.exe!execute_looper(Jrd::thread_db * tdbb=0x00000000, Jrd::jrd_req * request=0x00000000, Jrd::jrd_tra * transaction=0x04f393b8, Jrd::jrd_req::req_s next_state=req_proceed) Line 1461 + 0x1f bytes C++
fbserver.exe!EXE_send(Jrd::thread_db * tdbb=0x00000004, Jrd::jrd_req * request=0x0538f6f8, unsigned short msg=32696, unsigned short length=32, const unsigned char * buffer=0x0534d760) Line 1003 + 0xf bytes C++
fbserver.exe!jrd8_start_and_send(int * user_status=0x044efd8c, Jrd::jrd_req * * req_handle=0x00acab64, Jrd::jrd_tra * * tra_handle=0x04851280, unsigned short msg_type=0, unsigned short msg_length=32, char * msg=0x0534d760, short level=0) Line 3790 + 0x19 bytes C++
fbserver.exe!isc_start_and_send(int * user_status=0x044efd8c, void * * req_handle=0x0534df7c, void * * tra_handle=0x0534df68, unsigned short msg_type=0, unsigned short msg_length=32, const char * msg=0x0534d760, short level=0) Line 4942 + 0x2f bytes C++
fbserver.exe!execute_request(dsql_req * request=0x0534df24, void * * trans_handle=0x044efde4, unsigned short in_blr_length=16, const unsigned char * in_blr=0x00acaac8, unsigned short in_msg_length=26, unsigned char * in_msg=0x00ac8588, unsigned short out_blr_length=0, unsigned char * out_blr=0x00000000, unsigned short out_msg_length=0, unsigned char * out_msg=0x00000000, bool singleton=false) Line 3429 + 0x26 bytes C++
fbserver.exe!GDS_DSQL_EXECUTE_CPP(int * user_status=0x044efd00, void * * trans_handle=0x044efde4, dsql_req * * req_handle=0x00acaa84, unsigned short in_blr_length=16, const unsigned char * in_blr=0x00acaac8, unsigned short in_msg_type=0, unsigned short in_msg_length=26, unsigned char * in_msg=0x00ac8588, unsigned short out_blr_length=0, unsigned char * out_blr=0x00000000, unsigned short out_msg_type=64804, unsigned short out_msg_length=0, unsigned char * out_msg=0x00000000) Line 570 + 0x26 bytes C++
fbserver.exe!dsql8_execute(int * user_status=0x044efd8c, void * * trans_handle=0x044efde4, dsql_req * * req_handle=0x00acaa84, unsigned short in_blr_length=16, const char * in_blr=0x00acaac8, unsigned short in_msg_type=0, unsigned short in_msg_length=26, char * in_msg=0x00ac8588, unsigned short out_blr_length=0, char * out_blr=0x00000000, unsigned short out_msg_type=0, unsigned short out_msg_length=0, char * out_msg=0x00000000) Line 296 + 0x41 bytes C++
fbserver.exe!isc_dsql_execute2_m(int * user_status=0x00000000, void * * tra_handle=0x044efde4, void * * stmt_handle=0x00a42c4c, unsigned short in_blr_length=16, const char * in_blr=0x00acaac8, unsigned short in_msg_type=0, unsigned short in_msg_length=26, char * in_msg=0x00ac8588, unsigned short out_blr_length=0, char * out_blr=0x00000000, unsigned short out_msg_type=0, unsigned short out_msg_length=0, char * out_msg=0x00000000) Line 2531 + 0x36 bytes C++
fbserver.exe!rem_port::execute_statement(P_OP op=op_execute, p_sqldata * sqldata=0x00003bdb, packet * sendL=0x00acaf10) Line 2172 C++
fbserver.exe!process_packet2(rem_port * port=0x01704abc, packet * sendL=0x00acaf10, packet * receive=0x00acb1c4, rem_port * * result=0x044eff44) Line 3622 C++
fbserver.exe!process_packet(rem_port * port=0x01704abc, packet * sendL=0x00acaf10, packet * receive=0x00acb1c4, rem_port * * result=0x044eff44) Line 3372 + 0x22 bytes C++
and AV is at exe.cpp, line 1863 :
static jrd_nod* looper(thread_db* tdbb, jrd_req* request, jrd_nod* in_node)
{
...
int node_type = node->nod_type;
switch (node->nod_type) { <--- HERE
case nod_asn_list:
if (request->req_operation == jrd_req::req_evaluate) {
The local variable "node" contains garbage bytes.
tdbb, database, attachment and request all seem OK and valid.
The SQL text of request is
UPDATE VETRANS SET CREATESYNCID=? WHERE TRANSID=?
So far I have no idea what is happening :(
Commented by: @hvlad
About bentoncrashdump
The call stack is :
> msvcr80.dll!7814537a()
[Frames below may be incorrect and/or missing, no symbols loaded for msvcr80.dll]
fbserver.exe!Jrd::LocksCache<Jrd::CachedLock>::get(Jrd::thread_db * tdbb=0x07f9f988, const unsigned char * key=0x07f9b46c) Line 180 C++
fbserver.exe!Jrd::BtrPageGCLock::disablePageGC(Jrd::thread_db * tdbb=0x07f9f988, const Jrd::PageNumber & page={...}) Line 262 + 0xd bytes C++
fbserver.exe!add_node(Jrd::thread_db * tdbb=0x00000000, Jrd::win * window=0x07f9b5f4, Jrd::index_insertion * insertion=0x07f9e73c, Jrd::temporary_key * new_key=0x07f9b690, RecordNumber * new_record_number=0x07f9b610, long * original_page=0x07f9b53c, long * sibling_page=0x07f9b55c) Line 2385 C++
fbserver.exe!add_node(Jrd::thread_db * tdbb=0x04d4317c, Jrd::win * window=0x07f9b5f4, Jrd::index_insertion * insertion=0x07f9e73c, Jrd::temporary_key * new_key=0x07f9b690, RecordNumber * new_record_number=0x07f9b610, long * original_page=0x00000000, long * sibling_page=0x00000000) Line 2393 + 0x35 bytes C++
fbserver.exe!BTR_insert(Jrd::thread_db * tdbb=0x07f9f988, Jrd::win * root_window=0x07f9e724, Jrd::index_insertion * insertion=0x07f9e73c) Line 1031 + 0x29 bytes C++
fbserver.exe!insert_key(Jrd::thread_db * tdbb=0x07f9f988, Jrd::jrd_rel * relation=0x0370e544, Jrd::Record * record=0x04fcd4e4, Jrd::jrd_tra * transaction=0x00000000, Jrd::win * window_ptr=0x00000000, Jrd::index_insertion * insertion=0x07f9e73c, Jrd::jrd_rel * * bad_relation=0x07f9f860, unsigned short * bad_index=0x07f9f86c) Line 1603 + 0x4a bytes C++
fbserver.exe!IDX_store(Jrd::thread_db * tdbb=0x07f9f988, Jrd::record_param * rpb=0x04e81508, Jrd::jrd_tra * transaction=0x078e55b4, Jrd::jrd_rel * * bad_relation=0x07f9f860, unsigned short * bad_index=0x07f9f86c) Line 998 + 0x22 bytes C++
AV is at memmove, which was called from Array::remove(size_t index) :
template <class LockClass>
GlobalRWLock* LocksCache<LockClass>::get(thread_db *tdbb, const UCHAR* key)
{
...
que_inst = que_inst->que_backward;
QUE_DELETE(lock->m_lru);
m_sortedLocks.remove(pos); <--- HERE
if (lock->setLockKey(tdbb, key))
break;
It seems that que_inst at line 171 is wrong and points to the head of the queue (this->m_lru):
lock = (LockClass*) ((SCHAR*) que_inst - OFFSET (LockClass*, m_lru));
Therefore "lock" is also invalid and its key can't be found in m_sortedLocks. So "pos" has a wrong value and memmove crashes.
Unfortunately, due to inlined code and heavy register usage by the MSVC optimizer, I can't verify this guess even with a crash dump containing full process memory.
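To illustrate why a wrong "pos" is fatal here, the following is a minimal sketch, assuming the array's remove shifts the tail down with memmove (the stack shows memmove called from Array::remove, but the real Firebird implementation is not reproduced here). The names tail_bytes and checked_tail_bytes are hypothetical and exist only for this illustration.

```cpp
#include <cstddef>

// Hypothetical helper: the number of bytes a memmove-based remove(index)
// would shift down for an array of `count` elements of `elem_size` bytes.
// The arithmetic is unsigned on purpose, mirroring size_t in real code.
std::size_t tail_bytes(std::size_t count, std::size_t index, std::size_t elem_size)
{
    return elem_size * (count - 1 - index);
}

// Checked variant: refuses any index outside [0, count) instead of
// letting the subtraction wrap around.
bool checked_tail_bytes(std::size_t count, std::size_t index,
                        std::size_t elem_size, std::size_t& out)
{
    if (index >= count)
        return false;           // key was not found, nothing valid to remove
    out = elem_size * (count - 1 - index);
    return true;
}
```

With 4 elements and a valid index 1, two elements are shifted. With index 4 (for example an insertion position returned by a failed lookup), count - 1 - index wraps around to a huge unsigned value, so a memmove given that length would run far past the buffer, which would be consistent with an access violation inside memmove.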
To fix this I offer the following patch:
RCS file: /cvsroot/firebird/firebird2/src/jrd/Attic/LocksCache.h,v
retrieving revision 1.1.2.3
diff -u -w -b -r1.1.2.3 LocksCache.h
--- jrd/LocksCache.h 27 Oct 2009 09:16:27 -0000 1.1.2.3
+++ jrd/LocksCache.h 21 Jun 2010 21:59:57 -0000
@@ -174,6 +174,9 @@
 fb_assert(found);
 que_inst = que_inst->que_backward;
+ if (que_inst == &m_lru) {
+ que_inst = que_inst->que_backward;
+ }
 QUE_DELETE(lock->m_lru);
 m_sortedLocks.remove(pos);
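The idea behind the patch can be sketched in isolation. This is a simplified model, not Firebird's code: it assumes a circular doubly linked queue whose head is a bare sentinel link that is not embedded in any element. Que, Item, que_init, que_insert, item_of and prev_skipping_head are hypothetical names; only que_backward and m_lru appear in the thread.

```cpp
#include <cstddef>

// Link of a circular doubly linked queue. The queue head is a bare Que
// that belongs to no element (a sentinel).
struct Que
{
    Que* que_forward;
    Que* que_backward;
};

// Hypothetical element type with the link embedded in it.
struct Item
{
    int key;
    Que m_lru;
};

inline void que_init(Que& head)
{
    head.que_forward = head.que_backward = &head;
}

// Insert `node` directly after `head`.
inline void que_insert(Que& head, Que& node)
{
    node.que_forward = head.que_forward;
    node.que_backward = &head;
    head.que_forward->que_backward = &node;
    head.que_forward = &node;
}

// Recover the owning Item from its embedded link. This is only meaningful
// when `q` really is the m_lru member of an Item; applied to the head
// sentinel it yields a pointer to memory that is not an Item at all.
inline Item* item_of(Que* q)
{
    return reinterpret_cast<Item*>(
        reinterpret_cast<char*>(q) - offsetof(Item, m_lru));
}

// Step backward from `q`, skipping the head sentinel the way the patch
// does. Returns nullptr when the queue holds no elements.
inline Que* prev_skipping_head(Que& head, Que* q)
{
    Que* p = q->que_backward;
    if (p == &head)
        p = p->que_backward;
    return p == &head ? nullptr : p;
}
```

In a queue laid out as head -> b -> a -> head, a plain backward step from b lands on the head, and item_of on that pointer would produce the kind of invalid "lock" described above. prev_skipping_head wraps around to a instead, so the offset arithmetic is only ever applied to a real element.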
Commented by: Neil Pickles (npickles)
I have another instance of an access violation occurring within Firebird; the associated crashdump & Dr Watson log files can be downloaded from http://news.csy.co.uk/christchurch.7z.
Again this is using the same v2.1.4 build as the other sites.
Commented by: @hvlad
christchurch is the same case as above with LocksCache.
Commented by: @hvlad
Neil, please try
http://www.firebirdsql.org/download/rabbits/hvlad/fbserver-2.1.4.18314-1_Win32.7z
This is fbserver.exe, based on the current 2.1.4 branch with the patch above.
I hope it fixes the issues with LocksCache.
Commented by: Neil Pickles (npickles)
I'll try that at the 6 or so sites that are running v2.1.4.
I also have another instance, not sure if it is the same as before, that can be downloaded from http://news.csy.co.uk/brackmills.7z
Commented by: @hvlad
"brackmills" is also the same issue with LocksCache. FB 2.1.3 this time.
Commented by: Neil Pickles (npickles)
I updated the 10 test sites to the patched version of v2.1.4 overnight.
Two have since produced crashdump files again.
These can be downloaded from http://news.csy.co.uk/stirling_v214_patch.7z & http://news.csy.co.uk/brackmills_v214_patch.7z
Commented by: Neil Pickles (npickles)
I've since got another 3 crashdumps from the 10 test sites, I'll let you know where they can be downloaded from when I have them back from site. Two are from the same sites as the previous 2 and 1 is from a different site.
Commented by: Neil Pickles (npickles)
The three new crashdumps can be downloaded from http://news.csy.co.uk/stirling_v214_patched_2.7z , http://news.csy.co.uk/bedminster_v214_patched.7z & http://news.csy.co.uk/brackmills_v214_patched_2.7z
Commented by: @hvlad
stirling_v214_patch is the same issue with LocksCache. I will look at the other dumps shortly.
I prepared a new build with a new patch:
http://www.firebirdsql.org/download/rabbits/hvlad/fbserver-2.1.4.18314-2_Win32.7z
Commented by: Neil Pickles (npickles)
I'll get that new version installed and see how I get on with that. I'll try to do it today.
There is another crashdump available for download that is from the first patched version: http://news.csy.co.uk/benton_v214_patched.7z . Could you confirm whether this site is experiencing the same LocksCache issue as the others?
Commented by: Neil Pickles (npickles)
Updated the 10 test sites last night and so far today, it's 3.30pm in the UK now, none have fallen over.
I'll monitor things over the weekend and update you on Monday but it's looking good so far.
Thanks for all your help.
Any idea when v2.1.4 is going to be released? Is there much still to be fixed in v2.1.4 that you know about?
Cheers,
Neil Pickles
Commented by: Neil Pickles (npickles)
I've been monitoring this since last week and it now appears to be sorted out.
Thanks for your prompt help with this, I look forward to the official release of v2.1.4, soon hopefully.
Cheers,
Commented by: @hvlad
Neil, thanks for the reports and your patience.
I'll commit the fix for the "LocksCache" issue under your CORE3050, as this ticket (CORE3053) points to another issue, related to process shutdown and most likely fixed by Alex in CORE2865. I'll change the description of CORE3050 to better reflect the nature of the bug.
Submitted by: Neil Pickles (npickles)
Attachments: leedscrashdump.7z benton crash logs.7z leeds drwtsn32.log
Firebird just stops responding and requires a restart to get it going again. Initially thought to relate to CORE2900 but subsequently told it is a separate issue.