foodcoops / foodsoft

Web-based software to manage a non-profit food coop (product catalog, ordering, accounting, job scheduling).
https://foodcoops.net/
Other
318 stars 146 forks source link

Sporadic Message Errors #1055

Open mortbauer opened 4 months ago

mortbauer commented 4 months ago

Since I upgraded our foodsoft to 4.8.1 users are reporting problems sending messages, I could also reproduce it once, after restarting the server the problem was gone in my case. I was also able to isolate the logs for one such problem (I removed the message body because I think it is unrelated):

fcat_foodsoft.1.qfrln2a3tbb6@v22018056292066830    | I, [2024-03-03T14:15:17.121226 #184]  INFO -- : [54dd8b7e-6d8e-48d7-a5ee-6db73cdb8847] Started POST "/ruebezahl17/messages" for 213.142.96.93 at 2024-03-03 14:15:17 +0000
fcat_foodsoft.1.qfrln2a3tbb6@v22018056292066830    | I, [2024-03-03T14:15:17.124339 #184]  INFO -- : [54dd8b7e-6d8e-48d7-a5ee-6db73cdb8847] Processing by MessagesController#create as HTML
fcat_foodsoft.1.qfrln2a3tbb6@v22018056292066830    | I, [2024-03-03T14:15:17.124664 #184]  INFO -- : [54dd8b7e-6d8e-48d7-a5ee-6db73cdb8847]   Parameters: {"authenticity_token"=>"[FILTERED]", "message"=>{"reply_to"=>"", "send_method"=>"recipients", "workgroup_id"=>"45", "ordergroup_id"=>"264", "messagegroup_id"=>"170", "order_id"=>"2231", "recipient_tokens"=>"[FILTERED]", "private"=>"0", "subject"=>"Dein Kontoblatt", "body"=>""}, "commit"=>"Nachricht verschicken", "foodcoop"=>"ruebezahl17"}
fcat_foodsoft.1.qfrln2a3tbb6@v22018056292066830    | I, [2024-03-03T14:15:17.206397 #184]  INFO -- : [54dd8b7e-6d8e-48d7-a5ee-6db73cdb8847] Set locale to de
fcat_foodsoft.1.qfrln2a3tbb6@v22018056292066830    | I, [2024-03-03T14:15:17.279499 #184]  INFO -- : [54dd8b7e-6d8e-48d7-a5ee-6db73cdb8847] Completed 500 Internal Server Error in 155ms (ActiveRecord: 103.2ms | Allocations: 8272)
fcat_foodsoft.1.qfrln2a3tbb6@v22018056292066830    | I, [2024-03-03T14:15:17.296609 #184]  INFO -- : [54dd8b7e-6d8e-48d7-a5ee-6db73cdb8847]   Rendered /usr/local/bundle/gems/exception_notification-4.5.0/lib/exception_notifier/views/exception_notifier/exception_notification.text.erb (Duration: 11.0ms | Allocations: 3335)
fcat_foodsoft.1.qfrln2a3tbb6@v22018056292066830    | I, [2024-03-03T14:15:17.665082 #184]  INFO -- : [54dd8b7e-6d8e-48d7-a5ee-6db73cdb8847] Delivered mail 65e485f54927e_b822a60732c7@822efed581f2.mail (366.5ms)
fcat_foodsoft.1.qfrln2a3tbb6@v22018056292066830    | F, [2024-03-03T14:15:17.668100 #184] FATAL -- : [54dd8b7e-6d8e-48d7-a5ee-6db73cdb8847]   
fcat_foodsoft.1.qfrln2a3tbb6@v22018056292066830    | [54dd8b7e-6d8e-48d7-a5ee-6db73cdb8847] ActiveRecord::StatementInvalid (Mysql2::Error: Unknown column 'recipients_ids' in 'field list'):
fcat_foodsoft.1.qfrln2a3tbb6@v22018056292066830    | [54dd8b7e-6d8e-48d7-a5ee-6db73cdb8847]   
fcat_foodsoft.1.qfrln2a3tbb6@v22018056292066830    | [54dd8b7e-6d8e-48d7-a5ee-6db73cdb8847] app/controllers/application_controller.rb:97:in `set_currency'
fcat_foodsoft.1.qfrln2a3tbb6@v22018056292066830    | [54dd8b7e-6d8e-48d7-a5ee-6db73cdb8847] app/controllers/application_controller.rb:85:in `set_time_zone'
fcat_foodsoft.1.qfrln2a3tbb6@v22018056292066830    | I, [2024-03-03T14:15:17.669792 #184]  INFO -- : [54dd8b7e-6d8e-48d7-a5ee-6db73cdb8847] Processing by ErrorsController#show as HTML
fcat_foodsoft.1.qfrln2a3tbb6@v22018056292066830    | I, [2024-03-03T14:15:17.670146 #184]  INFO -- : [54dd8b7e-6d8e-48d7-a5ee-6db73cdb8847]   Parameters: {"authenticity_token"=>"[FILTERED]", "message"=>{"reply_to"=>"", "send_method"=>"recipients", "workgroup_id"=>"45", "ordergroup_id"=>"264", "messagegroup_id"=>"170", "order_id"=>"2231", "recipient_tokens"=>"[FILTERED]", "private"=>"0", "subject"=>"Dein Kontoblatt", "body"=>"}, "commit"=>"Nachricht verschicken", "foodcoop"=>"ruebezahl17"}
fcat_foodsoft.1.qfrln2a3tbb6@v22018056292066830    | I, [2024-03-03T14:15:17.680794 #184]  INFO -- : [54dd8b7e-6d8e-48d7-a5ee-6db73cdb8847] Set locale to de
fcat_foodsoft.1.qfrln2a3tbb6@v22018056292066830    | I, [2024-03-03T14:15:17.703175 #184]  INFO -- : [54dd8b7e-6d8e-48d7-a5ee-6db73cdb8847]   Rendered errors/internal_server_error.html.haml within layouts/application (Duration: 2.3ms | Allocations: 341)
fcat_foodsoft.1.qfrln2a3tbb6@v22018056292066830    | I, [2024-03-03T14:15:17.842519 #184]  INFO -- : [54dd8b7e-6d8e-48d7-a5ee-6db73cdb8847]   Rendered layout layouts/application.html.haml (Duration: 141.7ms | Allocations: 23687)
fcat_foodsoft.1.qfrln2a3tbb6@v22018056292066830    | I, [2024-03-03T14:15:17.843133 #184]  INFO -- : [54dd8b7e-6d8e-48d7-a5ee-6db73cdb8847] Completed 500 Internal Server Error in 173ms (Views: 79.6ms | ActiveRecord: 70.7ms | Allocations: 27075)

Another pattern seems to be that it only occurs for answers to messages or creating messages to ordergoups.

I do not really get it for now since recipient_ids obviously shouldn't hit the db, however it seems it does.

mortbauer commented 4 months ago

Ok, gave it some more thinking, can it be that the message model is missing an: attribute :recipients_ids ?

mortbauer commented 4 months ago

I wrapped now the messages create controller in debug log level and and just look at the following lines which are the sql insert statements created for a short time window, first it produces the correct sql and suddenly it produces 4 times in a row the wrong:

image

It includes 2 additional fields, recipients_ids (which should actually by a method provided by ApplicationRecord) and email_state which is actually a field from MessageRecipient.

mixed_sql.log

mortbauer commented 4 months ago

This are the logs from more or less the same message once it worked and the other time it didn't. failed_transaction_debug.log working_same_transaction.log

mortbauer commented 4 months ago

I'm really stuck here, actually this seems to be really old code and suddenly it creates that strange behavior. And it is happening rather often so it is starting to get really troublesome.

lentschi commented 4 months ago

Hm... to be honest, for now, I have no clue :worried: I also cannot reproduce this locally - maybe it's some issue only occurring with certain production environment settings.

@yksflip Have you ever experienced this error on your servers?

It seems like some kind of rails association caching bug to me, so maybe upgrading rails' (or some other component's) patch version might help. I couldn't find a rails issue for this on GitHub though.

What you could try, is add this temporary line at the top of MessagesController#create:

Rails.logger.info "---TMP DUMMY: #{Message.first.recipients.first.id}"

In the best case, this could even somehow force rails to reload the missing association config before trying to save the message. In the worst case it would throw a similar error three lines earlier.

yksflip commented 4 months ago

this sounds really tricky. I haven't heard of this issue before.

I cannot make any sense why it tries to write recipients_ids into the message table. I mean it might used to be like that before the MessageRecipient was introduced, but that's long ago.

@lentschi I don't quite understand why this might be a association caching bug, could you maybe explain this? The association is called message_recipients but here it tries to save to a column called recipients_ids ...

lentschi commented 4 months ago

@yksflip

The association is called message_recipients but here it tries to save to a column called recipients_ids ...

Yes, but there's the recipients association too - see https://github.com/foodcoops/foodsoft/blob/790a2637aa937aee9e6c44d3991d7a16228fc4bc/plugins/messages/app/models/message.rb#L8

So my only guess for now is that active records' associations get confused somehow. (Maybe depending on the order of importing some plugin or whatever... :shrug: )

I'm afraid, I don't have too much time to investigate right now. Should the bug still be there come Friday, I can try to debug.

lentschi commented 4 months ago

Debugged a little more but still couldn't find a lead. :worried:

@mortbauer When adding the suggested debug log, you could also add Rails.logger.info "---TMP DUMMY message columns: #{Message.column_names}". That should result in ["id", "sender_id", "subject", "private", "created_at", "reply_to", "group_id", "salt", "received_email"], but I have an inkling, that in the error case it will also include recipients_ids.

What we could also add a debug log for is Message#create_message_recipients - maybe the before_validation handler isn't run for some reason. (But actually I commented it out locally and still don't see any errors related to db queries - the save just fails with a validation error then (see validates_presence_of :message_recipients).

mortbauer commented 4 months ago

Thank you for looking into it, but I think I got rid of the problem with these two commits, 1fa917291452c03cc05e8788140650d701f1a308,92a59018b4da23cc369c41da35cc318d7feff715. The foodsoft is now running with these since almost one week and the error did not show up anymore. So I'm now cleaning it from the debug prints and bring it our main branch.