Closed glitsj16 closed 1 year ago
UPDATE
Did some digging and I think this is new behaviour in Beautiful Soup. When I drop the requirement from the current beautifulsoup4==4.11.2 to the former beautifulsoup4==4.10.0, there is no such warning.
Looking at the changelog, there's mention of:
* Issue a warning when an HTML parser is used to parse a document that
looks like XML but not XHTML. [bug=1939121]
Here's the relevant bug report.
I'm not sure what to make of the warning, but it seems pretty harmless. I could live with it, but I'm running monitoring shell scripts on my private whoogle-search instances and this keeps triggering alerts. For now I've added a small patch to results.py to silence this particular class of warnings:
--- a/app/utils/results.py
+++ b/app/utils/results.py
@@ -8,6 +8,10 @@
import urllib.parse as urlparse
from urllib.parse import parse_qs
import re
+from bs4 import BeautifulSoup, MarkupResemblesLocatorWarning
+import warnings
+
+warnings.filterwarnings("ignore", category=MarkupResemblesLocatorWarning)
SKIP_ARGS = ['ref_src', 'utm']
SKIP_PREFIX = ['//www.', '//mobile.', '//m.', 'www.', 'mobile.', 'm.']
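If a module-level filter like the one in the patch feels too broad (it silences that warning class for the whole process), a narrower option is to scope the filter to the one call that triggers it, using `warnings.catch_warnings()` from the standard library. The sketch below is only an illustration of that idea: `LocatorLikeWarning`, `parse_fragment` and `parse_quietly` are hypothetical stand-ins so it runs without bs4 installed. In results.py the category would presumably be `MarkupResemblesLocatorWarning` and the wrapped call would be the `BeautifulSoup(...)` call from the traceback.

```python
import warnings


class LocatorLikeWarning(UserWarning):
    """Hypothetical stand-in for bs4's MarkupResemblesLocatorWarning."""


def parse_fragment(text):
    """Hypothetical stand-in for the BeautifulSoup(...) call.

    Emits a warning when the input does not look like markup.
    """
    if "<" not in text:
        warnings.warn(
            "The input looks more like a filename than markup.",
            LocatorLikeWarning,
            stacklevel=2,
        )
    return text


def parse_quietly(text):
    """Run parse_fragment with the warning ignored only inside this call."""
    with warnings.catch_warnings():
        # The filter is removed again when the with-block exits.
        warnings.simplefilter("ignore", category=LocatorLikeWarning)
        return parse_fragment(text)


if __name__ == "__main__":
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        parse_quietly("results.html")   # suppressed by the scoped filter
        parse_fragment("results.html")  # still warns outside of it
    print(len(caught), "warning(s) recorded")
```

The trade-off is that every call site needs wrapping, so the one-line global filter is still the simpler fix if only this warning class is noisy.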
Hope this helps...
Thanks for the info! I just implemented the solution you described for now, since the warnings are indeed pretty harmless.
(FYI reported upstream at https://bugs.launchpad.net/beautifulsoup/+bug/2052988)
Describe the bug
After running a search I see the warning below on the command line.

Deployment Method
`run` executable

Additional context
$ pacman -Q python
python 3.10.9-1
$ python3 -um app --debug
```
 * Serving Flask app 'app'
 * Debug mode: on
WARNING: This is a development server. Do not use it in a production deployment. Use a production WSGI server instead.
 * Running on http://127.0.0.1:5000
Press CTRL+C to quit
 * Restarting with stat
 * Debugger is active!
 * Debugger PIN: 117-182-872
/home/glitsj16/whoogle-search/app/utils/results.py:99: MarkupResemblesLocatorWarning: The input looks more like a filename than markup. You may want to open this file and pass the filehandle into Beautiful Soup.
  element.replace_with(BeautifulSoup(
127.0.0.1 - - [06/Mar/2023 17:47:52] "POST /search HTTP/1.1" 200 -
127.0.0.1 - - [06/Mar/2023 17:47:53] "GET /static/build/search.3e5a8ad9.css HTTP/1.1" 304 -
127.0.0.1 - - [06/Mar/2023 17:47:53] "GET /static/build/logo.72c3bd56.css HTTP/1.1" 304 -
127.0.0.1 - - [06/Mar/2023 17:47:53] "GET /static/build/input.61ccbb50.css HTTP/1.1" 304 -
127.0.0.1 - - [06/Mar/2023 17:47:53] "GET /static/build/dark-theme.b0749774.css HTTP/1.1" 304 -
127.0.0.1 - - [06/Mar/2023 17:47:53] "GET /static/build/header.978026e5.css HTTP/1.1" 304 -
127.0.0.1 - - [06/Mar/2023 17:47:53] "GET /static/build/header.a12e0a24.js HTTP/1.1" 304 -
127.0.0.1 - - [06/Mar/2023 17:47:53] "GET /static/build/autocomplete.1661f315.js HTTP/1.1" 304 -
127.0.0.1 - - [06/Mar/2023 17:47:53] "GET /static/build/utils.b8afbbaa.js HTTP/1.1" 304 -
127.0.0.1 - - [06/Mar/2023 17:47:53] "GET /static/build/keyboard.890853c5.js HTTP/1.1" 304 -
127.0.0.1 - - [06/Mar/2023 17:47:53] "GET /static/build/currency.3dde589d.js HTTP/1.1" 304 -
127.0.0.1 - - [06/Mar/2023 17:47:53] "GET /static/img/favicon.ico HTTP/1.1" 304 -
^C
```