GateNLP / ultimate-sitemap-parser

Ultimate Website Sitemap Parser
https://mediacloud.org/
Other

Disable logging? #25

Open Pikamander2 opened 3 years ago

Pikamander2 commented 3 years ago

Is there a way to turn off logging? A single call to sitemap_tree_for_homepage spits out over 20 messages into the console, most of which are of no concern.

stefan-djurovic commented 3 years ago

Well, currently there is not.

sibalzer commented 3 years ago

You can set the level of the library's loggers to WARNING so that lower-severity messages (INFO and DEBUG) are no longer printed.

import logging

logging.getLogger("usp.fetch_parse").setLevel(logging.WARNING)
logging.getLogger("usp.helpers").setLevel(logging.WARNING)
logging.getLogger("usp.tree").setLevel(logging.WARNING)

ThibTrip commented 3 years ago

logging.getLogger("usp.fetch_parse").setLevel(logging.WARNING)
logging.getLogger("usp.helpers").setLevel(logging.WARNING)

Thank you for your answer but this does not seem to work 😿.

Example code

import logging
from usp.tree import sitemap_tree_for_homepage

logging.getLogger("usp.fetch_parse").setLevel(logging.WARNING)
logging.getLogger("usp.helpers").setLevel(logging.WARNING)
logging.getLogger("usp.tree").setLevel(logging.WARNING)

sitemap_tree_for_homepage('https://www.siemens.de/')

Output

2021-05-21 11:29:27,202 WARNING usp.helpers [1407409/MainThread]: Request for URL https://www.siemens.de/robots.txt failed: 404 Not Found
2021-05-21 11:29:27,829 WARNING usp.helpers [1407409/MainThread]: Request for URL https://www.siemens.de/sitemap-index.xml.gz failed: 404 Not Found
2021-05-21 11:29:28,443 WARNING usp.helpers [1407409/MainThread]: Request for URL https://www.siemens.de/sitemap_index.xml failed: 404 Not Found
2021-05-21 11:29:29,067 WARNING usp.helpers [1407409/MainThread]: Request for URL https://www.siemens.de/.sitemap.xml failed: 404 Not Found
2021-05-21 11:29:29,694 WARNING usp.helpers [1407409/MainThread]: Request for URL https://www.siemens.de/sitemap/sitemap-index.xml failed: 404 Not Found
2021-05-21 11:29:30,303 WARNING usp.helpers [1407409/MainThread]: Request for URL https://www.siemens.de/sitemap.xml failed: 404 Not Found
...

sibalzer commented 3 years ago

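That's because the messages you are seeing are warnings, and a logger set to logging.WARNING still lets them through. If you want to ignore them too, you need to set the logging level higher (logging.ERROR or even logging.FATAL). But I wouldn't recommend it, since that also hides warnings you may care about.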
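
A minimal sketch of that suggestion, assuming the three logger names quoted earlier in this thread are the ones producing the output (other usp modules may define more). The helper name set_usp_log_level is made up for illustration and is not part of the library.

```python
import logging

# Logger names copied from the comments above. This list is an assumption:
# it may be incomplete for other versions of the library.
USP_LOGGER_NAMES = ("usp.fetch_parse", "usp.helpers", "usp.tree")


def set_usp_log_level(level: int = logging.ERROR) -> None:
    """Set every listed logger to `level` so records below it are dropped."""
    for name in USP_LOGGER_NAMES:
        logging.getLogger(name).setLevel(level)


# With ERROR as the threshold, WARNING records such as the
# "404 Not Found" lines shown above are no longer emitted.
set_usp_log_level(logging.ERROR)
```

As in the example code above, call this after importing from usp and before calling sitemap_tree_for_homepage.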

Pikamander2 commented 3 years ago

Ideally, it would be nice to have an official setting to suppress the output.
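
Until such a setting exists, one possible workaround is a small context manager that raises the level only around the call and restores it afterwards. This is a sketch using the standard logging module only. quiet_loggers is a hypothetical helper, not something the library provides, and the logger names are the ones mentioned in this thread.

```python
import logging
from contextlib import contextmanager


@contextmanager
def quiet_loggers(names, level=logging.CRITICAL):
    """Temporarily raise the level of the named loggers, then restore it."""
    loggers = [logging.getLogger(name) for name in names]
    previous_levels = [logger.level for logger in loggers]
    for logger in loggers:
        logger.setLevel(level)
    try:
        yield
    finally:
        # Restore the original levels even if the wrapped call raises.
        for logger, old_level in zip(loggers, previous_levels):
            logger.setLevel(old_level)


# Usage (assumes usp is installed and the names below match its loggers):
#
# from usp.tree import sitemap_tree_for_homepage
# with quiet_loggers(["usp.fetch_parse", "usp.helpers", "usp.tree"]):
#     tree = sitemap_tree_for_homepage('https://www.siemens.de/')
```

Because the old levels are restored on exit, logging elsewhere in the program is unaffected.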