-
I assume that sites like Bloomberg and Medium are too cool for Beautiful Soup, requiring escalation to Selenium.
https://pypi.org/project/beautifulsoup4/
https://pypi.org/project/selenium/
-
```
import logging
from selenium import webdriver
from selenium.common.exceptions import NoSuchElementException, TimeoutException, WebDriverException
from selenium.webdriver.common.by import By
f…
```
-
[scrapers.zip](https://github.com/OpenBudget/open-budget-frontend/files/237051/scrapers.zip)
-
See bachelor thesis https://nbn-resolving.org/urn:nbn:de:bsz:14-qucosa2-211701 to follow on.
-
I was working on a PR and was surprised to see many linting errors.
I didn't see any documentation on which checks I should run on my side (ruff, flake8, isort, others).
**Describe the solution yo…
-
# Topics / Ideas - Revised
## what can we cover in an hour?
## First Semester
*Using a computer - week 1 (but really 2) - tag team week 1*
- intro talk about goals of the program - Zoe
-…
-
The point is that the AI somehow gets the right key for extracting the necessary information.
You don't need the information itself, but this very key - XPath or CSS
the BeautifulSoup command at leas…
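
A minimal sketch of the "keep the key, not the data" idea, assuming the key is an XPath expression and the page is well-formed markup. It uses the standard-library `xml.etree.ElementTree` (which supports only a limited XPath subset) instead of BeautifulSoup, so it runs without extra packages. `PRICE_KEY`, `extract`, and the sample pages are hypothetical names made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical key, e.g. one suggested by a model after seeing a sample page.
# Only the key is stored; the data is re-extracted from each page on demand.
PRICE_KEY = ".//span[@class='price']"


def extract(markup: str, key: str) -> list[str]:
    """Apply a stored XPath key to a page and return the matched text."""
    root = ET.fromstring(markup)
    return [(el.text or "").strip() for el in root.findall(key)]


# Two made-up pages that share the same structure but hold different data.
page_a = "<html><body><div><span class='price'>10 EUR</span></div></body></html>"
page_b = (
    "<html><body><div>"
    "<span class='price'>12 EUR</span><span class='price'>7 EUR</span>"
    "</div></body></html>"
)

print(extract(page_a, PRICE_KEY))
print(extract(page_b, PRICE_KEY))
```

Real-world HTML is rarely well-formed XML, so in practice the same key would more likely be handed to a tolerant HTML parser; the point of the sketch is only that one stored key can be reused across pages.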
-
Hi,
Thank you guys for the great framework.
I am using scrapy to crawl multiple sites. The sites use different encodings.
One site is encoded as 'gbk', and this is declared in the HTML meta, but scrapy …
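
As a rough illustration of honoring an encoding declared in the HTML meta, here is a standalone standard-library sketch. It is not scrapy's API and makes no claim about how scrapy detects encodings; `sniff_meta_charset`, `decode_body`, and the sample page are hypothetical names, and the 4096-byte search window is an assumption.

```python
import re

# Matches both <meta charset="gbk"> and
# <meta http-equiv="Content-Type" content="text/html; charset=gbk">.
META_CHARSET = re.compile(
    rb"""<meta[^>]*?charset\s*=\s*["']?([A-Za-z0-9_\-]+)""", re.IGNORECASE
)


def sniff_meta_charset(body: bytes, default: str = "utf-8") -> str:
    """Return the charset named in a meta tag near the top of the page."""
    match = META_CHARSET.search(body[:4096])  # assumed search window
    return match.group(1).decode("ascii").lower() if match else default


def decode_body(body: bytes, default: str = "utf-8") -> str:
    """Decode raw bytes using the declared charset, falling back to default."""
    encoding = sniff_meta_charset(body, default)
    try:
        return body.decode(encoding, errors="replace")
    except LookupError:  # the page declared a codec Python does not know
        return body.decode(default, errors="replace")


# Made-up page: two Chinese characters encoded as GBK bytes.
sample = (
    '<html><head><meta charset="gbk"></head>'
    "<body>\u4e2d\u6587</body></html>"
).encode("gbk")

print(sniff_meta_charset(sample))
print("\u4e2d\u6587" in decode_body(sample))
```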
-
Sometimes people paste from Word and high number entity codes (`𐟢`) are in the HTML. This causes a 500 error in the app. These characters should be replaced with spaces.
knice updated
7 years ago
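
One possible way to do the replacement described above, as a sketch only. The issue does not say what counts as a "high number", so the threshold here (code points above U+FFFF) is an assumption, and `strip_high_entities` is a hypothetical helper, not code from the app.

```python
import re

# Decimal (&#128512;) and hexadecimal (&#x1F600;) numeric character references.
NUMERIC_ENTITY = re.compile(r"&#(?:[xX]([0-9A-Fa-f]+)|([0-9]+));")

# Assumed threshold: anything above U+FFFF counts as a "high number".
MAX_CODE_POINT = 0xFFFF


def strip_high_entities(html: str, limit: int = MAX_CODE_POINT) -> str:
    """Replace high numeric entities and high literal characters with spaces."""

    def replace(match: re.Match) -> str:
        hex_digits, dec_digits = match.groups()
        code = int(hex_digits, 16) if hex_digits else int(dec_digits)
        return " " if code > limit else match.group(0)

    cleaned = NUMERIC_ENTITY.sub(replace, html)
    # Pasted text may also contain the characters themselves, not only entities.
    return "".join(" " if ord(ch) > limit else ch for ch in cleaned)


print(strip_high_entities("caf&#233; &#128512; &amp; tea"))
```

Entities at or below the threshold, and named entities such as `&amp;`, are left untouched.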
-
**Bug Metadata**
* Version of extract_msg: 0.49.0
* Your python version: Python 3.10
* How did you launch extract_msg?
  - [x] My command line or
  - [x] I used the extract_msg package
**Des…