-
Cf. https://issues.apache.org/jira/browse/JAMES-4061
- [ ] Handle `blockquote` in `JsoupHtmlTextExtractor`
- [ ] Handle `a` in `JsoupHtmlTextExtractor`
-
**Describe the problem you are trying to solve**
Extract data from an HTML page. Lots of older sites with valuable data don't have an API. Extracting HTML with a regex is possible but very inconveni…
-
We should have a simple extractor that pulls the HTML and extracted body text of a document.
-
We need to add an HTML extractor feature that creates a separate record for each `p`, `li`, `td`, and `code` tag.
This could be customized through the `nodes_to_index` option.
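
As a rough illustration of the request above, here is a minimal sketch in Python using only the standard library's `html.parser`. Everything except the `nodes_to_index` name is an assumption made for the example: the `RecordExtractor` class, the `extract_records` helper, and the `{"tag", "text"}` record shape are hypothetical and not taken from any existing implementation.

```python
from html.parser import HTMLParser

# Hypothetical default; the issue only names the `nodes_to_index` option.
DEFAULT_NODES_TO_INDEX = ("p", "li", "td", "code")


class RecordExtractor(HTMLParser):
    """Collect one text record per element whose tag is in nodes_to_index."""

    def __init__(self, nodes_to_index=DEFAULT_NODES_TO_INDEX):
        super().__init__(convert_charrefs=True)
        self.nodes_to_index = set(nodes_to_index)
        self.records = []  # finished records: {"tag": ..., "text": ...}
        self._open = []    # stack of [tag, text_chunks] for open indexed elements

    def handle_starttag(self, tag, attrs):
        if tag in self.nodes_to_index:
            self._open.append([tag, []])

    def handle_data(self, data):
        # Text belongs to every indexed element that is currently open.
        for _, chunks in self._open:
            chunks.append(data)

    def handle_endtag(self, tag):
        # Close the innermost open element with this tag, if there is one.
        for i in range(len(self._open) - 1, -1, -1):
            if self._open[i][0] == tag:
                _, chunks = self._open.pop(i)
                text = " ".join("".join(chunks).split())  # collapse whitespace
                if text:
                    self.records.append({"tag": tag, "text": text})
                break


def extract_records(html, nodes_to_index=DEFAULT_NODES_TO_INDEX):
    """Return a list of records, one per closed element of an indexed tag."""
    parser = RecordExtractor(nodes_to_index)
    parser.feed(html)
    parser.close()
    return parser.records
```

Calling `extract_records(html, nodes_to_index=("p",))` would restrict the output to paragraphs. Note that this sketch only emits a record when it sees an explicit closing tag, so elements left unclosed in the markup are skipped, and nested indexed elements (a `code` inside an `li`) produce a record for each level.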
-
- [x] I'm reporting a broken site support issue
- [x] I've verified that I'm running youtube-dl version **2020.07.28**
- [x] I've checked that all provided URLs are alive and playable in a browser
…
-
Invalid: [nhentai] https://nhentai.net/g/464415/
version: 4.1 (24-02-28 04:49:54 UTC)
platform / locale: Windows-10-10.0.22621-SP0 / en_us
order / group / uid: 0 / False / 34bf2ae21e6c42a59504531…
-
Hi!
If we try to process HTML like this with `angular-gettext-tools`:
``` html
```
It will produce duplicate keys:
- `"BlahBlahBlah"`
- `"{{'BlahBlahBlah"`
As you can see, in the HTML attribute we have si…
-
### DO NOT REMOVE OR SKIP THE ISSUE TEMPLATE
- [X] I understand that I will be **blocked** if I *intentionally* remove or skip any mandatory\* field
### Checklist
- [X] I'm reporting that yt-…
-
I have translated all 21 files into JavaScript with npm libs like
```
"axios": "^0.21.1",
"chardet": "^1.3.0",
"cheerio": "^1.0.0-rc.10",
"commander": "^8.0.0",
"html-esca…