-
Want:
- [ ] rich metadata like I hacked into the theme: Open Graph, schema.org AND the IndieWeb microformats, and maybe JSON-LD? So that horrible sites like LinkedIn and Facebook and Twitter but a…
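
A rough sketch of what emitting two of these formats for one post could look like. The function name `render_metadata` and its parameters are hypothetical and not taken from any theme; microformats are left out because they are class names on the visible markup rather than separate tags.

```python
import json
from html import escape


def render_metadata(title, url, description, author):
    """Return Open Graph <meta> tags plus a schema.org JSON-LD <script> block."""
    # Open Graph properties read by link-preview crawlers.
    open_graph = {
        "og:title": title,
        "og:type": "article",
        "og:url": url,
        "og:description": description,
    }
    tags = [
        f'<meta property="{escape(key)}" content="{escape(value)}">'
        for key, value in open_graph.items()
    ]

    # The same facts expressed as schema.org JSON-LD.
    json_ld = {
        "@context": "https://schema.org",
        "@type": "BlogPosting",
        "headline": title,
        "url": url,
        "description": description,
        "author": {"@type": "Person", "name": author},
    }
    # Escape "</" so the payload cannot close the <script> element early.
    payload = json.dumps(json_ld).replace("</", "<\\/")
    tags.append(f'<script type="application/ld+json">{payload}</script>')
    return "\n".join(tags)


if __name__ == "__main__":
    print(render_metadata("Hello", "https://example.com/hello", "A first post", "Jane Doe"))
```

Generating both formats from the same values keeps them from drifting apart.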
-
from which we would want to extract features, as MUPET does; see https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5939957/#SD1 and https://elifesciences.org/articles/67855/figures#content
Implementation here…
-
Hi,
I tried extracting the content for articles from http://www.clarin.com, but goose was unable to extract any content from any article under the clarin.com domain (like http://www.clarin.com/politic…
-
http://dl.acm.org/citation.cfm?id=2609559
-
Hi,
> !bai summarize this article for me: https://members.outcomeedge.com/p/how-to-find-gold-mines
> Sorry, I cannot access external links. If you like, you can…
-
This issue tracks efforts to parse PDFs, along with any related articles/documents.
Currently 'marker' is used: https://github.com/VikParuchuri/marker
This requires a separate venv and I have do…
-
[Paper URL](http://delivery.acm.org/10.1145/2610000/2609559/p787-wan.pdf?ip=131.112.138.2&id=2609559&acc=ACTIVE%20SERVICE&key=D2341B890AD12BFE%2EE857D5F645C75AE5%2E4D4702B0C3E38B35%2E4D4702B0C3E38B35&CFID…
-
https://spacy.io/usage/rule-based-matching#matcher
An example for [Elmo article](https://muppet.fandom.com/wiki/Elmo): https://explosion.ai/demos/matcher?text=Elmo%20is%20a%20furry%20red%20Muppet%2…
-
First off I would like to thank the creators for making this package free as it is a lifesaver and a timesaver. However, I'd like to address the issues I'm having with the extractor and perhaps find a…
-
When filtering the articles by a specific topic, e.g. "filter politics", the index numbers used by commands such as "headlines" and "extract" are inconsistent. Some commands' indices are based…
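
One possible way to avoid this kind of mismatch, sketched here as an assumption and not as a description of the tool's actual code: give every article a fixed index once, and have every command look articles up by that index whether or not a filter is active. All names below (`Article`, `number_articles`, `filter_by_topic`, `lookup`) are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Article:
    index: int      # assigned once, never recomputed after filtering
    topic: str
    headline: str


def number_articles(raw):
    """Assign 1-based indices to (topic, headline) pairs in their original order."""
    return [Article(i, topic, headline) for i, (topic, headline) in enumerate(raw, start=1)]


def filter_by_topic(articles, topic):
    """Return the matching articles, keeping their original indices."""
    return [article for article in articles if article.topic == topic]


def lookup(articles, index):
    """Find an article by its fixed index, in a filtered or unfiltered list."""
    for article in articles:
        if article.index == index:
            return article
    raise KeyError(index)


if __name__ == "__main__":
    everything = number_articles([
        ("sports", "Cup final tonight"),
        ("politics", "Budget vote delayed"),
        ("politics", "New minister named"),
    ])
    for article in filter_by_topic(everything, "politics"):
        print(article.index, article.headline)
```

With this layout a number shown in a filtered listing refers to the same article in every command, at the cost of gaps in the displayed numbering.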