-
I am trying to get goose to read from .html files (a URL is specified here for convenience in the examples)[1]. But at times it doesn't show any text. Please help me out with this issue.
Goose version u…
-
What about providing optional extraction directives?
In the majority of cases the extraction algorithm works great. But for some web sites it can fail to extract relevant content. For these web sites it …
-
-
I tried `article.extractor.get_favicon(article.html)` but it raised
**AttributeError: 'str' object has no attribute 'xpath'**
And `article.meta_favicon` returns ''
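The error message suggests that the method expects an already parsed document (something with an `.xpath` method) rather than the raw HTML string. As a possible workaround while that is sorted out, here is a minimal sketch that pulls the favicon link straight from an HTML string using only the Python standard library. It does not use the library's own API; `FaviconFinder` and `find_favicon` are hypothetical names made up for this example.

```python
from html.parser import HTMLParser


class FaviconFinder(HTMLParser):
    """Remember the href of the first <link> whose rel contains 'icon'."""

    def __init__(self):
        super().__init__()
        self.favicon = ""

    def handle_starttag(self, tag, attrs):
        # Only look at <link> tags, and stop once a favicon was found.
        if tag != "link" or self.favicon:
            return
        attr_map = dict(attrs)
        # rel may hold several tokens, e.g. "shortcut icon".
        rel_tokens = (attr_map.get("rel") or "").lower().split()
        if "icon" in rel_tokens:
            self.favicon = attr_map.get("href") or ""


def find_favicon(html: str) -> str:
    """Return the favicon href found in html, or '' if there is none."""
    finder = FaviconFinder()
    finder.feed(html)
    return finder.favicon
```

Calling `find_favicon(article.html)` on the raw markup should then give the href as written in the page, which may be relative and need joining with the page URL.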
-
The Open Graph protocol states that if there are multiple tags, the first one should be preferred in cases of conflicts. See here: https://ogp.me/#array
But `article-extractor` prefers the last tag…
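To illustrate the "first tag wins" behaviour described above, here is a minimal sketch using only the Python standard library. It is not the code of `article-extractor`; `OpenGraphParser` and `first_og_value` are hypothetical names for this example, and it assumes the tags are plain `<meta property="og:..." content="...">` elements.

```python
from html.parser import HTMLParser


class OpenGraphParser(HTMLParser):
    """Collect og:* meta values in document order."""

    def __init__(self):
        super().__init__()
        self.values = {}  # property name -> list of contents, in order seen

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr_map = dict(attrs)
        prop = attr_map.get("property") or ""
        content = attr_map.get("content")
        if prop.startswith("og:") and content is not None:
            self.values.setdefault(prop, []).append(content)


def first_og_value(html, prop):
    """Return the first value for prop, or None if the tag is absent."""
    parser = OpenGraphParser()
    parser.feed(html)
    found = parser.values.get(prop, [])
    return found[0] if found else None
```

Keeping every value in order, rather than overwriting on each tag, also leaves the full array available for properties where all entries matter.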
-
Hi, congrats on your work and great results!
From the article I understand that you used TSN as the feature extractor. We can download the TSN from the article you cited in your article, is that right?
-
I am getting the same error that Tommo565 was getting a couple of years ago.
I have executed both my own code and sample code. Both produce the following error:
Exception in thread Thread-1:
Tr…
-
I need to extract article bodies from raw htmls. My code is as simple as:
```
for html in htmls:
    # pass the loop variable; the original passed an undefined name `article`
    extractor = Extractor(extractor='ArticleExtractor', html=html)
    extractor.getHTML()
```
Aft…
-
Hi,
I read [this page](https://github.com/polyrabbit/hacker-news-digest/blob/master/%5Btutorial%5D%20How-to-extract-main-content-from-web-pages-using-Machine-Learning.ipynb) from your doc the othe…
-
Is it possible to provide a default value for an extractor of a `Website Agent`?
I use the "Product Watch" scenario from the wiki. If an article is not available, the price extractor fails, which mak…