HenryQW / mercury_fulltext

📖 Enjoy full text for tt-rss.

Plugin not re-writing article contents with returned information from API #14

Closed. TheFiZi closed this issue 5 years ago.

TheFiZi commented 5 years ago

The plugin appears to work up to the point where it should rewrite the article content with the information returned from the mercury_api.

I can see the plugin passing the URL to the mercury_api but then the body of the articles is not being replaced with the returned results from the API.

CentOS 6 PHP 7.1.26 MariaDB 15.1

TTRSS 19.2 (900cdbb) Node 10.15.1 (for mercury-parser-api)

Your mercury API

None I can see. I can enable PHP debugging if that would be helpful and try to find something.

Subscribe to a feed, enable the mercury_fulltext plugin, wait for feed to refresh with new articles.

I am testing with: https://www.wired.com/feed/category/business/latest/rss

HenryQW commented 5 years ago

This is a bit weird: my ttrss is able to process full text for some articles, for example:

https://www.wired.com/story/huawei-case-signals-new-us-china-cold-war-tech/


https://www.wired.com/story/tim-wu-says-us-must-enforce-antitrust-laws/


However, some articles aren't processed, such as:

https://www.wired.com/story/huawei-sues-us-prodding-prove-suspicions/

and

https://www.wired.com/story/facebook-zuckerberg-privacy-pivot/

So I did a manual curl against my mercury API:

curl mercury:3000/parser?url=https://www.wired.com/story/facebook-zuckerberg-privacy-pivot/

It returned errors:

{"error":true,"messages":"The url parameter passed does not look like a valid URL. Please check your data and try again."}

But when I spun up a local mercury API, it was able to return the content.

Please try a manual curl and let me know if you are experiencing the same problem.
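If it helps, the same manual check can be scripted. This is only a rough sketch, not anything the plugin ships: the function names are made up for illustration, the `/parser?url=` endpoint and the `error`/`messages` fields are taken from the curl call and error output above, and `content` is assumed to be the field that carries the article HTML on success.

```python
import json
from urllib.error import HTTPError
from urllib.parse import urlencode
from urllib.request import urlopen


def build_parser_url(api_base, article_url):
    """Return the /parser request URL with the article URL percent-encoded."""
    return api_base.rstrip("/") + "/parser?" + urlencode({"url": article_url})


def interpret_response(body):
    """Return (ok, detail): the article HTML on success, the error text otherwise."""
    try:
        data = json.loads(body)
    except ValueError:
        return False, "response was not valid JSON"
    if data.get("error"):
        # Shape seen above: {"error": true, "messages": "..."}
        return False, data.get("messages", "unknown error")
    # Assumption: a successful response carries the article HTML in "content".
    return True, data.get("content", "")


def check_article(api_base, article_url, timeout=30):
    """Query the API once and report whether full text came back."""
    try:
        with urlopen(build_parser_url(api_base, article_url), timeout=timeout) as resp:
            body = resp.read()
    except HTTPError as err:
        # The error JSON may arrive with a non-2xx status; read the body anyway.
        body = err.read()
    return interpret_response(body.decode("utf-8", errors="replace"))
```

For example, `check_article("http://mercury:3000", "https://www.wired.com/story/facebook-zuckerberg-privacy-pivot/")` returns a pair whose first element says whether the API returned content. Running it over the articles that failed in tt-rss should show whether the API is the part returning the error.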

HenryQW commented 5 years ago

I think I've found the issue: Wired has a paywall that prevents you from getting the article after you've read a certain number of articles.


Once the paywall kicked in, the local mercury API returned the very same error.


https://github.com/postlight/mercury-parser/blob/0940971069290b5bce4dc9422d8b4d0d20f7d3b5/dist/mercury.js#L362-L363

Unfortunately, there is nothing I can do about this.

TheFiZi commented 5 years ago

Of course! I completely forgot they limit the number of free articles. When I was using the official API this was likely not a problem because it probably used a lot of public IPs.

Normally I wouldn't have this issue, but because I'd removed and re-added a feed for testing, it burned through all of my IP's views for 24 hours.

Thanks again for the assist.

Full disclosure: I have a sub to Wired.com but they don't offer full text RSS like Arstechnica does.