fivefilters / ftr-site-config

Site-specific article extraction rules to aid content extractors, feed readers, and 'read later' applications.
https://www.fivefilters.org/full-text-rss/

outlined words get censored on psychologytoday.com #251

Closed anarcat closed 7 years ago

anarcat commented 7 years ago

This article about Richard Stallman is really weird to read in the parsed content, because all the "dotted underline" words are simply removed from the output. I reproduced this on the test site.

The last sentence of the second paragraph should read like:

If you’ve heard of open source (free software’s practice sans its **moral** stance) or Linux (really GNU, plus a program called Linux), you can thank Stallman.

(Emphasis on the removed word added.)

Instead it is:

If you’ve heard of open source (free software’s practice sans its stance) or Linux (really GNU, plus a program called Linux), you can thank Stallman.

Thanks!

fivefilters commented 7 years ago

Hi, that's strange. The external parser on that site does preserve it. I'm not sure what the difference is between that and the internal one. I can't see anything in the site config files for psychologytoday.com that would strip those elements. And they're preserved correctly in Full-Text RSS: http://ftr.fivefilters.org/makefulltextfeed.php?url=https%3A%2F%2Fwww.psychologytoday.com%2Farticles%2F201611%2Fthe-sorcerers-code&max=3
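For anyone who wants to repeat that check against another article, here is a minimal Python sketch that builds the same kind of Full-Text RSS request URL. The endpoint and the `url` and `max` query parameters are taken from the link above; the helper name `build_feed_url` is hypothetical, and nothing here is an official client.

```python
from urllib.parse import urlencode

# Endpoint as it appears in the link above.
FTR_ENDPOINT = "http://ftr.fivefilters.org/makefulltextfeed.php"


def build_feed_url(article_url: str, max_items: int = 3) -> str:
    """Return a Full-Text RSS request URL for one article (hypothetical helper).

    Assumes only the two query parameters visible in the link above:
    `url` (the percent-encoded article address) and `max`.
    """
    query = urlencode({"url": article_url, "max": max_items})
    return f"{FTR_ENDPOINT}?{query}"


if __name__ == "__main__":
    print(build_feed_url(
        "https://www.psychologytoday.com/articles/201611/the-sorcerers-code"
    ))
```

Opening the printed URL in a browser should show the extracted feed, which can then be compared by eye with the output of the parser under test.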

Maybe @j0k3r or someone else can help?

anarcat commented 7 years ago

i always get the strange stuff. :p

j0k3r commented 7 years ago

It's not a siteconfig problem; it's related to php-readability.

@anarcat it would be better to post this kind of problem on wallabag first; after investigating, we'll see whether we need to change something in siteconfig to fix it :slightly_smiling_face:

This issue was moved to j0k3r/php-readability#23 and can be closed.

anarcat commented 7 years ago

okay well, i thought that since i reproduced the problem on the external website, the next step was to file an issue here. i was following this guide - maybe i misunderstood something or the guide can be clarified?

j0k3r commented 7 years ago

Yeah, maybe we should clarify a few things.

anarcat commented 7 years ago

wait so this is fixed right?