Tunous / SwipeNews-Issues

Feedback and bug reporting for SwipeNews

Improved full content parser #31

Open Tunous opened 8 years ago

Tunous commented 8 years ago

1) https://www.thequint.com/

Only about half of the content is shown for almost all articles from this site. From what I have seen, there's an ad in the middle of each article on the site, and that is where the content gets cut off.

2) https://torrentfreak.com/

Here, the images in the articles are shown initially, but they are missing once the full content is loaded.

3) http://gadgets.ndtv.com/

No images anywhere.

4) http://xkcd.com/

No images anywhere :D

5) http://turnoff.us/

Same bugs as xkcd, but at least here the thumbnails are of very high quality :D

kontrastylez commented 7 years ago

http://www.serienjunkies.de/

German umlauts (ä, ö, ü) and sometimes quotation marks („ and ”) aren't displayed correctly. Here is an example with a screenshot:

http://www.serienjunkies.de/news/teutonen-lehrer-erfolgreichem-staffelstart-80576.html

screenshot_2017-01-08-21-34-52

An additional problem with this site is that there is empty space before the article text begins. This seems to be the place the header image was extracted from.

http://www.serienjunkies.de/news/spoil-vampire-diaries-flash-arrow-80563.html

screenshot_2017-01-08-21-59-49

kontrastylez commented 7 years ago

These sites also have a problem with German umlauts.

http://winfuture.de/

http://winfuture.de/videos/Hardware/LG-Gram-2017-Ultraleichtes-Notebook-mit-grossem-Akku-im-Hands-On-17242.html

screenshot_2017-01-08-21-53-02

http://www.playnation.de/

http://www.playnation.de/spiele-news/spielekultur/sind-spiele-kunst-ab-wann-id68728.html

screenshot_2017-01-08-22-12-30

kontrastylez commented 7 years ago

http://www.moviepilot.de/

No article content is shown, only the text of the author's signature. Example:

http://www.moviepilot.de/news/golden-globe-2017-der-live-blog-zur-verleihung-183214

screenshot_2017-01-08-22-51-00

Tunous commented 7 years ago

Full content doesn't render special characters

Tunous commented 7 years ago

Sankakucomplex.com

Tunous commented 7 years ago

My original plan was to fix the currently used parser manually, but that would require a lot of work, so I decided to try something else: the Mercury Web Parser, version 0.17.4. In quick tests, it seems to work much better for most of the affected websites.

Sadly, there still seems to be an issue with German characters. I've sent an email to the support team about it. Hopefully, they'll be able to fix it on their side, or help me if the error is on my side.
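For anyone trying to reproduce the German character problem: one common cause of broken umlauts is that UTF-8 bytes get decoded with a different charset such as ISO-8859-1. This is only an assumption, since nothing in this thread confirms it is what happens here. The Python sketch below shows the symptom and a cautious decoding order; `decode_body` and its parameters are hypothetical names, not part of SwipeNews or the Mercury API.

```python
from typing import Optional


def decode_body(raw: bytes, declared_charset: Optional[str]) -> str:
    """Decode response bytes, trying the declared charset first, then UTF-8.

    Hypothetical helper for illustration only.
    """
    for charset in (declared_charset, "utf-8"):
        if not charset:
            continue
        try:
            return raw.decode(charset)
        except (LookupError, UnicodeDecodeError):
            # Unknown charset name or bytes that do not fit it: try the next one.
            continue
    # Last resort: ISO-8859-1 maps every byte, so this never raises,
    # but the result may contain garbled characters.
    return raw.decode("iso-8859-1")


text = "Staffelstart für „Teutonen“"
raw = text.encode("utf-8")

# Assumed failure mode: UTF-8 bytes read as ISO-8859-1 garble the umlauts.
garbled = raw.decode("iso-8859-1")

# Decoding with the correct charset restores the original text.
fixed = decode_body(raw, "utf-8")
```

If the garbled output of this sketch looks like the text in the screenshots above, the charset is a good place to start looking; if not, the cause is probably elsewhere.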

If this parser API doesn't work correctly, I'll return to my idea of fixing the previous parser myself. For now, please test it and report any new issues once the new version is released.

Tunous commented 7 years ago

Got a report of a website that doesn't load correctly with the Mercury parser: http://dp.do/80623

I think I'll look into testing other feed parsers and adding an option to switch between them if they turn out to be better.
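A switchable-parser option could look roughly like the Python sketch below. Everything in it is hypothetical: the two parser functions are trivial stand-ins rather than the real built-in or Mercury parsers, and falling back to the remaining parsers when the preferred one returns nothing is an extra assumption beyond a plain switch.

```python
from typing import Callable, Dict, Optional, Tuple

# A parser takes raw HTML and returns the extracted content, or None on failure.
Parser = Callable[[str], Optional[str]]


def article_tag_parser(html: str) -> Optional[str]:
    """Stand-in parser: returns whatever sits inside the first <article> element."""
    start = html.find("<article>")
    end = html.find("</article>")
    if start == -1 or end == -1:
        return None
    return html[start + len("<article>"):end].strip() or None


def whole_page_parser(html: str) -> Optional[str]:
    """Stand-in parser: returns the whole page, trimmed."""
    return html.strip() or None


# Hypothetical registry of the parsers a user could choose between.
PARSERS: Dict[str, Parser] = {
    "article": article_tag_parser,
    "whole_page": whole_page_parser,
}


def extract_full_content(
    html: str, preferred: str, parsers: Dict[str, Parser]
) -> Optional[Tuple[str, str]]:
    """Try the preferred parser first, then the others in registry order.

    Returns (parser_name, content), or None if every parser fails.
    """
    order = [preferred] + [name for name in parsers if name != preferred]
    for name in order:
        parser = parsers.get(name)
        if parser is None:
            continue  # Unknown preference: skip it and try the rest.
        content = parser(html)
        if content:
            return name, content
    return None
```

Returning the parser name with the content would make it easier to mention in bug reports which parser produced a broken article.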