Open mcoenca opened 9 years ago
title: yes, the current behaviour tries to find the best "title" from the page, be it the <title> tag, parts of it, or the first headline <h1>, and so on. I agree that this is somewhat confusing sometimes, we'll have a look at this.
description: this is a bug, thanks for reporting! I can reproduce it, shouldn't be too hard to fix.
In particular, when you scrape a page whose title contains a dash "-" or a pipe "|", the title is cut down to only the first part.
Example: Scrape.website('https://www.youtube.com/watch?v=TvyWRevLG5I') displays the title as 'Ethereal Dreams' instead of 'Ethereal Dreams' - Chill Mix
Likewise, Scrape.website('https://www.youtube.com/watch?v=RgLDHIUl4PA') only returns the title 13.Best of Chill Out instead of 13. Best of Chill Out | Ambient | New Age | Lounge... [HD]
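To make the symptom concrete, here is a minimal sketch of how this kind of truncation can happen. It is a guess at the mechanism, not the package's actual code: `firstTitleSegment` and `fullTitle` are hypothetical helpers, and the separator pattern is an assumption based on the two examples above.

```javascript
// Hypothetical helper: keeps only the text before the first " - " or " | ".
// This mimics the reported symptom; it is NOT taken from the package source.
function firstTitleSegment(rawTitle) {
  return rawTitle.split(/\s+[-|]\s+/)[0].trim();
}

// Hypothetical alternative: keep the whole <title> text and only tidy whitespace.
function fullTitle(rawTitle) {
  return rawTitle.replace(/\s+/g, ' ').trim();
}

// Titles as quoted in this report.
const examples = [
  "'Ethereal Dreams' - Chill Mix",
  '13. Best of Chill Out | Ambient | New Age | Lounge... [HD]',
];

for (const title of examples) {
  console.log('truncated:', firstTitleSegment(title));
  console.log('expected: ', fullTitle(title));
}
```

If the scraper does something along these lines, returning the untruncated string (or exposing both values) would cover the cases reported here.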
Also, description sometimes returns 'true' even though no meta description tag is present.
Example: Scrape.website('https://meteorhacks.com/introduction-to-latency-compensation.html')
...
lang: 'en',
I20150429-17:06:27.303(2)? description: 'true',
I20150429-17:06:27.303(2)? favicon: 'https://meteorhacks.com/',
I20150429-17:06:27.303(2)? references:
I20150429-17:06:27.303(2)? [ 'https://bulletproofmeteor.com/?utm_source=meteorhacks&utm_medium=link&utm_term=meteorhacks&utm_content=homepage&utm_campaign=meteorhacks',
I20150429-17:06:27.304(2)? 'https://kadira.io/?utm_source=meteorhacks&utm_medium=banner&utm_term=kadira&utm_content=toplink&utm_campaign=kadira',
I20150429-17:06:27.304(2)? 'http://www.meteor.com/',
...
This seems inconsistent; it should return 'undefined' or 'notFound' instead.
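As a rough sketch of the behaviour I would expect: where the 'true' comes from is not established here, and one guess is that a boolean gets turned into a string somewhere. `normalizeDescription` is a hypothetical helper, not part of the package.

```javascript
// Hypothetical helper: turn whatever was read from the page into either a
// usable description string or undefined. Not part of the package's API.
function normalizeDescription(value) {
  if (typeof value !== 'string') {
    // Missing tag, null, or a boolean flag such as `true`.
    return undefined;
  }
  const text = value.trim();
  return text.length > 0 ? text : undefined;
}

console.log(normalizeDescription(true));              // undefined
console.log(normalizeDescription(undefined));         // undefined
console.log(normalizeDescription('  A real summary ')); // 'A real summary'
```

Note that this only helps if the check runs before any string conversion; once the value is already the string 'true', it cannot be told apart from a real description.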
If I have some time I will try to submit a pull request; these should not be too hard to fix :)
Thanks anyway