Anonyfox / meteor-scrape

Scrape any Website or RSS/Atom-Feed with ease.
GNU Lesser General Public License v3.0

Returns empty object? #21

Open JulianKingman opened 9 years ago

JulianKingman commented 9 years ago

OK, so I'm probably doing something wrong here. I have a method that returns a scraped page, and while debugging it, this is what I have:

getContent: function (sourceLink) {
  if (Meteor.isServer) {
    console.log("getting source...", sourceLink);
    var webpage = Scrape.website(sourceLink);
    console.log(webpage);
  }
}

However, the server console logs an empty object ('{}'). Why am I not getting the page content? It makes no difference whether I pass the link as an argument or hard-code it, and it doesn't matter which URL I use.

repjackson commented 9 years ago

Me too

JulianKingman commented 9 years ago

@repjackson I ended up using the cheerio library. I installed meteorhacks:npm, added the cheerio library, and used the following:

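// Fetch the page on the server and load its HTML into cheerio.
var $ = cheerio.load(Meteor.http.get(url).content, {});
// Collect the text of every element under <body>, skipping <style> and <script>.
var parsed = $('body *').not('style, script').map(function(idx, el) { return $(el).text(); }).get();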

It may be more than you need, but it worked for me. It returns an array containing the text of each tag on the page. To get just the HTML, you can use $('body').html() (I think).
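For anyone who wants to reuse this, here is a rough sketch of the same cheerio approach wrapped in a function. The name extractTexts and the load parameter are made up for illustration; load is assumed to be cheerio.load, passed in so the function itself has no hard dependency. The trimming and the filtering of empty strings are additions that are not in the snippet above.

```javascript
// Sketch only: extractTexts and its `load` parameter are hypothetical names.
// `load` is assumed to behave like cheerio.load(html) and return a `$` function.
function extractTexts(html, load) {
  var $ = load(html);
  return $('body *')
    .not('style, script')            // skip elements that hold CSS or JS
    .map(function (idx, el) {
      return $(el).text().trim();    // text of this element, including its children
    })
    .get()                           // convert the cheerio collection to a plain array
    .filter(function (text) {
      return text.length > 0;        // drop elements with no visible text
    });
}

// Possible usage inside a server-side Meteor method (assumes cheerio is available):
// var texts = extractTexts(Meteor.http.get(url).content, cheerio.load);
```

Because 'body *' matches nested elements too, the text of an inner tag also shows up in the entry for its parent, so expect some repetition in the array.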

repjackson commented 9 years ago

Impressive. Thank you.