gabceb / node-metainspector

Node npm for web scraping purposes. It scrapes a given URL, and returns you its title, meta description, meta keywords, an array with all the links, all the images in it, etc. Inspired by the metainspector Ruby gem
MIT License
129 stars 52 forks

Client not removed completely and interferes with future requests #8

Open fox-alvarez opened 10 years ago

fox-alvarez commented 10 years ago

Good day,

I am having a problem.

The plugin works fine when the URL points to an accessible page. But when it is given a page that does not actually exist, it first enters this handler:

client.once("error", function(err) {
        console.log("##  ERR  ##", err);
        //socket.emit("datosWebScraping", err);
        //delete client;
    });

and returns the following data:

Object {code: "ENOTFOUND", errno: "ENOTFOUND", syscall: "getaddrinfo"} 

But when I then scrape a page that does exist, it is as if the previous error has affected the new client (or the former client has not been deleted), and this causes the following error:

TypeError: Object #<EventEmitter> has no method 'title'
    at EventEmitter.<anonymous> (/root/nodejs/proyect/WebScraping.js:15:37)

I have tried to erase all traces of the client created on each request, but nothing has worked for me.

The code in my js file on the server is as follows:

var MetaInspector = require('node-metainspector');
function EscanearWeb(socket, datos_entrada) {

    var client = new MetaInspector(datos_entrada.url, {});
    client.once("fetch", function() {
        var datos_salida = {};
        datos_salida.title = client.title();
        datos_salida.description = client.description();
        datos_salida.image = client.image();
        socket.emit("datosWebScraping", datos_salida);
    });

    client.once("error", function(err) {
        console.log("##  ERR  ##", err);
        socket.emit("datosWebScraping", err);
        //delete client;
    });

    client.fetch();
}

I need every request to create a new client that can be removed without a trace. I hope you can help me.

Greetings

MiroRadenovic commented 10 years ago

Hi, I'm not sure I have understood correctly, but I have fixed this error in my repo: https://github.com/MiroRadenovic/node-metainspector

The commit that clears all event handlers is: https://github.com/MiroRadenovic/node-metainspector/commit/a253d7ac60009d3e73d421a14450373acd07e2b6

Try using my repo and let me know if this fixes the problem.

gabceb commented 10 years ago

Hi!

Sorry I've been slow responding to these issues. @MiroRadenovic feel free to submit a PR with the fix. @wiii feel free to submit a PR with a failing test and I will make sure the issue is fixed and the test passes.

Thanks!

MiroRadenovic commented 10 years ago

@gabceb there is already a pending PR: https://github.com/gabceb/node-metainspector/pull/7

Please note that I have also added an options object to the constructor.

I have never used Travis, which in this case shows an error. Please let me know if you need further assistance on this PR.

draschke commented 9 years ago

@MiroRadenovic Thanks a lot!
This small piece of code saved my day! "this.removeAllListeners();"
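
For anyone landing here later, the idea behind `this.removeAllListeners()` can be sketched with Node's built-in `events` module alone. This is only an illustration, not node-metainspector's actual code: `FakeClient`, `scan`, and the "missing" URL check are hypothetical names made up for the example, and the fetch is simulated synchronously.

```javascript
const { EventEmitter } = require('events');

// Hypothetical stand-in for a scraping client (NOT the real library code).
class FakeClient extends EventEmitter {
  constructor(url) {
    super();
    this.url = url;
  }

  // Simulated fetch: emits "error" for URLs containing "missing",
  // "fetch" otherwise, then clears every listener on this instance
  // so nothing registered for this request lingers afterwards.
  fetch() {
    if (this.url.includes('missing')) {
      this.emit('error', { code: 'ENOTFOUND' });
    } else {
      this.emit('fetch');
    }
    this.removeAllListeners();
  }
}

// Creates a fresh client per request, as in the original report.
function scan(url) {
  const client = new FakeClient(url);
  const result = { events: [] };

  client.once('fetch', () => result.events.push('fetch'));
  client.once('error', (err) => result.events.push(err.code));
  client.fetch();

  // Count the handlers still attached after the request has finished.
  result.listenersLeft =
    client.listenerCount('fetch') + client.listenerCount('error');
  return result;
}
```

A `once` handler is removed only when its own event fires, so without the `removeAllListeners()` call the handler for the event that did not fire would stay attached to the client after each request.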