JuanmaMenendez / website-change-monitor

Monitor a website and get email and Slack notifications when specific changes are detected
MIT License
144 stars, 37 forks

Checking for Multiple URL's #5

Open leela-krishna-kumar opened 4 years ago

leela-krishna-kumar commented 4 years ago

Sir, can you please help me monitor multiple URLs? This project is great and works fine for one URL. Please help me.

JuanmaMenendez commented 4 years ago

That should be easy to do:

1- Instead of a single variable, create an array of URLs to check

const urlsToCheck = ["http://urlyouwant.com/tocheck1", "http://urlyouwant.com/tocheck2", "http://urlyouwant.com/tocheck3"];

2- Create a loop that wraps the request(...) function call, so that the function is called for each entry in urlsToCheck, one by one, e.g.:

urlsToCheck.forEach((urlToCheck) => {

    // this function already exists
    request(urlToCheck, function (err, response, body) {
        .....
    });

});

Note: If better performance is needed, I recommend using the library https://www.npmjs.com/package/jest-worker to make the request(...) calls in parallel.
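The two steps above can be sketched as one small function. This is only an illustration under stated assumptions: `checkUrls`, `onMatch`, and `fakeRequest` are hypothetical names that do not exist in the project, and the request function is passed in as a parameter so the sketch runs without any network access. In the real script, `requestFn` would be the existing request(...) function. Because request(...) is asynchronous, the forEach starts every request without waiting for the previous one to finish.

```javascript
// Hypothetical helper: calls `requestFn` once per URL and reports every URL
// whose body contains at least one of the keywords.
// `requestFn` is assumed to have the same callback signature as request(...):
// requestFn(url, function (err, response, body) { ... })
function checkUrls(urls, keywords, requestFn, onMatch) {
    urls.forEach((urlToCheck) => {
        requestFn(urlToCheck, function (err, response, body) {
            // A failed or empty response for one URL does not stop the others.
            if (err || !body) {
                console.log(`Request Error for ${urlToCheck} - ${err}`);
                return;
            }
            if (keywords.some((el) => body.includes(el))) {
                onMatch(urlToCheck);
            }
        });
    });
}

// Stand-in for request(...) so the sketch runs offline (made-up bodies).
function fakeRequest(url, callback) {
    const pages = {
        'http://example.com/a': 'There was an incident today.',
        'http://example.com/b': 'Nothing to report.',
    };
    callback(null, {}, pages[url]);
}

checkUrls(
    ['http://example.com/a', 'http://example.com/b'],
    ['incident', 'crisis'],
    fakeRequest,
    (url) => console.log(`Keyword found in ${url}`)
);
```

Passing the notification step as `onMatch` keeps the loop separate from the Slack and email code, which can stay as it is.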

leela-krishna-kumar commented 4 years ago

Sir, I am able to parse multiple URLs now. Thank you for the help. But when detecting keywords, it checks the whole web page body and reports the result. Can we check for the keyword only in the changed content, and send a message only if the keyword is detected there?

Example:

New Content:

India is a land of ganga. India is also known as Bharat.

Old Content:

India is also known as Bharat.

For keyword = Ganga, it should send me a Slack message. For keyword = Bharat, it should not send me a Slack message when the page is updated. It should only compare the keyword with the changed content.

Please help me. Thank you.

JuanmaMenendez commented 4 years ago

Hi @leela-krishna-kumar, I am glad to hear you made some progress.

Regarding your requirement to check only the new content: sadly, that is a very particular case that is out of the scope of this project.

Anyway, I think in that case you can save the body content between setInterval() runs and apply a diff, using something like the library https://www.npmjs.com/package/diff, to get the difference (the new content) between the old body and the new body. Then just apply the check if (elementsToSearchFor.some((el) => NEWCONTENT.includes(el))) to the new content.
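A minimal sketch of that idea follows. It assumes the diff result is an array of change objects with `value`, `added`, and `removed` fields, which is the shape the diff package returns from functions such as diffWords(). The helper names `getAddedText` and `findKeywordsInAddedText` are hypothetical, the sample array is written by hand rather than produced by the library, and the case-insensitive comparison is an assumption made so that the keyword "Ganga" matches "ganga" in the page text.

```javascript
// Keeps only the parts of a diff that were added in the new version.
// `changes` is assumed to be an array of { value, added, removed } objects.
function getAddedText(changes) {
    return changes
        .filter((part) => part.added)
        .map((part) => part.value)
        .join(' ');
}

// Returns the keywords that occur in the added text (case-insensitive,
// which is an assumption of this sketch, not behaviour of the project).
function findKeywordsInAddedText(changes, keywords) {
    const addedText = getAddedText(changes).toLowerCase();
    return keywords.filter((keyword) => addedText.includes(keyword.toLowerCase()));
}

// Hand-written sample in the assumed shape, based on the example in this thread.
const sampleChanges = [
    { value: 'India is a land of ganga. ', added: true },
    { value: 'India is also known as Bharat.' },
];

console.log(findKeywordsInAddedText(sampleChanges, ['Ganga']));  // [ 'Ganga' ]
console.log(findKeywordsInAddedText(sampleChanges, ['Bharat'])); // []
```

With the real library, `changes` would come from something like require('diff').diffWords(oldBody, newBody), and an alert would be sent only when the returned array is not empty.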

leela-krishna-kumar commented 4 years ago

Sir, I have been trying all day to make it work, but I think jsdiff is unable to give the difference between the two contents. Is there any alternative way to do this? Here is the code I tried. Thank you.

const express = require('express');
const bodyParser = require('body-parser');
const request = require('request');
var jsdiff = require('diff');
var JSSoup = require('jssoup').default;

//Express configuration
const app = express();
app.use(express.static('public'));
app.use(bodyParser.urlencoded({extended: true}));
app.set('view engine', 'ejs');
const PORT = process.env.PORT || 3000;

//Main configuration variables
const urlsToCheck = ["https://news.ycombinator.com/newest"];
const elementsToSearchFor = ['disaster','incident', 'crisis', 'emergency', 'imageYouWantToCheckItsExistence.png'];
const checkingFrequency = 1 * 60000; //first number represent the checkingFrequency in minutes

//Slack Integration
const SLACK_WEBHOOK_URL = 'https://hooks.slack.com/services/<redacted>';
const slack = require('slack-notify')(SLACK_WEBHOOK_URL);

//SendGrid Email Integration
const SENDGRID_APY_KEY = '<redacted>';
const sgMail = require('@sendgrid/mail');
sgMail.setApiKey(SENDGRID_APY_KEY);
const emailFrom = 'scrapingsystem@gmail.com';
const emailsToAlert = ['scrapingsystem@gmail.com', 'scrapingsystem@gmail.com'];

const checkingNumberBeforeWorkingOKEmail = 1440 / (checkingFrequency / 60000); //1 day = 1440 minutes
let requestCounter = 0;

//Main function
const intervalId = setInterval(function () {

    urlsToCheck.forEach((urlToCheck) => {

        request(urlToCheck, function (err, response, body1) {
            //if the request fail
            if (err) {
                console.log(`Request Error - ${err}`);
            }
            else {
                //if the target-page content is empty
                if (!body1) {
                    console.log(`Request Body Error - ${err}`);
                }
                //if the request is successful
                else {

                    var soup1 = new JSSoup(body1);
                    oldBody = soup1.getText();

                    const timeoutId = setTimeout(function () {

                        request(urlToCheck, function (err, response, body2) {
                            //if the request fail
                            if (err) {
                                console.log(`Request Error - ${err}`);
                            }
                            else {
                                //if the target-page content is empty
                                if (!body2) {
                                    console.log(`Request Body Error - ${err}`);
                                }
                                //if the request is successful
                                else {
                                    var soup2 = new JSSoup(body2);
                                    newBody = soup2.getText();
                                    //  console.log(oldBody);

                                    var newChanges = jsdiff.diffWordsWithSpace(oldBody, newBody);
                                    var newContent = JSON.stringify(newChanges);
                                    console.log(newContent);

                                    //if any elementsToSearchFor exist
                                    if (elementsToSearchFor.some((el) => newContent.includes(el))) {

                                        // Slack Alert Notification
                                        slack.alert(`🔥🔥🔥  <${urlToCheck}/|Change detected in ${urlToCheck}>  🔥🔥🔥 `, function (err) {
                                            if (err) {
                                                console.log('Slack API error:', err);
                                            } else {
                                                console.log('Message received in slack!');
                                            }
                                        });

                                        // Email Alert Notification
                                        const msg = {
                                            to: emailsToAlert,
                                            from: emailFrom,
                                            subject: `🔥🔥🔥 Change detected in ${urlToCheck} 🔥🔥🔥`,
                                            html: `Change detected in <a href="${urlToCheck}"> ${urlToCheck} </a>  `,
                                        };
                                        sgMail.send(msg)
                                            .then(()=>{console.log("Alert Email Sent!");})
                                            .catch((emailError)=>{console.log(emailError);});
                                    }

                                }
                            }
                        });
                    }, checkingFrequency);

                }
            }

        })
    });

    requestCounter++;

    // "Working OK" email notification logic
    if (requestCounter > checkingNumberBeforeWorkingOKEmail) {

        requestCounter = 0;

        const msg = {
            to: emailsToAlert,
            from: emailFrom,
            subject: '👀👀👀 Website Change Monitor is working OK 👀👀👀',
            html: `Website Change Monitor is working OK - <b>${new Date().toLocaleString("en-US", {timeZone: "Asia/Kolkata"})}</b>`,
        };
        sgMail.send(msg)
            .then(()=>{console.log("Working OK Email Sent!");})
            .catch((emailError)=>{console.log(emailError);});
    }

}, checkingFrequency);

//Index page render
app.get('/', function (req, res) {
    res.render('index', null);
});

//Server start
app.listen(PORT, function () {
    console.log(`Example app listening on port ${PORT}!`)
});
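Two details in the script above are worth noting. JSON.stringify(newChanges) serializes every part of the diff, including the unchanged parts, so the includes() check still sees the text of the whole page. In addition, oldBody and newBody are assigned without a declaration, so they are shared between all URLs. One possible way to keep the previous text separately for each URL between checks is sketched below. `createPageHistory` and `swap` are hypothetical names that appear neither in the project nor in any library, and the URL and page texts are sample values.

```javascript
// Hypothetical helper: remembers the last text seen for each URL, so that
// every check can be compared with the previous check of the same URL.
function createPageHistory() {
    const lastTextByUrl = new Map();

    return {
        // Stores `text` for `url` and returns the previously stored text,
        // or null when the URL is checked for the first time.
        swap(url, text) {
            const previous = lastTextByUrl.has(url) ? lastTextByUrl.get(url) : null;
            lastTextByUrl.set(url, text);
            return previous;
        },
    };
}

const pageHistory = createPageHistory();

// First check: nothing to compare against yet.
console.log(pageHistory.swap('https://example.com/a', 'India is also known as Bharat.'));
// prints: null

// Second check: the previous text comes back and can be diffed with the new one.
console.log(pageHistory.swap('https://example.com/a', 'India is a land of ganga. India is also known as Bharat.'));
// prints: India is also known as Bharat.
```

With a helper like this, a single request per URL in each setInterval() run would be enough, and the nested setTimeout() with a second request would not be needed.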