fossasia / loklak_scraper_js

Scrapers for loklak in javascript
GNU Lesser General Public License v2.1
1.47k stars, 16 forks

Add API for using JS scrapers as loklak harvesting workers #37

Open: singhpratyush opened this issue 7 years ago

singhpratyush commented 7 years ago

Issue Description

Issue type: Parent issue

Currently, this JavaScript code has to be bundled before it can be used in other projects, and even then the scraper functions have to be imported manually.

It would be good to have an API like the following (or something similar):

```javascript
import { loklakHarvester } from 'loklak_scrapers_js';

const myLoklakHarvester = loklakHarvester('http://api.loklak.org', 4)
    .onHarvestStart((backend, query) => {
        // ...
    })
    .onHarvestComplete((backend, query, messages) => {
        // ...
    })
    .onHarvestError((backend, query, error) => {
        // ...
    })
    .onPushStart((backend, messages) => {
        // ...
    })
    .onPushComplete((backend, messages) => {
        // ...
    })
    .onPushError((backend, messages, error) => {
        // ...
    })
    .onSuggestionFetch((backend, suggestions) => {
        // ...
    })
    .onShutDown(() => {
        // ...
    });

// ...

myLoklakHarvester.setBackend('http://backend.loklak.org');
myLoklakHarvester.setWorkers(3);

// ...

myLoklakHarvester.shutDown();
```
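To make the chainable proposal above more concrete, here is a minimal, self-contained sketch of how such a builder could be wired internally. It is not the package's actual API: the hook names, setBackend, setWorkers and shutDown come from the proposal, while getConfig and emit are hypothetical helpers added only so the example can be exercised, and no real scraping or pushing happens.

```javascript
// Hypothetical sketch of the chainable harvester API proposed above.
// Hook names mirror the proposal; getConfig() and emit() are illustrative
// helpers, not part of any existing loklak_scrapers_js API.
const HOOKS = [
  "onHarvestStart", "onHarvestComplete", "onHarvestError",
  "onPushStart", "onPushComplete", "onPushError",
  "onSuggestionFetch", "onShutDown",
];

function loklakHarvester(backend, workers) {
  // Every hook defaults to a no-op, so callers register only what they need.
  const handlers = Object.fromEntries(HOOKS.map((name) => [name, () => {}]));
  const state = { backend, workers, running: true };

  const harvester = {
    setBackend(url) { state.backend = url; return harvester; },
    setWorkers(n) { state.workers = n; return harvester; },
    getConfig() { return { backend: state.backend, workers: state.workers }; },
    // Fire a hook with the current backend as the first argument,
    // matching the callback signatures in the proposal.
    emit(name, ...args) { handlers[name](state.backend, ...args); },
    shutDown() {
      if (!state.running) return; // ignore repeated shutdowns
      state.running = false;
      handlers.onShutDown();
    },
  };

  // Generate one chainable registration method per hook name.
  for (const name of HOOKS) {
    harvester[name] = (fn) => { handlers[name] = fn; return harvester; };
  }
  return harvester;
}

// Usage: register two hooks, reconfigure, fire a hook, then shut down.
const events = [];
const h = loklakHarvester("http://api.loklak.org", 4)
  .onHarvestStart((backend, query) => events.push(`start ${query} @ ${backend}`))
  .onShutDown(() => events.push("shutdown"));

h.setBackend("http://backend.loklak.org");
h.setWorkers(3);
h.emit("onHarvestStart", "fossasia");
h.shutDown();
console.log(events);
```

Returning the same object from every registration method is what makes the chain in the proposal work; defaulting every hook to a no-op keeps the chain optional.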

This would make loklak_scrapers_js easy to use in many projects and provide a simple, plug-and-play interface for any website.

singhpratyush commented 7 years ago

Or maybe something like this:

```javascript
import { LoklakHarvester } from 'loklak_scrapers_js';

export default class MyLoklakHarvester extends LoklakHarvester {

    constructor() {
        super({
            backend: 'http://loklak.org',
            workers: 4,
            // ...other options
        });
    }

    onHarvestStart = (backend, query) => {
        // ...
    };

    onHarvestComplete = (backend, query, messages) => {
        // ...
    };

    onHarvestError = (backend, query, error) => {
        // ...
    };

    // ...other hooks
}

// ...

const harvester = new MyLoklakHarvester();
harvester.start();

// ...

harvester.stop();
```
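A rough sketch of what the base class for this second proposal might look like, using the template-method pattern: the base class supplies no-op hooks and the control flow, and subclasses override only the hooks they care about. The harvest option (an async function returning messages for a query) and the runQuery method are hypothetical additions so the sketch can run without a real scraper.

```javascript
// Hypothetical base class for the class-based proposal above.
// `harvest` (async query -> messages) and runQuery() are assumptions made
// for illustration; they are not part of any existing loklak API.
class LoklakHarvester {
  constructor({ backend, workers = 1, harvest } = {}) {
    this.backend = backend;
    this.workers = workers;
    this.harvest = harvest;
    this.running = false;
  }

  // Default hooks do nothing unless a subclass overrides them.
  onHarvestStart(backend, query) {}
  onHarvestComplete(backend, query, messages) {}
  onHarvestError(backend, query, error) {}

  start() { this.running = true; }
  stop() { this.running = false; }

  // Run a single query through the hooks; errors are routed to onHarvestError.
  async runQuery(query) {
    if (!this.running) throw new Error("harvester is not started");
    this.onHarvestStart(this.backend, query);
    try {
      const messages = await this.harvest(query);
      this.onHarvestComplete(this.backend, query, messages);
      return messages;
    } catch (error) {
      this.onHarvestError(this.backend, query, error);
      return [];
    }
  }
}

class MyLoklakHarvester extends LoklakHarvester {
  constructor() {
    super({
      backend: "http://loklak.org",
      workers: 4,
      // Stub standing in for a real scraper (Twitter, GitHub, ...).
      harvest: async (query) => [{ text: `result for ${query}` }],
    });
    this.log = [];
  }

  // Arrow-function class fields, as in the proposal, shadow the base methods.
  onHarvestStart = (backend, query) => { this.log.push(`start:${query}`); };
  onHarvestComplete = (backend, query, messages) => {
    this.log.push(`done:${query}:${messages.length}`);
  };
}

const harvester = new MyLoklakHarvester();
harvester.start();
harvester.runQuery("fossasia").then((messages) => {
  console.log(harvester.log.join(", "), messages.length);
  harvester.stop();
});
```

One design consequence worth noting: arrow-function class fields are assigned on each instance only after super() returns, so a base constructor that invoked a hook directly would still see the default no-op. Hooks should therefore only fire from start() or later.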

@Achint08 @djmgit @hemantjadon @kavithaenair @SKrPl @vibhcool: What do you think about this? Is this a reasonable approach for running loklak_scrapers_js wherever a web page is open?

vibhcool commented 7 years ago

@singhpratyush, I have a question: are we discussing creating a RESTful API, an HTTP API, or a JavaScript library for the scrapers?

singhpratyush commented 7 years ago

On the whole, you can think of it as an npm package that would allow scraping data and submitting it to loklak.
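A minimal sketch of the scrape-then-submit cycle such a package would run for one query, firing the harvest and push hooks from the first proposal in order. The scrape and push functions are injected stand-ins: the issue does not specify which loklak endpoint or payload format a real push would use, so none is assumed here.

```javascript
// Hypothetical harvest-then-push cycle for a single query.
// `scrape(query)` and `push(backend, messages)` are injected placeholders;
// a real package would scrape a service and submit to a loklak backend.
async function harvestAndPush(backend, query, { scrape, push, hooks = {} }) {
  // Call a hook only if the caller registered it.
  const call = (name, ...args) => hooks[name]?.(...args);

  let messages;
  call("onHarvestStart", backend, query);
  try {
    messages = await scrape(query);
    call("onHarvestComplete", backend, query, messages);
  } catch (error) {
    call("onHarvestError", backend, query, error);
    return { pushed: 0 };
  }

  call("onPushStart", backend, messages);
  try {
    await push(backend, messages);
    call("onPushComplete", backend, messages);
    return { pushed: messages.length };
  } catch (error) {
    call("onPushError", backend, messages, error);
    return { pushed: 0 };
  }
}

// Usage with in-memory stubs instead of real network calls.
const store = [];
harvestAndPush("http://api.loklak.org", "fossasia", {
  scrape: async (q) => [{ text: `message about ${q}` }],
  push: async (backend, messages) => { store.push(...messages); },
  hooks: {
    onPushComplete: (backend, messages) =>
      console.log(`pushed ${messages.length} to ${backend}`),
  },
}).then((result) => console.log(result));
// prints "pushed 1 to http://api.loklak.org", then { pushed: 1 }
```

Keeping scraping and submission as separate injected steps means the same cycle could target any backend set through setBackend, and failures in either step are reported through their own error hook.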

hemantjadon commented 7 years ago

@singhpratyush I agree with this approach; this way we can make loklak_scrapers_js fully configurable and more usable 👍

vibhcool commented 7 years ago

OK, got it, but there is an issue: the multithreading would have to be handled by the loklak deployments that use this package, and Node.js does not perform well at multithreading.

What JavaScript does best is scraping and dealing with JavaScript running on web pages.

singhpratyush commented 7 years ago

I haven't mentioned anything about multithreading here. I guess you got that idea from the workers parameter.

This is just a rough idea that I came up with, and it needs to be discussed before proceeding. The workers value could be the number of simultaneous requests made to the services (Twitter, GitHub, etc.).
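Interpreting workers as "the number of simultaneous requests" needs no threads at all: on JavaScript's single-threaded event loop, several network requests can be in flight at once. A minimal sketch of such a concurrency cap, where fetchQuery is a hypothetical stand-in for a real scraper request:

```javascript
// Run fetchQuery over all queries with at most `workers` requests in flight.
// Each "worker" is an async loop on the same thread, not an OS thread.
async function runWithWorkers(queries, workers, fetchQuery) {
  const results = new Array(queries.length);
  let next = 0; // shared cursor; safe because JS runs one callback at a time

  async function worker() {
    while (next < queries.length) {
      const i = next++;
      results[i] = await fetchQuery(queries[i]); // keep results in input order
    }
  }

  const count = Math.max(1, Math.min(workers, queries.length));
  await Promise.all(Array.from({ length: count }, worker));
  return results;
}

// Usage: a fake request that tracks how many calls overlap.
let inFlight = 0;
let peak = 0;
const fakeRequest = (query) => new Promise((resolve) => {
  inFlight++;
  peak = Math.max(peak, inFlight);
  setTimeout(() => { inFlight--; resolve(`${query}: ok`); }, 10);
});

runWithWorkers(["query1", "query2", "query3", "query4", "query5"], 2, fakeRequest)
  .then((results) => console.log(results.length, "results, peak concurrency", peak));
// prints "5 results, peak concurrency 2"
```

Because the worker loops only yield at await, the shared next counter never needs a lock, which addresses the multithreading concern for I/O-bound scraping.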