drawrowfly / amazon-product-api

Amazon Scraper. Scrape products from the amazon search result or reviews from the specific product

CLI output name and actual output name do not match #10

Closed kkristof200 closed 4 years ago

kkristof200 commented 4 years ago

amazon-buddy products -k 'fixie bike' -n 100 --random-ua true --sort true --min-rating 4.5 --filetype json

Outputs -> Result was saved to: products(fixie bike)_1593862189487

Actual file name: products(fixie bike)_1593862189473.json

There seems to be an inconsistency in the timestamps used for the file name: the name printed by the CLI and the name of the file on disk differ by a few milliseconds.

kkristof200 commented 4 years ago

get fileName() {
    switch (this._scrapeType) {
        case 'products':
            return `${this._scrapeType}(${this._keyword})_${Date.now()}`;
        case 'reviews':
        case 'asin':
            return `${this._scrapeType}(${this._asin})_${Date.now()}`;
        default:
            throw new Error(`Unknow scraping type: ${this._scrapeType}`);
    }
}

This getter returns a different name on every access, because 'Date.now()' is evaluated each time it is read. So the name won't be consistent during a session.
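A minimal sketch of the effect, assuming a stripped-down stand-in class ('NameDemo' and its injectable 'clock' parameter are hypothetical and not part of amazon-buddy). Each read of the getter asks the clock again, so two reads a few milliseconds apart produce two different names:

```javascript
// Hypothetical stand-in for the scraper; only the naming logic is modeled.
class NameDemo {
    constructor(keyword, clock = Date.now) {
        this._keyword = keyword;
        this._clock = clock; // injectable so the effect is reproducible
    }

    // Evaluated on every access, so the timestamp changes between reads.
    get fileName() {
        return `products(${this._keyword})_${this._clock()}`;
    }
}

// A fake clock that advances 14 ms per call, mimicking time passing
// between two reads of the getter.
let now = 1593862189473;
const fakeClock = () => {
    const t = now;
    now += 14;
    return t;
};

const demo = new NameDemo('fixie bike', fakeClock);
const firstRead = demo.fileName;
const secondRead = demo.fileName;
console.log(firstRead);  // products(fixie bike)_1593862189473
console.log(secondRead); // products(fixie bike)_1593862189487
```

The fake clock is only there to make the mismatch deterministic; with the default 'Date.now' the gap depends on how much work happens between the two reads.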

I'd add

this._fileName = null;

to the constructor and convert the 'get fileName()' method to

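get fileName() {
    // Reuse the cached name so every access in a session matches.
    if (this._fileName != null) {
        return this._fileName;
    }

    switch (this._scrapeType) {
        case 'products':
            this._fileName = `${this._scrapeType}(${this._keyword})_${Date.now()}`;
            break; // without this, execution falls through to the throw below
        case 'reviews':
        case 'asin':
            this._fileName = `${this._scrapeType}(${this._asin})_${Date.now()}`;
            break;
        default:
            throw new Error(`Unknow scraping type: ${this._scrapeType}`);
    }

    // Return the backing field; returning this.fileName would re-enter the getter.
    return this._fileName;
}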

If JS supports lazily initialized properties, that would solve the issue without the extra variable. I'm not very familiar with JS, so the syntax might not be on point.
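JavaScript has no dedicated lazy-property keyword, but a getter can cache its first result. The sketch below does this with the logical nullish assignment operator ('??=', ES2021). 'ScraperSketch' and '_buildFileName' are hypothetical names used for illustration, not the library's actual code:

```javascript
// Hypothetical stand-in class; the real AmazonScraper has more state.
class ScraperSketch {
    constructor({ scrapeType, keyword, asin }) {
        this._scrapeType = scrapeType;
        this._keyword = keyword;
        this._asin = asin;
        this._fileName = null; // filled in on first access
    }

    // Builds a fresh, timestamped name using the same rules as the getter above.
    _buildFileName() {
        switch (this._scrapeType) {
            case 'products':
                return `${this._scrapeType}(${this._keyword})_${Date.now()}`;
            case 'reviews':
            case 'asin':
                return `${this._scrapeType}(${this._asin})_${Date.now()}`;
            default:
                throw new Error(`Unknown scraping type: ${this._scrapeType}`);
        }
    }

    // `??=` assigns only while _fileName is null or undefined,
    // so Date.now() runs at most once per instance.
    get fileName() {
        return (this._fileName ??= this._buildFileName());
    }
}

const lazyScraper = new ScraperSketch({ scrapeType: 'products', keyword: 'fixie bike' });
const lazyFirst = lazyScraper.fileName;
const lazySecond = lazyScraper.fileName;
console.log(lazyFirst === lazySecond); // true
```

This still needs the backing field, but keeps the caching in one expression. On runtimes without '??=' support, the explicit null check does the same job.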

kkristof200 commented 4 years ago

Another solution would be to store a timestamp on init, like

this.sessionStartTs = Date.now();

and rewrite 'get fileName()' like this

get fileName() {
    switch (this._scrapeType) {
        case 'products':
            return `${this._scrapeType}(${this._keyword})_${this.sessionStartTs}`;
        case 'reviews':
        case 'asin':
            return `${this._scrapeType}(${this._asin})_${this.sessionStartTs}`;
        default:
            throw new Error(`Unknow scraping type: ${this._scrapeType}`);
    }
}

kkristof200 commented 4 years ago

All of the above would work only if a new 'AmazonScraper' is created for every new scrape. If that is not the case, the fileName or timestamp needs to be reset at every new scrape call.
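A rough sketch of that reset, assuming an instance is reused across scrapes. 'ReusableScraperSketch', its 'scrape()' method and the injectable 'clock' are hypothetical; the real scrape call does far more than set a timestamp:

```javascript
// Hypothetical stand-in; scrape() here only models the naming side.
class ReusableScraperSketch {
    constructor(keyword, clock = Date.now) {
        this._keyword = keyword;
        this._clock = clock; // injectable so the example is deterministic
        this._sessionStartTs = null;
    }

    get fileName() {
        if (this._sessionStartTs === null) {
            throw new Error('fileName is only defined once scrape() has started');
        }
        return `products(${this._keyword})_${this._sessionStartTs}`;
    }

    // Each call starts a new session: the timestamp is refreshed here
    // and stays fixed until the next call.
    scrape() {
        this._sessionStartTs = this._clock();
        return this.fileName;
    }
}

// Fake clock that advances by 1000 on every call.
let tick = 1000;
const steppingClock = () => {
    const t = tick;
    tick += 1000;
    return t;
};

const reusable = new ReusableScraperSketch('fixie bike', steppingClock);
const runOne = reusable.scrape();
const runOneAgain = reusable.fileName; // same session, same name
const runTwo = reusable.scrape();      // new session, new name
console.log(runOne, runOneAgain, runTwo);
```

Within one scrape the name is stable, and a second scrape on the same instance gets its own timestamp instead of reusing the first one.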

drawrowfly commented 4 years ago

Fixed in the latest update.