Yes, that is expected. If you want separate stats for each run, you should clear the storages first: either run `apify run -p`, delete the file, or programmatically delete this specific value from the key-value store.
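The "delete the file" option can be sketched in Node.js as follows. This sketch rests on two assumptions that you should check against your SDK version. First, local storage lives in `./apify_storage`, or wherever `APIFY_LOCAL_STORAGE_DIR` points. Second, the default key-value store keeps each record as `key_value_stores/default/<key>.json`. The function name `clearCrawlerStats` is hypothetical and is not an SDK API.

```javascript
const fs = require('fs');
const path = require('path');

// Hypothetical helper (not part of the Apify SDK).
// Assumes the local key-value store layout
// <storageDir>/key_value_stores/default/<key>.json.
// Returns true if a statistics file was found and deleted.
function clearCrawlerStats(
    storageDir = process.env.APIFY_LOCAL_STORAGE_DIR || './apify_storage',
    crawlerIndex = 0,
) {
    const statsFile = path.join(
        storageDir,
        'key_value_stores',
        'default',
        `SDK_CRAWLER_STATISTICS_${crawlerIndex}.json`,
    );
    if (fs.existsSync(statsFile)) {
        fs.unlinkSync(statsFile);
        return true;
    }
    return false; // nothing persisted yet, e.g. on the very first run
}
```

You could call this at the start of `Apify.main()`, before the crawler is created, to start each local run with fresh statistics. Unlike `apify run -p`, which purges the default local storages, it removes only the statistics record and leaves the dataset and request queue untouched.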
I understand that is how the SDK works, but I would not say most people would consider that "expected" behavior when running locally. On the Apify platform, each run has its own storage, but that is not the case when running locally.
I would have thought that the last number in the key (the `0` in `SDK_CRAWLER_STATISTICS_0`) would increment each time the crawl was run.
I did put together a workaround:
```js
const Apify = require('apify');

// Run after crawler.run() has finished, e.g. inside Apify.main().
// `datasetTitle` is defined elsewhere in the actor.
const getStats = await Apify.getValue('SDK_CRAWLER_STATISTICS_0');
if (getStats !== null) {
    // Convert millisecond durations into friendlier units
    getStats.requestMinDurationSeconds = getStats.requestMinDurationMillis / 1000;
    getStats.requestMaxDurationSeconds = getStats.requestMaxDurationMillis / 1000;
    getStats.requestAvgFinishedDurationSeconds = getStats.requestAvgFinishedDurationMillis / 1000;
    getStats.requestTotalDurationMinutes = getStats.requestTotalDurationMillis / 1000 / 60;
    getStats.requestTotalDurationHours = getStats.requestTotalDurationMillis / 1000 / 60 / 60;
    // Finished requests per minute of summed request time
    // (assumes the record has a requestsFinished counter)
    getStats.requestPerMinute = getStats.requestsFinished / getStats.requestTotalDurationMinutes;
    // Open the perfDataStorage key-value store
    const perfDataStorage = await Apify.openKeyValueStore('perfDataStorage');
    // Save the perf data under the key datasetTitle
    await perfDataStorage.setValue(datasetTitle, { getStats });
    // Delete SDK_CRAWLER_STATISTICS_0 (setting a record to null removes it),
    // so the next local run starts from zero
    await Apify.setValue('SDK_CRAWLER_STATISTICS_0', null);
    console.dir(getStats);
}
```
Yeah, it's not very intuitive, but if we overwrote the stats automatically, users who do incremental crawls would not be able to track total crawling stats. We chose to support both repeated and incremental crawls, at the cost of a less intuitive interface for repeated crawls.
When running locally on Linux, if I run the same job again (`npm run start`), the file SDK_CRAWLER_STATISTICS_0 is not overwritten. Instead, the values accumulate across runs. Take itemCount as an example: the first run crawls 10 items and itemCount=10; the second run crawls 10 more items and itemCount=20.
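You can get per-run numbers without deleting anything, which keeps the cumulative totals available for incremental crawls. One way is to read the statistics record before and after `crawler.run()` and subtract the two snapshots. What follows is a minimal sketch. `diffCounters` is a hypothetical helper, not an SDK function. Only diff additive counters such as `itemCount`, because averages, min/max durations, and per-minute rates do not subtract meaningfully.

```javascript
// Hypothetical helper (not part of the Apify SDK).
// Subtracts an earlier snapshot of the cumulative statistics record from a
// later one, for the listed counter fields only. A missing snapshot or
// missing field counts as 0, so the first run reports its full totals.
function diffCounters(before, after, counterKeys) {
    const perRun = {};
    for (const key of counterKeys) {
        const end = Number(after?.[key] ?? 0);
        const start = Number(before?.[key] ?? 0);
        perRun[key] = end - start;
    }
    return perRun;
}
```

With the numbers above, `diffCounters({ itemCount: 10 }, { itemCount: 20 }, ['itemCount'])` returns `{ itemCount: 10 }`, which is the items crawled in the second run alone. Both snapshots would come from `Apify.getValue('SDK_CRAWLER_STATISTICS_0')`. This assumes the SDK has persisted the record by the time `run()` resolves.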
See https://github.com/apify/apify-js/blob/master/src/crawlers/statistics.js