dnif-archive / DigiVigi

GNU General Public License v3.0

Scraping only updated data from source website #7

Closed PRASHANT-SAWANT closed 6 years ago

PRASHANT-SAWANT commented 6 years ago

Hey, this is with regard to Process 2, Stage 3. After scraping data from a source website, the next challenge was to fetch only the data that arrives anew each day, while keeping the old data.

I'm thinking of comparing the date field from the source dataset against the system date, and appending only the new records to the existing CSV. Hope it works out well.
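A minimal sketch of that idea: filter the scraped rows down to those whose date matches today's system date, then append them to the CSV. The field name `date` and the ISO date format are assumptions for illustration, not details from the project's scraper.

```python
import csv
import datetime


def append_new_rows(scraped_rows, csv_path, date_field="date"):
    """Append only rows whose date equals today's system date.

    `scraped_rows` is a list of dicts from the scraper; `date_field`
    is a hypothetical column holding an ISO-format date string.
    Returns the number of rows appended.
    """
    today = datetime.datetime.now().date()
    new_rows = [
        row for row in scraped_rows
        if datetime.datetime.strptime(row[date_field], "%Y-%m-%d").date() == today
    ]
    if not new_rows:
        return 0
    with open(csv_path, "a", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=list(new_rows[0].keys()))
        if f.tell() == 0:  # fresh file: write the header row first
            writer.writeheader()
        writer.writerows(new_rows)
    return len(new_rows)
```

Opening the file in append mode keeps the old data intact, which matches the goal stated above.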

PRASHANT-SAWANT commented 6 years ago

Hey fellas, eureka! The date logic worked. I've been thinking: since most log data carries a log date, this could be of real use when considering a dynamic dataset for this sort of project in future.

I'll upload the code in the repository. @aakratisahu @Sharbanibasu23 @shreyaskulkarni412

PRASHANT-SAWANT commented 6 years ago

Here's the current logic: `datetime.datetime.now().date() == parser.parse(dateComparer.text).date()`. The date field of the source (Webiron) carries time down to milliseconds. That could be stripped off by further logic if one is interested in only the date part.
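To illustrate why the sub-second precision doesn't get in the way: calling `.date()` on the parsed datetime drops the time (including milliseconds) before the comparison. The timestamp string and its format below are assumptions standing in for the Webiron field; `strptime` is used here instead of `dateutil.parser.parse` just to keep the sketch dependency-free.

```python
import datetime

# A hypothetical Webiron-style timestamp with millisecond precision.
stamp = "2018-06-14 09:23:41.537"

# Parsing with an explicit format, then calling .date(), discards the
# time-of-day entirely, so only the calendar date takes part in the check.
parsed = datetime.datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S.%f")
same_day = datetime.datetime(2018, 6, 14, 23, 59).date() == parsed.date()
# same_day is True: the milliseconds never enter the comparison.
```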