B-Open / jobbuzz

Brunei job search database and alert notification
https://jobbuzz.org
MIT License

Job uniqueness #14

Closed syahnur197 closed 2 years ago

syahnur197 commented 2 years ago

When we run the scraper command, it automatically creates new jobs from whatever the scrapers return. Sometimes a job already exists in the DB. How do we prevent it from being inserted again?

One idea: store the job links in the DB as well, then query by link before inserting a job.
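A minimal sketch of the link-lookup idea, using Python's built-in `sqlite3` purely for illustration. The `jobs` table, its columns, the `insert_if_new` helper, and the example URLs are all assumptions, not the project's actual schema or code:

```python
import sqlite3


def insert_if_new(conn, job):
    """Insert the job unless a row with the same link already exists.

    Returns True if a row was inserted, False if it was skipped.
    """
    existing = conn.execute(
        "SELECT 1 FROM jobs WHERE link = ?", (job["link"],)
    ).fetchone()
    if existing is not None:
        return False
    conn.execute(
        "INSERT INTO jobs (title, link) VALUES (?, ?)",
        (job["title"], job["link"]),
    )
    conn.commit()
    return True


# Hypothetical schema: one row per job, with the scraped link stored alongside.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE jobs (id INTEGER PRIMARY KEY, title TEXT NOT NULL, link TEXT NOT NULL)"
)

# Made-up scraper output; the second entry repeats the first link.
scraped = [
    {"title": "Software Engineer", "link": "https://example.com/jobs/1"},
    {"title": "Software Engineer", "link": "https://example.com/jobs/1"},
    {"title": "Accountant", "link": "https://example.com/jobs/2"},
]

results = [insert_if_new(conn, job) for job in scraped]
job_count = conn.execute("SELECT COUNT(*) FROM jobs").fetchone()[0]
```

One caveat with this approach: a link can change (tracking parameters, a renamed slug) while the job stays the same, and a lookup followed by an insert is not safe if two scraper runs overlap.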

dsychin commented 2 years ago

You can refer to how I solved it in the .NET version.

Each job/company has a unique ID on the provider's platform, so we will need to scrape that and use it as the unique identifier.

https://github.com/B-Open/jobalert/blob/258fcaad69933d678a6404d8a8dec73e7fee1d02/src/Shared/Services/JobService.cs#L33-L53
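For illustration only, here is one way the provider-ID approach could look if the database enforces uniqueness itself. This is a sketch in Python with `sqlite3`, not a port of the linked .NET code; the `provider` and `provider_job_id` columns, the `save_jobs` helper, and the sample data are hypothetical:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema: the pair (provider, provider_job_id) identifies a job.
conn.execute(
    """
    CREATE TABLE jobs (
        id INTEGER PRIMARY KEY,
        provider TEXT NOT NULL,
        provider_job_id TEXT NOT NULL,
        title TEXT NOT NULL,
        UNIQUE (provider, provider_job_id)
    )
    """
)


def save_jobs(conn, jobs):
    """Insert scraped jobs, silently skipping ones already stored.

    Returns the number of rows actually added.
    """
    before = conn.execute("SELECT COUNT(*) FROM jobs").fetchone()[0]
    # INSERT OR IGNORE skips rows that would violate the UNIQUE constraint.
    conn.executemany(
        "INSERT OR IGNORE INTO jobs (provider, provider_job_id, title) "
        "VALUES (:provider, :provider_job_id, :title)",
        jobs,
    )
    conn.commit()
    after = conn.execute("SELECT COUNT(*) FROM jobs").fetchone()[0]
    return after - before


# Made-up scraper output for two runs; job "101" appears in both.
first_run = save_jobs(conn, [
    {"provider": "provider-a", "provider_job_id": "101", "title": "Software Engineer"},
    {"provider": "provider-a", "provider_job_id": "102", "title": "Accountant"},
])
second_run = save_jobs(conn, [
    {"provider": "provider-a", "provider_job_id": "101", "title": "Software Engineer"},
    {"provider": "provider-b", "provider_job_id": "101", "title": "Teacher"},
])
```

Scoping the constraint to the provider matters because two providers may reuse the same numeric ID, as the second run shows. Letting the database reject duplicates also avoids a separate lookup query before each insert.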