CUB-Libraries-CTA / counter-data-loader

Loads COUNTER database from JR1 report spreadsheets

Merge duplicate titles #34

Open ghost opened 3 years ago

ghost commented 3 years ago

The spreadsheet loading process sometimes creates duplicate title records. A duplicate title is one whose title name, publisher, and platform are logically the same as those of an existing record. If there is a slight variation in either the name or the publisher, e.g., Free Press vs. The Free Press, a new row is inserted. Logically, these records are the same and can therefore be combined. Refer to the attached image for an example (the first 3 items are the same).
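
To make the matching rule concrete, here is a minimal sketch of how such near-duplicates might be detected. It assumes that lowercasing, stripping punctuation, and dropping a leading article is enough to catch cases like Free Press vs. The Free Press; the function names and the `(id, title, publisher, platform)` row shape are hypothetical and not taken from the loader's code.

```python
import re
from collections import defaultdict

# Assumption: a leading "the", "a", or "an" is the only article variation to ignore.
_LEADING_ARTICLE = re.compile(r"^(the|a|an)\s+", re.IGNORECASE)


def normalize(text):
    """Lowercase, replace punctuation with spaces, collapse whitespace,
    and drop a leading article."""
    cleaned = re.sub(r"[^\w\s]", " ", text.lower())
    cleaned = " ".join(cleaned.split())
    return _LEADING_ARTICLE.sub("", cleaned)


def find_duplicates(rows):
    """Group hypothetical (id, title, publisher, platform) rows by their
    normalized key and return the id lists that have more than one member."""
    groups = defaultdict(list)
    for row_id, title, publisher, platform in rows:
        key = (normalize(title), normalize(publisher), normalize(platform))
        groups[key].append(row_id)
    return [ids for ids in groups.values() if len(ids) > 1]
```

A rule this simple will miss other variations (abbreviations, typos, "&" vs. "and"), so any groups it reports would probably still need a manual review before being combined.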

Since we have no control over how each platform reports its publications, new duplicates will inevitably keep appearing. Duplicates should therefore be handled by a periodic database maintenance routine.
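
As a rough illustration of what one step of such a maintenance routine could look like, the sketch below merges a set of duplicate title rows into the one being kept. It uses `sqlite3` only so the example is self-contained; the `title` and `usage_stat` tables, their columns, and the `merge_titles` function are all assumptions and do not reflect the actual COUNTER database schema. It also assumes the duplicate ids have already been identified.

```python
import sqlite3


def merge_titles(conn, keep_id, duplicate_ids):
    """Re-point usage rows at keep_id, then delete the duplicate title rows.

    Assumes hypothetical tables title(id, ...) and usage_stat(title_id, ...).
    """
    duplicate_ids = list(duplicate_ids)
    if not duplicate_ids:
        return
    marks = ",".join("?" for _ in duplicate_ids)  # one placeholder per id
    with conn:  # commits on success, rolls back if either statement fails
        conn.execute(
            f"UPDATE usage_stat SET title_id = ? WHERE title_id IN ({marks})",
            [keep_id, *duplicate_ids],
        )
        conn.execute(
            f"DELETE FROM title WHERE id IN ({marks})",
            duplicate_ids,
        )
```

Running both statements in one transaction keeps usage rows from being orphaned if the delete fails. If the kept title and a duplicate both have usage for the same period, the rows would also need to be summed or deduplicated, which this sketch does not attempt.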

[Image: counter-duplicate-title-example]