The database contains alternate URLs for the same content. For example, https://inspection.canada.ca/plant-health/fertilizers/trade-memoranda/t-4-112/eng/1307864536371/1320192988468 and https://inspection.canada.ca/eng/1307864536371/1320192988468 point to the same page. These duplicates need to be identified and removed.
Tasks
[x] Develop a script to remove the longer URL alternates while preserving a single entry for each unique page.
[x] Test to ensure no data loss occurs.
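A minimal sketch of how the deduplication could work, assuming that two URLs refer to the same page when they end in the same language code and pair of numeric IDs (as in the example above). The regular expression, the `fra` language code, and the function names `content_key` and `split_canonical` are hypothetical and not taken from the actual script. The sketch operates on a plain list of URL strings because the database schema is not described here.

```python
import re
from collections import defaultdict

# Assumption: a page is identified by the trailing "/<lang>/<id>/<id>" part of its URL.
CONTENT_KEY = re.compile(r"/(eng|fra)/(\d+)/(\d+)/?$")


def content_key(url):
    """Return (lang, id1, id2) for a URL, or None if it lacks the assumed suffix."""
    match = CONTENT_KEY.search(url)
    return match.groups() if match else None


def split_canonical(urls):
    """Split URLs into (canonical, duplicates), keeping the shortest URL per page."""
    groups = defaultdict(list)
    for url in urls:
        key = content_key(url)
        # URLs without the assumed suffix are grouped on their own, so they are never dropped.
        groups[key if key is not None else url].append(url)

    canonical, duplicates = [], []
    for members in groups.values():
        # Shortest URL wins; ties are broken alphabetically for a stable result.
        ordered = sorted(set(members), key=lambda u: (len(u), u))
        canonical.append(ordered[0])
        duplicates.extend(ordered[1:])
    return canonical, duplicates


if __name__ == "__main__":
    urls = [
        "https://inspection.canada.ca/plant-health/fertilizers/trade-memoranda/t-4-112/eng/1307864536371/1320192988468",
        "https://inspection.canada.ca/eng/1307864536371/1320192988468",
    ]
    keep, drop = split_canonical(urls)
    print("keep:", keep)
    print("drop:", drop)
```

Returning the duplicates rather than deleting them directly makes it possible to review the list first, and to check that every removed URL has a surviving canonical counterpart before any rows are touched.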
Acceptance Criteria
Each unique page should be accessible via a single, canonical URL.