ai-cfia / llamaindex-db

Semantic search operations using llamaindex.
MIT License

Remove alternate urls for the same content in `llamaindex-db` #18

Closed by k-allagbe 4 months ago

k-allagbe commented 4 months ago

Description

The database contains alternate URLs for the same content. For example, https://inspection.canada.ca/plant-health/fertilizers/trade-memoranda/t-4-112/eng/1307864536371/1320192988468 and https://inspection.canada.ca/eng/1307864536371/1320192988468 point to the same page. These duplicates need to be identified and removed.
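
One possible way to identify such duplicates is sketched below. It assumes that a page is uniquely identified by the trailing `/<language>/<id>/<id>` segment that both example URLs share; that assumption comes only from the two URLs above and is not confirmed elsewhere in this issue. The names `content_key` and `group_duplicates` are hypothetical helpers, not part of `llamaindex-db`.

```python
import re
from collections import defaultdict

# Assumption: the trailing "/<lang>/<digits>/<digits>" segment identifies a page.
_KEY_RE = re.compile(r"/([a-z]{2,3})/(\d+)/(\d+)/?$")


def content_key(url):
    """Return (lang, id1, id2) for a URL, or None if the pattern does not match."""
    # Drop any query string or fragment before matching the path suffix.
    path = url.split("?", 1)[0].split("#", 1)[0]
    match = _KEY_RE.search(path)
    return match.groups() if match else None


def group_duplicates(urls):
    """Group URLs by content key, keeping only keys shared by more than one URL."""
    groups = defaultdict(list)
    for url in urls:
        key = content_key(url)
        if key is not None:
            groups[key].append(url)
    return {key: found for key, found in groups.items() if len(found) > 1}
```

Each group returned by `group_duplicates` lists URLs that appear to point to the same page. Which URL in a group to keep is left open here, since the issue does not say.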

Tasks

Acceptance Criteria