Currently, a single medium can be stored under several different names, and names are case-sensitive. A normalization method could help map every medium to the unique name suggested by RTR.
Suggestions:
turn all names into lowercase
remove "www." from every URL
remove all articles (der, die, das) from names
remove all blanks, dots, dashes, etc. from names
...
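The suggestions above could be combined roughly as follows. This is only a sketch: the function name `normalize_medium_name` and the `ARTICLES` set are made up for illustration, and it assumes that articles appear as separate, whitespace-delimited words and that "www." only needs to be stripped at the start of a name.

```python
import re

# Hypothetical helper names; not part of the existing code base.
ARTICLES = {"der", "die", "das"}


def normalize_medium_name(name: str) -> str:
    """Map a raw medium name to a normalized lookup key."""
    # 1. Lowercase, so that comparison is no longer case-sensitive.
    s = name.strip().lower()
    # 2. Drop a leading "www." (assumption: it only occurs at the start).
    s = re.sub(r"^www\.", "", s)
    # 3. Remove articles while the words are still separated by blanks.
    words = [w for w in s.split() if w not in ARTICLES]
    # 4. Remove blanks, dots, dashes and other non-alphanumeric characters.
    return re.sub(r"[\W_]+", "", "".join(words))
```

With this sketch, "Der Standard" and "der  standard" both map to "standard", and "Kronen-Zeitung" maps to "kronenzeitung". A URL such as "www.derstandard.at" becomes "derstandardat", so URLs and plain names would still need a separate mapping step to end up under the same name.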
Also check out OpenRefine and its clustering algorithms, such as n-gram fingerprint.
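To give an idea of what n-gram fingerprint clustering does, here is a rough sketch that loosely follows the steps described in OpenRefine's clustering documentation (lowercase, strip punctuation and whitespace, collect the n-grams, sort and de-duplicate them, join them, fold to ASCII). It is not OpenRefine's actual code; the names `ngram_fingerprint` and `cluster` and the default `n=2` are assumptions for illustration.

```python
import re
import unicodedata
from collections import defaultdict


def ngram_fingerprint(name: str, n: int = 2) -> str:
    """Build a key from the sorted, de-duplicated character n-grams of a name."""
    # Lowercase and drop punctuation, whitespace and underscores.
    s = re.sub(r"[\W_]+", "", name.lower())
    # Collect all n-grams, remove duplicates, sort them.
    grams = sorted({s[i:i + n] for i in range(len(s) - n + 1)})
    key = "".join(grams)
    # Fold accented characters to ASCII; characters without an ASCII
    # decomposition (for example "ß") are simply dropped here.
    key = unicodedata.normalize("NFKD", key)
    return key.encode("ascii", "ignore").decode("ascii")


def cluster(names, n: int = 2):
    """Group names that share a fingerprint; return only groups with 2+ members."""
    groups = defaultdict(list)
    for name in names:
        groups[ngram_fingerprint(name, n)].append(name)
    return [group for group in groups.values() if len(group) > 1]
```

For example, `cluster(["Der Standard", "der-standard", "Kurier"])` would put the first two names into one group and leave "Kurier" out, since nothing else shares its key. Smaller values of `n` merge more aggressively and may produce false positives, so the resulting clusters are probably best reviewed by hand before names are merged.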