Closed (geoffturk closed 4 years ago)
`profileUrl` can be obfuscated and/or changed to avoid data being scraped by aggregators who do not have access to the index
`url` field and then aggregators can see if the primary URL (`url`) and `profileUrl` contain the same root domain, which lends weight to the claim that the primary URL is genuinely linked to the organization described in the `profileUrl`
Possible disadvantage (although I guess this applies in other circumstances too, so it is not specific to using the profile URL as the unique ID) is abuse: a bad actor could create a fake profile and host it somewhere... whereas, if the `profileUrl` had to contain the entity's URL, or a recognised (authorised?) host's URL, we would be able to identify fake profiles.
@olisb See my comment about checking that the root domain is the same between the `profileUrl` and the `url` (primary URL of the organization) to determine if there is potential spoofing.
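To make the root-domain check concrete, here is a rough sketch of how an aggregator might compare the two fields. The function names are hypothetical, and taking the last two hostname labels as the "root domain" is a simplifying assumption: suffixes such as `co.uk` would need the Public Suffix List to be handled properly.

```python
from urllib.parse import urlsplit


def naive_root_domain(url: str) -> str:
    """Return the last two labels of the URL's hostname, lowercased.

    Assumption: a simplification for illustration. Multi-part suffixes
    (e.g. "co.uk") require the Public Suffix List to be handled correctly.
    """
    host = urlsplit(url).hostname or ""
    labels = host.lower().rstrip(".").split(".")
    return ".".join(labels[-2:])


def same_root_domain(primary_url: str, profile_url: str) -> bool:
    """True when both URLs share the same (naively computed) root domain."""
    root = naive_root_domain(primary_url)
    # An empty root means the URL had no parseable hostname, so never match.
    return root != "" and root == naive_root_domain(profile_url)
```

A mismatch would not prove spoofing on its own (a profile may legitimately be hosted elsewhere), so this is probably best treated as one signal for aggregators rather than a hard rule.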
Overall the advantages are pretty significant so we will go forward with using the hash of the `profileUrl` (the `nodeId`) as the unique identifier of a node in the index.
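For illustration, deriving a `nodeId` from a `profileUrl` could look something like the sketch below. The thread does not say which hash function is used, so SHA-256 with hex encoding is an assumption here, as is the function name.

```python
import hashlib


def node_id(profile_url: str) -> str:
    """Derive a node identifier by hashing the profile URL.

    Assumption: SHA-256 over the UTF-8 bytes, hex-encoded. The discussion
    only says "the hash of the profileUrl" and does not name the algorithm.
    """
    return hashlib.sha256(profile_url.encode("utf-8")).hexdigest()
```

The same `profileUrl` always yields the same `nodeId`, and any change to the URL string (including a trailing slash or different casing) yields a different one, so some agreed normalisation of the URL before hashing may be worth considering.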