Open timrobertson100 opened 6 months ago
Some notes:
Some records have an associatedSequences value of None. This skews JOINs and needs to be ignored.
Ignoring those, a straight join across associatedSequences (i.e. overlooking the fact that it needs to be split as a multivalued field) returns 19,032 relationships that span datasets. Of those, 16,936 relationships are not already detected in the clustering.
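To make the join concrete, here is a minimal Python sketch of the same idea that also splits the multivalued field, which the counts above do not. It is not the query that produced those numbers. The "|" separator, the in-memory record dicts, the example.org URLs, and the helper names `split_sequences` and `cross_dataset_links` are all assumptions for illustration; `gbifId` and `datasetKey` stand in for whatever identifiers the real table uses.

```python
from collections import defaultdict
from itertools import combinations


def split_sequences(value, separator="|"):
    """Split a multivalued associatedSequences string.

    Empty entries and the literal 'None' are dropped so they cannot
    act as a join key. The separator is an assumption.
    """
    if not value:
        return set()
    parts = (p.strip() for p in value.split(separator))
    return {p for p in parts if p and p.lower() != "none"}


def cross_dataset_links(records):
    """Return pairs of record ids from different datasets sharing a sequence value."""
    by_sequence = defaultdict(set)
    for rec in records:
        for seq in split_sequences(rec.get("associatedSequences")):
            by_sequence[seq].add((rec["datasetKey"], rec["gbifId"]))

    links = set()
    for members in by_sequence.values():
        for (ds_a, id_a), (ds_b, id_b) in combinations(sorted(members), 2):
            if ds_a != ds_b:  # only relationships that span datasets
                links.add((min(id_a, id_b), max(id_a, id_b)))
    return links


# Hypothetical records, not real GBIF data.
records = [
    {"gbifId": 1, "datasetKey": "A",
     "associatedSequences": "https://example.org/seq/X1 | https://example.org/seq/X2"},
    {"gbifId": 2, "datasetKey": "B", "associatedSequences": "https://example.org/seq/X2"},
    {"gbifId": 3, "datasetKey": "B", "associatedSequences": "None"},
    {"gbifId": 4, "datasetKey": "C", "associatedSequences": "None"},
]

print(cross_dataset_links(records))  # {(1, 2)}
```

Records 3 and 4 both carry None and sit in different datasets, so a naive join would pair them; filtering the value out before grouping avoids that.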
https://www.gbif.org/occurrence/4010748380 and https://www.gbif.org/occurrence/4449937675 look to be linkable based on the associated sequences
I expect having the same sequence URL would be sufficient to link without requiring additional dimensions, but we should verify this by checking the usage of the data.
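One way to check usage, sketched in Python under assumptions: count how many records and datasets share each sequence value, then look at the values shared most widely, since a value reused across many records would be a weak basis for linking on its own. The "|" separator, the record layout, the helper names, and the `max_records` threshold are hypothetical and not taken from the clustering code.

```python
from collections import defaultdict


def sequence_usage(records, separator="|"):
    """Map each sequence value to (number of records, number of datasets)."""
    ids = defaultdict(set)
    datasets = defaultdict(set)
    for rec in records:
        raw = rec.get("associatedSequences") or ""
        for part in raw.split(separator):
            seq = part.strip()
            if not seq or seq.lower() == "none":
                continue  # skip empty and None values
            ids[seq].add(rec["gbifId"])
            datasets[seq].add(rec["datasetKey"])
    return {seq: (len(ids[seq]), len(datasets[seq])) for seq in ids}


def widely_shared(usage, max_records=2):
    """Sequence values used by more records than the (hypothetical) threshold."""
    return sorted(seq for seq, (n_records, _) in usage.items() if n_records > max_records)


# Hypothetical records, not real GBIF data.
sample = [
    {"gbifId": 1, "datasetKey": "A", "associatedSequences": "S1 | S2"},
    {"gbifId": 2, "datasetKey": "B", "associatedSequences": "S1 | S2"},
    {"gbifId": 3, "datasetKey": "B", "associatedSequences": "S2"},
    {"gbifId": 4, "datasetKey": "B", "associatedSequences": "S2"},
    {"gbifId": 5, "datasetKey": "C", "associatedSequences": "None"},
]

usage = sequence_usage(sample)
print(usage["S1"])           # (2, 2)
print(widely_shared(usage))  # ['S2']
```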