Right now the pipeline writes the "different_genbank_species" attribute to a Species the first time a difference is observed between an Occurrence and a Genbank record (that is, when Genbank's taxonomy has a Species name that differs from the GBIF Occurrence taxonomy, because GBIF's is the taxonomy we use). This is by design.
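As a rough illustration of that "first difference wins" behaviour, here is a minimal sketch in plain Ruby. It is not the pipeline's actual code: SpeciesStub and record_genbank_difference are hypothetical names, and only the different_genbank_species attribute comes from the real models.

```ruby
# Hypothetical stand-in for the real Species model; only the
# different_genbank_species attribute name is taken from the pipeline.
SpeciesStub = Struct.new(:name, :different_genbank_species)

# Record the Genbank name only when it differs from the GBIF name
# and no earlier difference has been stored (first write wins).
def record_genbank_difference(species, genbank_name)
  return if genbank_name.nil? || genbank_name == species.name

  species.different_genbank_species ||= genbank_name
end
```

Under this assumption, any later Occurrence with yet another Genbank name leaves the Species-level value unchanged, which is why the variations are only visible on the Occurrences.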
However, some Species turn out to have multiple variations of the Genbank name, and some of those variations could be collapsed with a clever rule (one case had 70 variations).
I left different_genbank_species captured on each Occurrence, so in the Rails console you can run something like:
Species.where.not(different_genbank_species: nil).find_each do |s|
  # distinct Genbank names recorded on this Species' Occurrences (nil = no difference)
  x = s.occurrences.pluck(:different_genbank_species).compact.uniq
  puts x if x.size > 1
end
to see where this is a problem.
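One possible shape for a collapsing rule is sketched below in plain Ruby. The rule itself is an assumption (lowercase the name, treat underscores as spaces, drop "cf." and "aff." qualifiers, squeeze whitespace); the real variations may need something different, and normalize_species_name and collapse_variations are hypothetical helpers, not part of the pipeline.

```ruby
# Hypothetical rule: qualifiers to ignore when comparing names.
QUALIFIERS = /\b(?:cf|aff)\.?(?=\s|\z)/i

# Reduce a name to a comparison key (assumed normalization, see above).
def normalize_species_name(name)
  name.downcase
      .tr("_", " ")          # underscores are treated as spaces
      .gsub(QUALIFIERS, " ") # drop "cf." / "aff."
      .gsub(/\s+/, " ")      # collapse runs of whitespace
      .strip
end

# Group the raw names by their normalized key, skipping nils.
# Returns a Hash of normalized key => array of original spellings.
def collapse_variations(names)
  names.compact.group_by { |n| normalize_species_name(n) }
end
```

In the console, passing s.occurrences.pluck(:different_genbank_species) to collapse_variations would show how many distinct names remain after normalization; a Species with more than one key still has real differences.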
Another issue is that we record "different_genbank_species" only at the species level, not per gene.
It's likely good enough for the user to know that there are differences and to see one example, though, since they can go back to the original records and view the details if they are really interested.
[duplicated the UCR repo]