Jegelewicz opened 5 years ago
OK, so I figured out that you have to go through Manage Data and NOT the Search menu. But now I can see that these two localities ARE exact matches, and I am not given any option to "check to merge". What gives? @dustymc
Correct - the first link is the public form.
results will have a merge link
If these were duplicates they'd be auto-merged. They have different elevation data.
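The auto-merge rule described here (localities that match in every field are merged, so two that differ only in elevation survive as separate records) can be sketched as a grouping step. This is a minimal illustration, not Arctos code: it assumes each locality is a Python dict, and the field names (`locality_id`, `spec_locality`, `minimum_elevation`) are hypothetical stand-ins for the real locality columns.

```python
from collections import defaultdict


def find_exact_duplicates(localities, id_field="locality_id"):
    """Group locality IDs whose records are identical in every field except the ID."""
    groups = defaultdict(list)
    for loc in localities:
        # Key on every non-ID (field, value) pair; sorting makes field order irrelevant.
        key = tuple(sorted((k, v) for k, v in loc.items() if k != id_field))
        groups[key].append(loc[id_field])
    # Only groups holding more than one ID are duplicates (merge candidates).
    return [ids for ids in groups.values() if len(ids) > 1]
```

Two records that differ only in elevation (say 3570 vs 3750) produce different keys, so they never land in the same group, which matches why these two were not auto-merged.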
Can we get Arctos to tell us what data differ when we try to merge/check for duplicates? I often spend a silly amount of time scanning every bit of data to try to find what differs and sometimes can't find it.
-Derek
--
+++++++++++++++++++++++++++++++++++ Derek S. Sikes, Curator of Insects Professor of Entomology University of Alaska Museum 1962 Yukon Drive Fairbanks, AK 99775-6960
dssikes@alaska.edu
phone: 907-474-6278 FAX: 907-474-5469
University of Alaska Museum - search 400,276 digitized arthropod records http://arctos.database.museum/uam_ento_all http://www.uaf.edu/museum/collections/ento/ +++++++++++++++++++++++++++++++++++
Interested in Alaskan Entomology? Join the Alaska Entomological Society and / or sign up for the email listserv "Alaska Entomological Network" at http://www.akentsoc.org/contact_us http://www.akentsoc.org/contact.php
Obviously, I didn't see that difference, so... what @DerekSikes said.
Yes, this one was particularly difficult - 3570 vs 3750?
Sure, but how do you envision that working?
After clicking merge, Arctos returns a screenshot of the records, like the ones above, with the data that differ in red font?
-Derek
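Derek's proposal (show the records side by side with the differing values highlighted) amounts to a field-by-field comparison. A minimal sketch, assuming both localities are available as Python dicts with hypothetical field names; a `!` text marker stands in for the red font a web form would use.

```python
def diff_localities(a, b, ignore=("locality_id",)):
    """Return {field: (value_in_a, value_in_b)} for each field whose values differ."""
    fields = sorted(set(a) | set(b))
    return {
        f: (a.get(f), b.get(f))
        for f in fields
        if f not in ignore and a.get(f) != b.get(f)
    }


def render_diff(a, b, ignore=("locality_id",)):
    """One line per differing field, prefixed with '!' (a form might use red font instead)."""
    diffs = diff_localities(a, b, ignore)
    if not diffs:
        return "no differences"
    return "\n".join(f"! {f}: {va!r} vs {vb!r}" for f, (va, vb) in diffs.items())
```

Because the comparison is exact, it surfaces values like 3570 and 3750 that are easy to misread by eye.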
Why don't the values with discrepancies show up here in red (or at all)?
That form only shows one locality - the only way there would ever be differing data is if you've changed the filters.
That's where we need documentation. What am I supposed to do when writing SQL isn't in my list of skillz?
Use the form - it writes the SQL.
I'd probably just deprecate that form - localities are now auto-merged - but maybe it's somehow still useful??
It is useful to have a form that would show us potential duplicates. Maybe with elevations that differ by transposed digits?
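One way to flag the transposed-digit case is to compare elevations as strings and accept pairs that differ only by swapping two adjacent digits, as 3570 and 3750 do. A sketch under stated assumptions: localities are Python dicts with hypothetical keys `locality_id`, `spec_locality` and `minimum_elevation`, and comparing only localities that share a `spec_locality` is an illustrative choice to keep the number of pairs down.

```python
from collections import defaultdict
from itertools import combinations


def is_transposition(x, y):
    """True if x and y, written as strings, differ only by one swap of adjacent characters."""
    sx, sy = str(x), str(y)
    if len(sx) != len(sy):
        return False
    diffs = [i for i, (cx, cy) in enumerate(zip(sx, sy)) if cx != cy]
    return (
        len(diffs) == 2
        and diffs[1] == diffs[0] + 1
        and sx[diffs[0]] == sy[diffs[1]]
        and sx[diffs[1]] == sy[diffs[0]]
    )


def transposed_elevation_pairs(localities, field="minimum_elevation"):
    """ID pairs that share a spec_locality and whose `field` values are adjacent-digit transpositions."""
    by_spec = defaultdict(list)
    for loc in localities:
        by_spec[loc.get("spec_locality")].append(loc)
    pairs = []
    for group in by_spec.values():
        for a, b in combinations(group, 2):
            ea, eb = a.get(field), b.get(field)
            if ea is not None and eb is not None and is_transposition(ea, eb):
                pairs.append((a["locality_id"], b["locality_id"]))
    return pairs
```

This only proposes candidates for review: two genuinely different sites can also have transposed-looking elevations (e.g. 1200 and 1020).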
I still don't understand - how do you "use the form"?
show us potential duplicates.
I agree, but I still don't know how to do that!
Localities are sorta all potential duplicates. We seem to be heading towards less normalization, which is going to make things like this more difficult to find by adding even more things that can cause "functional duplicates."
is from
create table temp_dup_specloc as select * from locality where spec_locality in (
-- spec_locality strings shared by more than 20 localities
select spec_locality from locality group by spec_locality having count(*) > 20
);
UAM@ARCTOS> select count(*) from temp_dup_specloc;
COUNT(*)
----------
46927
1 row selected.
Elapsed: 00:00:00.48
UAM@ARCTOS> select count(distinct(spec_locality)) from temp_dup_specloc;
COUNT(DISTINCT(SPEC_LOCALITY))
------------------------------
643
Maybe there's some idea for detecting almost-duplicates in that?
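One possible idea: within each heavily shared `spec_locality`, report which columns take more than one value; those columns are where the almost-duplicates differ. The sketch below uses Python's built-in `sqlite3` with a toy `locality` table (hypothetical columns, not the Arctos schema) and writes the grouping query in the standard `GROUP BY ... HAVING` order. `COUNT(DISTINCT ...)` ignores NULLs, so a NULL vs non-NULL difference would not be reported.

```python
import sqlite3


def varying_columns(conn: sqlite3.Connection, min_count=20,
                    columns=("minimum_elevation", "maximum_elevation")):
    """For each spec_locality used by more than `min_count` rows, list the columns that vary."""
    shared = conn.execute(
        "SELECT spec_locality FROM locality "
        "GROUP BY spec_locality HAVING COUNT(*) > ? ORDER BY spec_locality",
        (min_count,),
    ).fetchall()
    report = {}
    for (spec,) in shared:
        varying = []
        for col in columns:
            # Column names come from the trusted `columns` argument, never from user input.
            (n_distinct,) = conn.execute(
                f"SELECT COUNT(DISTINCT {col}) FROM locality WHERE spec_locality = ?",
                (spec,),
            ).fetchone()
            if n_distinct > 1:
                varying.append(col)
        report[spec] = varying
    return report
```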
There's definitely plenty of obviously suspicious data - e.g.
Are those elevations wonky, or was this a cliff (the data say there's a 500' vertical change over less than 50'), or ???
I could probably write SQL to detect similar data, but it would be sort of a pain (and perhaps not very "smart") with the tools I have now - that should be trivial and obvious in a spatial query, if we had the tools to support that sort of thing.
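A naive, non-spatial version of that check flags pairs of localities that are horizontally close but far apart in elevation. This sketch assumes decimal-degree coordinates and elevations already converted to meters under hypothetical keys `lat`, `lon` and `elevation_m`; the default thresholds are the 50 ft and 500 ft from the example above, converted to meters. Distances use the haversine formula on a spherical Earth.

```python
import math
from itertools import combinations

EARTH_RADIUS_M = 6_371_000  # mean Earth radius; spherical approximation
FT_TO_M = 0.3048


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two decimal-degree points."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = p2 - p1
    dl = math.radians(lon2 - lon1)
    h = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    # Clamp guards against floating-point values just above 1.
    return 2 * EARTH_RADIUS_M * math.asin(min(1.0, math.sqrt(h)))


def steep_pairs(localities, max_distance_m=50 * FT_TO_M, min_rise_m=500 * FT_TO_M):
    """ID pairs closer than max_distance_m horizontally but at least min_rise_m apart vertically."""
    flagged = []
    # O(n^2) pairwise scan: fine for a small subset, a spatial index would be needed for a full table.
    for a, b in combinations(localities, 2):
        d = haversine_m(a["lat"], a["lon"], b["lat"], b["lon"])
        rise = abs(a["elevation_m"] - b["elevation_m"])
        if d <= max_distance_m and rise >= min_rise_m:
            flagged.append((a["locality_id"], b["locality_id"]))
    return flagged
```

A flagged pair is not necessarily an error (it could really be a cliff); the point is to produce a short list for a human to check.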
how do you "use the form"?
so to check for elevation variations you could remove those - change this [screenshot] to this [screenshot], then click this [screenshot], which writes and executes the SQL and displays below anything that varies only by elevation.
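The form's behavior as described in this thread (the gray "seed" values are matched exactly unless a field is set to "ignore", and the resulting SQL is run against the locality table) can be approximated as follows. This is a hypothetical sketch, not the form's actual code: the field names are illustrative, `build_filter` and `find_matches` are made-up helpers, and `sqlite3` stands in for the real database.

```python
import sqlite3


def build_filter(seed, ignored, id_field="locality_id"):
    """Return (where_sql, params): every non-ignored field must equal the seed value."""
    clauses, params = [], []
    for field, value in seed.items():
        if field == id_field or field in ignored:
            continue  # "ignore" drops the field from the match entirely
        if value is None:
            clauses.append(f"{field} IS NULL")
        else:
            clauses.append(f"{field} = ?")
            params.append(value)
    # Never match the seed locality itself.
    clauses.append(f"{id_field} <> ?")
    params.append(seed[id_field])
    return " AND ".join(clauses), params


def find_matches(conn: sqlite3.Connection, seed, ignored):
    """IDs of localities matching the seed on every non-ignored field."""
    where, params = build_filter(seed, ignored)
    # Field names come from the seed dict (trusted); values are bound as parameters.
    sql = f"SELECT locality_id FROM locality WHERE {where} ORDER BY locality_id"
    return [row[0] for row in conn.execute(sql, params)]
```

With nothing ignored, a seed only matches true duplicates; ignoring the two elevation fields is what lets a locality that differs only by elevation show up as a merge candidate.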
I never would have guessed that....
You don't have to - there's documentation at the top of the page!
I don't see what you explained to do in that documentation. That's why this issue is here. We need something that people completely unfamiliar with the process can use to lead them through the process.
I also found the form confusing. I did not understand what we were supposed to do with the grey and white fields, especially when I was looking at two localities that to my eye appeared identical (only the 3750 vs 3570 elevation was different, but I didn't see that). Maybe in the grey/white comparison, any differences could be shown in a different color? And yes, I am not spending a lot of time reading the fine print of long text documentation. We need an interface with clear step-by-step guidance.
Yeah, developers generally shouldn't write documentation.
There's no comparison there - the dark gray is the "seed" data, the open text boxes allow fuzzy-matching almost-duplicates.
What are we supposed to do in each field? change the entries? What is this form supposed to do?
Back in the dark ages, it merged duplicate localities. There are no duplicate localities (not for very long, anyway) anymore because they're auto-merged. The form can still be used to merge almost-duplicates.
I found a locality (http://arctos.database.museum/editLocality.cfm?locality_id=10736252), clicked check dups, changed the specloc from "no specific locality" to "ignore" and it found...
a not-quite-duplicate that differs only by specloc.
So I tried to use the form for localities 10881399 http://arctos.database.museum/editLocality.cfm?locality_id=10881399 and 10881400 http://arctos.database.museum/editLocality.cfm?locality_id=10881400, which differ only in the mistranscribed elevation. If I had not known in advance (and originally I did not) that the elevations were not identical, I'd have had to go through each field and enter "ignore" to find the reason the two are not identical - is there no other way? Because I now know that elevation is the only distinguishing field, I changed maximum elevation and minimum elevation to "ignore" and clicked "filter table below" (although it's not clear what that table is - the SQL?), and I did get the 10881400 locality to show up. But again, nothing shows me how this locality differs from the one I am being offered to merge it with, other than inspecting each field very carefully. And we've seen how ineffective that process can be, since most of us did not catch the original 3750/3570 difference in the first place. Without a clear explanation of how these two fuzzy localities differ, we are going to introduce more error by merging things that shouldn't have been merged and vice versa.
@dusty, is manually merging localities no longer a thing? If it is, then we need documentation for how to do it; if it isn't, we should just deprecate the How To. https://handbook.arctosdb.org/how_to/How-to-Merge-Duplicate-Localities.html
http://handbook.arctosdb.org/how_to/How-to-Merge-Duplicate-Localities.html
Does not reflect current Arctos reality. Upon completing a search and finding two identical localities, I do not see any option to “check for duplicates”.
See these search results.