PecanProject / pecan

The Predictive Ecosystem Analyzer (PEcAn) is an integrated ecological bioinformatics toolbox.
www.pecanproject.org

duplicate sites in BETYdb #1007

Open yan130 opened 7 years ago

yan130 commented 7 years ago

Hi,

I am using met.process and found that these two rows in psql are duplicates of the same site:

     id     |                  name                   |        st_astext         |         created_at         |  user_id
------------+-----------------------------------------+--------------------------+----------------------------+------------
        772 | Niwot Ridge Forest/LTER NWT1 (US-NR1)   | POINT(-105.546 40.0329)  | 2011-12-10 19:09:48        |         52
 1000005138 | Niwot Ridge Forest (LTER NWT1) (US-NR1) | POINT(-105.5464 40.0329) | 2016-07-28 21:04:18.008933 | 1000000001

The same is true for US-Ha1:

     id     |                   name                   |        st_astext        |         created_at         |  user_id
------------+------------------------------------------+-------------------------+----------------------------+------------
        758 | Harvard Forest EMS Tower/HFR1 (US-Ha1)   | POINT(-72.1715 42.5378) | 2011-12-10 19:09:48        |         52
 1000005128 | Harvard Forest EMS Tower (HFR1) (US-Ha1) | POINT(-72.1715 42.5378) | 2016-07-28 21:04:17.430147 | 1000000001

Should we delete the newer one, or handle this some other way?
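For anyone wanting to check for more pairs like these, here is a minimal Python sketch that groups site rows by the site code at the end of the name. It assumes the code is always the last parenthesised token (as in the four rows above); the function name `duplicate_candidates` and the regular expression are illustrative and not part of PEcAn or BETYdb.

```python
import re
from collections import defaultdict

# (id, name) pairs copied from the psql output in this comment.
sites = [
    (772, "Niwot Ridge Forest/LTER NWT1 (US-NR1)"),
    (1000005138, "Niwot Ridge Forest (LTER NWT1) (US-NR1)"),
    (758, "Harvard Forest EMS Tower/HFR1 (US-Ha1)"),
    (1000005128, "Harvard Forest EMS Tower (HFR1) (US-Ha1)"),
]

# Assumption: the site code is the last parenthesised token in the name,
# shaped like two letters, a hyphen, then letters or digits.
CODE_AT_END = re.compile(r"\(([A-Za-z]{2}-[A-Za-z0-9]+)\)\s*$")


def duplicate_candidates(rows):
    """Map each site code to the sorted ids of the rows that carry it,
    keeping only codes that appear on more than one row."""
    by_code = defaultdict(list)
    for site_id, name in rows:
        match = CODE_AT_END.search(name)
        if match:
            by_code[match.group(1)].append(site_id)
    return {code: sorted(ids) for code, ids in by_code.items() if len(ids) > 1}


candidates = duplicate_candidates(sites)
for code, ids in candidates.items():
    print(code, ids)
```

Rows whose names do not end in a code are skipped, so this only flags candidates for manual review and does not prove that two rows are the same site.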

dlebauer commented 7 years ago

Removing duplicate sites is an open issue in BETYdb (https://github.com/PecanProject/bety/issues/201). Deleting a duplicate requires updating the rows that depend on it.
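To illustrate why dependent rows matter, here is a rough sketch using Python's built-in `sqlite3` with a toy two-table schema. The `inputs` table, its `site_id` column, and the helper `merge_duplicate_site` are stand-ins chosen for the example; they are not the BETYdb schema or the merge function discussed in the bety issues.

```python
import sqlite3


def merge_duplicate_site(conn, keep_id, drop_id, dependents):
    """Repoint rows that reference drop_id to keep_id, then delete drop_id.

    dependents is a list of (table, column) pairs. These names are treated
    as trusted input because identifiers cannot be bound as SQL parameters.
    """
    with conn:  # one transaction: commit on success, roll back on error
        for table, column in dependents:
            conn.execute(
                f"UPDATE {table} SET {column} = ? WHERE {column} = ?",
                (keep_id, drop_id),
            )
        conn.execute("DELETE FROM sites WHERE id = ?", (drop_id,))


# Toy in-memory database (hypothetical schema, for illustration only).
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE sites (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE inputs (id INTEGER PRIMARY KEY,
                         site_id INTEGER REFERENCES sites(id));
    INSERT INTO sites VALUES (772, 'Niwot Ridge Forest/LTER NWT1 (US-NR1)');
    INSERT INTO sites VALUES (1000005138, 'Niwot Ridge Forest (LTER NWT1) (US-NR1)');
    INSERT INTO inputs VALUES (1, 1000005138);
""")

merge_duplicate_site(conn, 772, 1000005138, [("inputs", "site_id")])
remaining_sites = [row[0] for row in conn.execute("SELECT id FROM sites")]
input_site_ids = [row[0] for row in conn.execute("SELECT site_id FROM inputs")]
```

With foreign keys enforced, deleting the duplicate before repointing `inputs.site_id` would fail, which is why the updates run first and everything sits in a single transaction.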

mdietze commented 7 years ago

These duplicates were removed. However, checking other sites with adjacent ID numbers suggests that a number of duplicate sites were created. This appears to have happened when I created the site groups for Fluxnet2015, probably because the coordinates used there were slightly different from those of the existing Ameriflux and Fluxnet sites. A side effect of removing EMS and NR1 is that they are no longer in the Fluxnet2015 site group. I need to write a script that goes through all of Fluxnet2015 to clean up duplicates and associate the site group with the correct (non-duplicate) sites, and then fix the original script so that it does not create duplicates, since it will need to be rerun periodically as new sites are added to the FLUXNET synthesis.
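One possible shape for the "do not create duplicates" fix is a find-or-create step that reuses an existing site when the coordinates agree within a tolerance. This is only a sketch: the function `find_or_create_site`, the in-memory list of dicts, and the 0.001 degree tolerance (roughly 100 m in latitude) are assumptions for illustration, not the actual site group script.

```python
def find_or_create_site(sites, name, lon, lat, tol_deg=0.001):
    """Return the id of an existing site whose longitude and latitude are
    both within tol_deg of (lon, lat); otherwise append a new site to the
    list and return its id."""
    for site in sites:
        if abs(site["lon"] - lon) <= tol_deg and abs(site["lat"] - lat) <= tol_deg:
            return site["id"]
    new_id = max((s["id"] for s in sites), default=0) + 1
    sites.append({"id": new_id, "name": name, "lon": lon, "lat": lat})
    return new_id


# Existing site, using the coordinates of id 772 from this issue.
sites = [
    {"id": 772, "name": "Niwot Ridge Forest/LTER NWT1 (US-NR1)",
     "lon": -105.546, "lat": 40.0329},
]

# The incoming coordinates differ by 0.0004 degrees in longitude,
# so the existing site is reused and no new row is created.
reused_id = find_or_create_site(
    sites, "Niwot Ridge Forest (LTER NWT1) (US-NR1)", -105.5464, 40.0329)

# A site far from every existing one gets a new id.
created_id = find_or_create_site(
    sites, "Harvard Forest EMS Tower (HFR1) (US-Ha1)", -72.1715, 42.5378)
```

A fixed tolerance can wrongly merge towers that are genuinely close together, so a real script would probably also compare the site code or name before reusing a row.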

dlebauer commented 7 years ago

See also pecanproject/bety#246 and pecanproject/bety#201 for a more comprehensive discussion of duplicate sites: how to find and combine them, and how to prevent them in the future (by cleaning up existing records and adding constraints).

dlebauer commented 7 years ago

And here is a generic SQL function to merge duplicate records: https://github.com/PecanProject/bety/issues/185#issuecomment-71534163

github-actions[bot] commented 3 years ago

This issue is stale because it has been open 365 days with no activity.