forc-db / ForC

Global Forest Carbon Database
https://forc-db.github.io/
Creative Commons Attribution 4.0 International

revamp script ID_sets_of_duplicate_records.R #206

Closed: ValentineHerr closed this issue 3 years ago

ValentineHerr commented 4 years ago

ForC is getting too big for this code.

I was able to run it with the GROA data, but I fear we will hit the memory limit next time we use it (the problem occurs at line 162).

At some point, I need to sit down and revamp it, but that will take a while.

ValentineHerr commented 3 years ago

I found a way to make the script faster, so I don't think we will run into running-time issues anymore.

The current situation is that every time this script is run, it generates a pop-up window if it finds conflicts that differ from what was already in the data. In that case, someone needs to go through each of them (usually a handful, unless a large number of records were added or modified) and decide whether the new conflict is correct or the old one was better (usually because it was manually edited).

My current way of doing that is to list the measurement.ID values (comma-separated) that belong to the same conflict under one of two groups: 1. keep old conflict and 2. keep new conflict. If nobody goes through this, the old conflict is saved (and will be flagged again next time). Currently this list lives in the R script, but I could create a csv file instead.

One issue with this: if someone manually edits the conflicts of a group that is already listed under "2. keep new conflict", the manual edit would be overwritten. I think it is safe for me to delete what is under "2. keep new conflict" once those conflicts are saved. That way, if someone edits them later, they will be flagged in the pop-up window and their measurement.ID can be added to "1. keep old conflict".
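For readers following along, the decision rule described above can be sketched roughly as follows. This is a Python illustration only, not the code in ID_sets_of_duplicate_records.R; the function names, the dictionary layout, and the choice to let "keep old" win when an ID appears in both groups are all assumptions made for the example. It also assumes the old and new conflict tables cover the same measurement IDs.

```python
# Hypothetical sketch: none of these names come from the actual R script.

def parse_id_groups(lines):
    """Turn lines such as "12, 15, 20" into a list of frozensets of measurement IDs."""
    groups = []
    for line in lines:
        ids = frozenset(int(tok) for tok in line.split(",") if tok.strip())
        if ids:
            groups.append(ids)
    return groups


def resolve_conflicts(old, new, keep_old, keep_new):
    """Return (resolved, flagged).

    old, new: dicts mapping measurement ID -> conflict string.
    keep_old, keep_new: lists of frozensets of IDs, one set per conflict group.
    """
    in_keep_old = set().union(*keep_old)
    in_keep_new = set().union(*keep_new)
    resolved = dict(old)  # default: the old conflict is saved
    flagged = []          # IDs that still need a manual decision
    for mid, new_value in new.items():
        if new_value == old.get(mid):
            continue  # nothing changed, nothing to decide
        if mid in in_keep_old:
            continue  # reviewed: keep the old (e.g. manually edited) conflict
        if mid in in_keep_new:
            resolved[mid] = new_value  # reviewed: accept the new conflict
        else:
            flagged.append(mid)  # unreviewed: keep old, flag again next run
    return resolved, flagged
```

Under this sketch, emptying the "keep new" list after the results are saved means that a later manual edit to one of those records shows up as a difference again and is flagged, instead of being silently overwritten.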

I don't currently see a way to make that automatic. I may just add a note in the pop-up window saying that the list needs to be emptied once the MEASUREMENT table is updated with the new conflicts.

Sorry for this long comment, this is for me to keep track of my thoughts...

ValentineHerr commented 3 years ago

I think I can close this now. The next step is to resolve site duplicates.