NewGraphEnvironment / dff-2022

Building digital field forms and processing data collected using R, postgresql, QGIS and other tools
Creative Commons Zero v1.0 Universal

One of Jesse's sites (8300128_ds) doesn't show up in the csv after running fiss_site_tidy script #49

Closed Mateo9569 closed 1 year ago

Mateo9569 commented 1 year ago

When viewing the form after deleting the duplicates on line 94, Jesse's site is visible. But when viewing the csv it disappears, so something is going on between line 94 and the end of the script. Will get to the bottom of this and close when resolved.

NewGraphEnvironment commented 1 year ago

Sounds like something is filtering out that row between the duplicate filter and the burn. Is it Row 4 (https://github.com/NewGraphEnvironment/dff-2022/blob/de46433b0b8e8cabc155f7063f1689ec5bca7cf0/scripts/fiss_site_tidy.R#L131)? If so, we could look to filter that row explicitly using a column other than the row_id (convenient to use row_id, but not as safe as something more site-specific). We can use & to filter on more than one column too, of course. I can help if a quick troubleshoot doesn't bear 🥝
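A minimal sketch of that idea, with toy data — the column names `local_name` and `crew` are hypothetical stand-ins for whatever site-specific columns the real script has:

```r
library(dplyr)

# toy stand-in for the tidied sites data; columns are hypothetical
form <- tibble::tibble(
  rowid      = 1:4,
  local_name = c("8300128_ds", "8300128_us", "site_x", "trib_gramaphone"),
  crew       = c("jesse", "jesse", "mateo", "al")
)

# fragile: breaks as soon as the row order changes upstream
form %>% filter(rowid != 4)

# safer: combine site-specific columns with &
form %>% filter(!(local_name == "trib_gramaphone" & crew == "al"))
```

The second filter keeps working even if rows are added, dropped, or reordered upstream, because it targets the entry itself rather than its position.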

Mateo9569 commented 1 year ago

Yes, that was the problem. When I remove that line it works. Why was row 4 being filtered out?

NewGraphEnvironment commented 1 year ago

There is a note in the script about that trib to gramaphone (https://github.com/NewGraphEnvironment/dff-2022/blob/de46433b0b8e8cabc155f7063f1689ec5bca7cf0/scripts/fiss_site_tidy.R#L95). We should still be pulling it out, but we should use something more specific than the row_id, as that has obviously changed. The delete-dups line should be something more specific too: https://github.com/NewGraphEnvironment/dff-2022/blob/de46433b0b8e8cabc155f7063f1689ec5bca7cf0/scripts/fiss_site_tidy.R#L89. We can see what is duplicated, and how best to filter it, by viewing the dups object.
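Inspecting the duplicates before picking a filter could look something like this — a sketch with made-up data, assuming `local_name` is the column the sites share:

```r
library(dplyr)

# toy data with one duplicated local_name; columns are hypothetical
sites <- tibble::tibble(
  local_name = c("8300128_ds", "8300128_ds", "site_x"),
  comments   = c(NA, "dewatered", "ok")
)

# pull every row whose local_name appears more than once
dups <- sites %>%
  group_by(local_name) %>%
  filter(n() > 1) %>%
  ungroup()

dups  # shows which columns actually differ between the duplicates
```

Viewing `dups` makes it obvious which column values distinguish the row we want to drop from the one we want to keep.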

Mateo9569 commented 1 year ago

Ok, I see what you mean. The row ids change, so you're saying it would be better to filter by something that's more unique to the entry we want to get rid of.

Mateo9569 commented 1 year ago

Is it ok if I close this issue and make a PR, or do you want to take a crack at it? I left it filtering by the row ids for now, even though I know that's not the best way. There's a function in dplyr called distinct() that removes duplicate rows. But the problem is that the duplicate entries aren't exactly alike: they share the same local name, but some have NA fields and some don't.
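Since distinct() only drops rows that are identical across the compared columns, rows that differ only in their NA fields all survive it. One possible workaround, sketched here with hypothetical data, is to keep the most complete row per local name instead:

```r
library(dplyr)

# two "duplicate" entries that share a local_name but differ in NAs
sites <- tibble::tibble(
  local_name = c("8300128_ds", "8300128_ds"),
  comments   = c(NA, "dewatered")
)

# distinct() keeps both rows because they are not identical
distinct(sites)

# alternative: per local_name, keep the row with the fewest NA fields
deduped <- sites %>%
  mutate(n_na = rowSums(is.na(across(everything())))) %>%
  group_by(local_name) %>%
  slice_min(n_na, n = 1, with_ties = FALSE) %>%
  ungroup() %>%
  select(-n_na)
```

This is just one option; whether "fewest NAs" is the right tiebreaker depends on how the duplicate form entries were actually created.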