Mateo9569 closed 1 year ago
Sounds like something is filtering out that row between the duplicate filter and the burn. Is it row 4 (https://github.com/NewGraphEnvironment/dff-2022/blob/de46433b0b8e8cabc155f7063f1689ec5bca7cf0/scripts/fiss_site_tidy.R#L131)? If so, we could filter that row explicitly using a column other than row_id (row_id is convenient, but not as safe as something more site specific). We can use & to filter on more than one column too, of course. I can help if a quick troubleshoot doesn't bear 🥝
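A minimal sketch of what filtering on more than one column with & could look like. The data frame and the column names (`local_name`, `gazetted_name`) are made up for illustration and are not taken from fiss_site_tidy.R:

```r
library(dplyr)

# Hypothetical stand-in for the form data; column names are assumptions
form <- tibble::tibble(
  local_name = c("site_a", "site_b", "site_b", "site_c"),
  gazetted_name = c("Creek A", "Trib to Creek B", NA, "Creek C")
)

# Drop only the row that matches on BOTH columns,
# no matter which row position it ends up in
form_tidy <- form %>%
  filter(!(local_name == "site_b" & is.na(gazetted_name)))
```

Because the condition describes the record itself rather than its position, it should keep working if rows are added or reordered upstream.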
Yes that was the problem. When I remove that line it works. Why was row 4 being filtered out?
There is a note in the script about that trib to gramaphone (https://github.com/NewGraphEnvironment/dff-2022/blob/de46433b0b8e8cabc155f7063f1689ec5bca7cf0/scripts/fiss_site_tidy.R#L95). We should still be pulling it out, but we should use something more specific than the row_id, as that has obviously changed. The delete dups line should use something more specific too (https://github.com/NewGraphEnvironment/dff-2022/blob/de46433b0b8e8cabc155f7063f1689ec5bca7cf0/scripts/fiss_site_tidy.R#L89). We can see what is duplicated, and how to filter it, by viewing the dups object.
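For reference, one way an object like dups could be built and inspected. This is a sketch on made-up data, and it assumes duplicates are defined by a shared `local_name`; the script may build its dups object differently:

```r
library(dplyr)

# Hypothetical stand-in for the form data; column names are assumptions
form <- tibble::tibble(
  local_name = c("site_a", "site_b", "site_b", "site_c"),
  comments = c("ok", "full record", NA, "ok")
)

# Keep every row whose local_name appears more than once
dups <- form %>%
  group_by(local_name) %>%
  filter(n() > 1) %>%
  ungroup()

# Printing dups side by side shows which columns differ between the copies
print(dups)
```

Looking at the columns that differ between the copies is what suggests a site-specific condition to filter on instead of the row_id.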
Ok I see what you mean. The row ids change so you're saying it would be better to filter by something that's more unique to the entry we want to get rid of.
Is it ok if I close this issue and make a PR, or do you want to take a crack at it? I left it at filtering by the row ids for now, even though I know that's not the best way. There's a function in dplyr called distinct() that keeps only unique rows. The problem is that the duplicate entries aren't exactly alike: they share the same local name, but some have NA fields and some don't.
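A possible workaround, sketched on made-up data: count the missing fields in each row, sort so the most complete copy comes first, then call distinct() on the local name only. The column names and the "keep the most complete copy" rule are assumptions, not what the script does:

```r
library(dplyr)

# Hypothetical stand-in for the form data; column names are assumptions
form <- tibble::tibble(
  local_name = c("site_a", "site_b", "site_b"),
  comments = c("ok", NA, "full record")
)

# distinct() on whole rows keeps both site_b rows because comments differ
nrow(distinct(form))  # 3

# Count NA fields per row, then keep the most complete row per local_name
form_dedup <- form %>%
  mutate(n_missing = rowSums(is.na(form))) %>%
  arrange(local_name, n_missing) %>%
  distinct(local_name, .keep_all = TRUE) %>%
  select(-n_missing)
```

With `.keep_all = TRUE`, distinct() keeps the first row it sees for each local_name, so the sort order decides which copy survives.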
When viewing the form after deleting the duplicates on line 94, Jesse's site is visible. But when viewing the csv it disappears, so something is going on between line 94 and the end of the script. Will get to the bottom of this and close when resolved.