bensutherland / simple_pop_stats

A short analysis of population statistics given specific inputs
5 stars 1 forks source link

drop_pops() not actually dropping pops #11

Open erondeau opened 3 years ago

erondeau commented 3 years ago

We identified an issue, where JS was attempting to remove collections from the analysis using the drop_file option in drop_pops(). While the correct collections were printed to screen for "keeping", the final dataframe did not actually remove anything, retaining all collections.

This seems likely to be either a) the filtered dataframe is not actually being filtered (eg. filter(df,parameters), not df <-filter(df,parameters); or b) the filtered dataframe is being written over in subsequent steps (eg. the original filter works, but subsequent steps ignore this filtered df.

Follwing up...

erondeau commented 3 years ago

So, after closer examination, I don't believe this is a script issue, but rather reflects a more interesting conundrum surrounding the use of the "region" being appended to the end of the collection name in step 2 (eg. append_region =TRUE in fix_pop_names(). This is a useful feature - allows for quickly identifying regional or CU groupings when organizing a dendrogram - but means that the "drop_pops" file needs to also include the region for a match to be made.

Three potential fixes:

1) Just leave it as is, and make this clear in the README? You could probably do this region appending easily enough from the "StockCodes" file created by prepare_stockcode_file() in step 2, in excel (or even directly in that script). Excel would be =B2&"_"&C2 then dragging that down. In R, this could be a fourth column that is output to this StockCode file, such as df$collection_repunit <- paste(df$collection, df$repunit, sep="_")

2) Allow for a string partial match. This I don't like, because there will be situations where you might want to drop "Chilliwack" but not "Chilliwack-summer". We would have to rely on the string being unique, which will be dangerous (99.9% successful, but that 0.1 will cause problems".

3) Append the region to the filter file, by new script?

4) ... Suggestions?

JanineSupernault commented 3 years ago

@erondeau, may not be the best solution but if I did the fix pop without appending then applied the drop pop filter and went back to fix pop script and did the appending to region?

erondeau commented 3 years ago

This is a good idea - although it didn't work as currently written. Still, some version of Janine's suggestion is likely best, and can be written into the Readme. I will follow-up next week on how to implement it!