SlinkyLincy / intro-data-capstone-biodiversity


Be sure to account for duplicates in counts #1

karl-project-review closed 6 years ago

karl-project-review commented 6 years ago

https://github.com/SlinkyLincy/intro-data-capstone-biodiversity/blob/8666b27f6069db09ced697dee3c533bfbb10dcc7/Petra%20submission/biodiversity.py#L56

Since we are counting how many different species there are, we need to account for duplicates, so this line should use nunique() rather than len(). The same issue comes up on lines 89, 135, and 198, where count() is used instead of nunique(). On line 198, the duplicates that are not accounted for end up in the counts passed to the significance tests, so the resulting p-values are off. Be sure to account for duplicates wherever the question is about distinct species rather than rows.
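To illustrate the difference, here is a minimal sketch using a small made-up DataFrame. The column names (category, scientific_name) and the rows are assumptions for illustration only and are not taken from biodiversity.py. It shows that len() and count() count rows, while nunique() counts distinct values, both on a single column and per group.

```python
import pandas as pd

# Hypothetical data: column names and rows are illustrative only,
# not the actual contents of the capstone dataset.
species = pd.DataFrame({
    "category": ["Mammal", "Mammal", "Mammal", "Bird", "Bird"],
    "scientific_name": [
        "Canis lupus",
        "Canis lupus",        # duplicate row for the same species
        "Ursus arctos",
        "Aquila chrysaetos",
        "Aquila chrysaetos",  # duplicate row for the same species
    ],
})

names = species["scientific_name"]

row_count = len(names)            # counts every row, duplicates included
non_null_count = names.count()    # counts non-null rows, duplicates included
distinct_count = names.nunique()  # counts each species once

# The same distinction applies after a groupby.
rows_per_category = species.groupby("category")["scientific_name"].count()
species_per_category = species.groupby("category")["scientific_name"].nunique()

print(row_count, non_null_count, distinct_count)
print(rows_per_category)
print(species_per_category)
```

If the per-category numbers feed a contingency table for a significance test, building the table from the nunique() results rather than the count() results keeps duplicate rows from inflating the cells.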

SlinkyLincy commented 6 years ago

Addressed in commit 3b15cb7