Closed misialq closed 3 years ago
Thanks @misialq ! Why not just reorder so that Taxon
is first, and skip the capitalization? Capitalizing is not a big change, but just seems unnecessary if you are still reordering.
That's true - it wouldn't be necessary here. My thinking was to just clean it up to prevent a similar issue from occurring somewhere down the line (not that I can think of any specific place; by similar issue I mean unexpected sorting). I can remove that part though if you think that's unlikely.
I suppose one possible future issue that it addresses is inconsistent capitalization (e.g., merging one df with consensus
and another with Consensus
). But this is probably low priority 🤷 as it should not occur now (theoretically it could if importing a taxonomy from elsewhere)
@mikerobeson what's your take on this? Is it more important to preserve original case or more important to prevent possible future bugs regarding capitalization inconsistency? I think I am happy either way, but I always just get a little bit nervous when altering original column names 😬
@nbokulich, I am also on the fence about whether or not we should capitalize the column names. However, I would not be surprised, when merging taxonomies from different databases, that a something like this may crop up unexpectedly. I agree that an issue would more likely crop up form a user importing their own files, rather than anything we may parse via RESCRIPt. I'm leaning towards keeping the capitalization, as long as it does not require us to update all the test cases.
Thanks for the input @mikerobeson ! @misialq has already updated the unit tests to capitalize.
My gut feeling is that we should not alter the column names, but I see possible issues in either direction... @misialq I leave the final decision up to you 😁
Thanks @both, in that case I think I'll keep the capitalization and pray that the god of dataframes looks graciously upon this change.
When the DataFrames get combined the columns will be sorted alphabetically, with the preference to first sort the capitalized column names and only then the other ones. That causes the column 'Taxa' to loose its position as a first column (required) when the other column names were provided as capitalized words.
Changes introduced in this PR allow to first capitalize all the column names to ensure consistent sorting behaviour during merge and then ensure that the 'Taxon' column is always first, regardless of the other columns. The tests were adjusted to reflect this behaviour.
Closes #112.