Tazinho / snakecase

🐍🐍🐍 A systematic approach to parse strings and automate the conversion to snake_case, UpperCamelCase or any other case.
https://tazinho.github.io/snakecase/
GNU General Public License v3.0

Numerals in abbreviations #155

Closed ilarischeinin closed 5 years ago

ilarischeinin commented 5 years ago

I have a case that I cannot figure out how to get right. I don't think it's an exotic one, so I'd imagine it must be already possible, but just can't get my head around how to specify it with the available parameters.

I want to convert (from snake case) "t2d_status" to lower camel case "t2dStatus". The problem is that no matter what I try, I get "t2DStatus", i.e. with a capital "D" whereas I need a lowercase "d".

library(snakecase)
to_lower_camel_case("t2d_status")
#> [1] "t2DStatus"

I tried to specify "t2d" as an abbreviation so it wouldn't get broken down:

to_lower_camel_case("t2d_status", abbreviations = "t2d")
#> [1] "t2DStatus"

Also tried to specify to keep numerals as is:

to_lower_camel_case("t2d_status", numerals = "asis")
#> [1] "t2DStatus"

And to change sep_in to just "_":

to_lower_camel_case("t2d_status", sep_in = "_")
#> [1] "t2DStatus"

Created on 2018-11-16 by the reprex package (v0.2.1)

None of these seem to help (nor did my attempts with parsing_option or transliteration), so could you please point me in the right direction here? Thank you.

I did try to go through the issue tracker to see if a case like this had popped up before, but that was kind of difficult as so many issues are not very descriptive, but more of record keeping on things to be implemented.

Tazinho commented 5 years ago

Thanks for reporting this. In theory you are right with the first approach. The implementation of abbreviations is just too naive atm.

Currently matches of abbreviations will be surrounded internally by underscores to ensure they are recognized as substrings. However, the substrings (abbreviations) are then parsed further and in your case t2d will be parsed into 3 substrings (because of the number).
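
To make the behaviour described above concrete, here is a minimal base R sketch. It is an illustration under stated assumptions, not snakecase's internal code: `mark_abbreviations()`, `split_words()` and `camel()` are hypothetical helpers, and the parser is reduced to "split on underscores and on every letter/digit boundary".

```r
# Hypothetical illustration only; these helpers are NOT part of snakecase.

# Step 1: surround every abbreviation match with underscores.
mark_abbreviations <- function(x, abbreviations) {
  for (abbr in abbreviations) {
    x <- gsub(abbr, paste0("_", abbr, "_"), x, fixed = TRUE)
  }
  x
}

# Step 2: simplified parsing that also inserts a boundary between
# letters and digits, then splits on (runs of) underscores.
split_words <- function(x) {
  x <- gsub("([[:alpha:]])([[:digit:]])", "\\1_\\2", x)
  x <- gsub("([[:digit:]])([[:alpha:]])", "\\1_\\2", x)
  parts <- strsplit(x, "_+")[[1]]
  parts[nzchar(parts)]  # drop empty pieces from leading underscores
}

# Step 3: lower camel case, i.e. capitalise every piece except the first.
camel <- function(parts) {
  tail <- parts[-1]
  tail <- paste0(toupper(substring(tail, 1, 1)), substring(tail, 2))
  paste0(c(parts[1], tail), collapse = "")
}

marked <- mark_abbreviations("t2d_status", "t2d")
marked
#> [1] "_t2d__status"
split_words(marked)
#> [1] "t"      "2"      "d"      "status"
camel(split_words(marked))
#> [1] "t2DStatus"
```

The underscores around "t2d" do not protect it: the second step still sees a letter/digit boundary inside the abbreviation, so "d" becomes its own piece and gets capitalised.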

I think a perfect solution would be to ignore the abbreviations during the parsing step. However, I am not sure how to implement this in an elegant way and will have to think a bit about that.

Tazinho commented 5 years ago

Possible implementation idea: string -> abbreviations ->

sep_in -> parsing_option -> split ->

-> ...

Edit: otherwise it might be possible to work around the numerals parsing.

The third and possibly best approach would be to split on the abbreviations first, mark the resulting abbreviation substrings, and then apply the regular parsing only to the non-abbreviation substrings. However, I will need to evaluate this approach in a new dev branch first.
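
The third approach could be sketched roughly as follows in base R. This is only a sketch under assumptions, not the code from any snakecase branch: all function names are hypothetical, the input is a single string, at least one abbreviation is supplied, abbreviations contain only letters and digits (so they are safe inside a regular expression), and the "regular parsing" is simplified to splitting on underscores and letter/digit boundaries.

```r
# Hypothetical sketch; function names are illustrative, not snakecase's API.

# Simplified stand-in for the regular parsing step.
split_plain <- function(x) {
  x <- gsub("([[:alpha:]])([[:digit:]])", "\\1_\\2", x)
  x <- gsub("([[:digit:]])([[:alpha:]])", "\\1_\\2", x)
  parts <- strsplit(x, "_+")[[1]]
  parts[nzchar(parts)]
}

# Split on abbreviations first, mark them, and parse only the rest.
split_with_abbreviations <- function(x, abbreviations) {
  # Try longer abbreviations first so they win over shorter ones.
  abbreviations <- abbreviations[order(nchar(abbreviations), decreasing = TRUE)]
  m <- gregexpr(paste(abbreviations, collapse = "|"), x)
  hits <- regmatches(x, m)[[1]]                 # the abbreviations found
  rest <- regmatches(x, m, invert = TRUE)[[1]]  # the text between them

  piece <- character(0)
  is_abbr <- logical(0)
  for (i in seq_along(rest)) {
    parsed <- split_plain(rest[i])              # second split: non-abbreviations only
    piece <- c(piece, parsed)
    is_abbr <- c(is_abbr, rep(FALSE, length(parsed)))
    if (i <= length(hits)) {                    # abbreviation is kept as one piece
      piece <- c(piece, hits[i])
      is_abbr <- c(is_abbr, TRUE)
    }
  }
  data.frame(piece = piece, is_abbr = is_abbr, stringsAsFactors = FALSE)
}

to_lower_camel_sketch <- function(x, abbreviations) {
  pieces <- split_with_abbreviations(x, abbreviations)$piece
  tail <- pieces[-1]
  tail <- paste0(toupper(substring(tail, 1, 1)), substring(tail, 2))
  paste0(c(pieces[1], tail), collapse = "")
}

to_lower_camel_sketch("t2d_status", abbreviations = "t2d")
#> [1] "t2dStatus"
```

One simplification to be aware of: the sketch matches an abbreviation anywhere in the string, including inside a longer word, so a real implementation would likely need some form of boundary check.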

Tazinho commented 5 years ago

Once I get to this, the process will probably have to look like this:

Tazinho commented 5 years ago

The above still sounds like significant overhead. Maybe the following could work:

Tazinho commented 5 years ago

Implemented in devversion-01 branch for now (almost as mentioned in the last post; not yet tested; also need to remove some overhead introduced by the current verbose implementation):

Open steps: