casics / spiral

A Python 3 module that provides functions for splitting identifiers found in source code files.
GNU General Public License v3.0

Ronin splits #2

Open rsuhaibani opened 5 years ago

rsuhaibani commented 5 years ago

Ronin split the identifier unloadAssemblies as ['unload', 'Ass', 'embl', 'ies'].

mhucka commented 5 years ago

Thanks for your report. Some identifiers will inevitably be split suboptimally, and this looks like one of them. As mentioned in the README section on performance, Ronin is heuristic, so it cannot be perfect for every case. I think the only options here are to retrain it on more or different examples, or to come up with a different algorithm altogether.
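
For identifiers like this one, where the case change already marks the word boundary, a plain case-transition splitter is one possible fallback. The sketch below is not Ronin and not part of Spiral; `split_on_case` is a hypothetical helper name, and the regular expression is just one reasonable way to express the rule.

```python
import re

# Hypothetical helper, not part of Spiral: split only at case transitions,
# digit runs, and non-alphanumeric separators, never inside a lowercase run.
_TOKEN = re.compile(
    r"[A-Z]+(?![a-z])"   # run of capitals not followed by a lowercase letter
    r"|[A-Z]?[a-z]+"     # optional capital followed by lowercase letters
    r"|[0-9]+"           # run of digits
)

def split_on_case(identifier):
    """Return the tokens of `identifier` found by case and digit boundaries."""
    return _TOKEN.findall(identifier)

print(split_on_case("unloadAssemblies"))  # ['unload', 'Assemblies']
```

The trade-off is that this approach cannot split an all-lowercase identifier such as one with several words run together, which is the kind of case a heuristic splitter is meant to handle.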

rsuhaibani commented 5 years ago

Clear! So I can train it myself on other examples to correct the results? I am analyzing more than 100K identifiers. Is that easy to do?
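
For a batch of that size, one possible way to find the splits worth correcting by hand is to flag any identifier whose tokens include a fragment missing from a word list. This is only a sketch under assumptions: `flag_suspicious` is a hypothetical helper, the vocabulary is a tiny made-up set, and the input is assumed to be a mapping from each identifier to the tokens some splitter produced for it.

```python
def flag_suspicious(splits, vocabulary):
    """Return {identifier: unknown_tokens} for splits with out-of-vocabulary tokens.

    `splits` maps an identifier to the list of tokens a splitter produced.
    `vocabulary` is a set of lowercase words treated as valid tokens.
    """
    flagged = {}
    for identifier, tokens in splits.items():
        # Only alphabetic tokens are checked; digit runs are ignored.
        unknown = [t for t in tokens if t.isalpha() and t.lower() not in vocabulary]
        if unknown:
            flagged[identifier] = unknown
    return flagged

# The first entry is the split reported in this issue; the second entry and
# the vocabulary are made up purely for illustration.
splits = {
    "unloadAssemblies": ["unload", "Ass", "embl", "ies"],
    "loadFile": ["load", "File"],
}
vocabulary = {"unload", "assemblies", "load", "file"}

flagged = flag_suspicious(splits, vocabulary)
print(flagged)  # {'unloadAssemblies': ['Ass', 'embl', 'ies']}
```

The quality of the flags depends entirely on the word list, so a real run would need a vocabulary suited to the code base being analyzed.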