Dataframe naming convention naming collision with pandas

alexandermorgan commented 8 years ago

Our naming convention for the highest level (level 0) of our multiindex for the columnar axis of dataframes is the string: 'indexer_filename.IndexerClassName' Say you have a dataframe with the noterest, duration, and dissonance indexer results all concatenated together along the columnar axis. Then you want to select just the noterest results. You should be able to do this: df.levelZeroNameOfSubsectionYouWant Note that after the period, it is not a string. For the noterest results this would be

df.noterest.NoteRestIndexer The collision happens because our dataframe naming convention includes a period. I propose that we eliminate the module name ('noterest' in the example above) and the period. Then we could access the dataframe as expected like this: df.NoteRestIndexer There are other label-based ways to access dataframes that still work for us such as df['noterest.NoteRestIndexer'] or df.loc[:, 'noterest.NoteRestIndexer'] The point of this change would be to make our syntax easier for people new to our framework but already experienced with pandas. Also, it will make our naming convention less clunky by getting rid of the relatively unimportant module names.

crantila commented 8 years ago

Something about this makes me uneasy. I'm not sure whether there's a technical reason to object, or if it's just because this is a backward-incompatible change with dubious benefit.

There were two reasons I put the module name in the DataFrame. First, that's how the analyzers should be imported in analysis scripts and client programs. You can take the string of an analyzer's name and use it to find the actual class (if it's imported). Second, to avoid an actual naming collision, in case there would be conflicting ways to do things, like caplin.CadenceIndexer and hepokoski.CadenceIndexer. (The situation you describe here is less like a collision, more like "pandas offers three ways to access columns and one doesn't work with VIS.")

ahankinson commented 8 years ago

You are proposing to reduce one level of namespace. The tricky thing is that Python treats colliding class names silently, so if you end up in the situation that Christopher mentions, you wouldn't know whether df.CadenceIndexer was the Caplin one or the Hepokoski one.

In [1]: class Foo:
   ...:     def __init__(self):
   ...:         print('bar')
   ...:

In [2]: f = Foo()
bar

In [3]: class Foo:
   ...:     def __init__(self):
   ...:         print('baz')
   ...:

In [4]: f = Foo()
baz

alexandermorgan commented 8 years ago

I see your guys' point, but I think that having two indexer classes with the same name but in different modules is something that we should avoid no matter what our naming conventions are. So I don't see how Christopher's ambiguous case could ever arise. That being said, no problem, we can just keep the naming convention as it is.

ahankinson commented 8 years ago

Keeping method and class names unique across a large piece of software is why namespaced conventions were invented. Otherwise you end up with a language like objective c, which doesn't support namespaces and which relies on ad hoc naming conventions to ensure that an NSString and a CFString are two different types of objects. It's fine if you can keep the whole thing in your head, but if it grows, and if more people start adding to it, namespaces will be much more important.

ELVIS-Project / vis-framework

Dataframe naming convention naming collision with pandas #394