dropbox / pyannotate

Auto-generate PEP-484 annotations
Apache License 2.0

Possible naming improvements #73

Open bluebird75 opened 6 years ago

bluebird75 commented 6 years ago

When approaching the package, I was a bit surprised by the complexity of the naming.

What surprised me:

  1. the tool is named pyannotate, but you must import pyannotate_runtime to use it (see the usage sketch after this list)
  2. collect_types.init_types_collection()? Does the name really need to be that long? And do we really need to explicitly init, by the way? Why not initialize on the first resume()?
  3. dump_stats() is OK, but dump() would be just as good
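
For reference, this is roughly the workflow the blog post describes, as far as I can tell; myapp, main() and the output filename are just placeholders:

```python
from pyannotate_runtime import collect_types

from myapp import main  # placeholder for the code under observation

if __name__ == '__main__':
    collect_types.init_types_collection()       # explicit init step
    collect_types.start()                       # begin collecting call types
    main()
    collect_types.stop()
    collect_types.dump_stats('type_info.json')  # write collected types as JSON
```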

Suggestions:

By the way, I am looking into contributing to pyannotate.

bluebird75 commented 6 years ago

Even dump() could be left to an atexit handler.

gvanrossum commented 6 years ago

The reason for the unusual module naming is that I want the runtime support to be separate from the command line tool that reads the JSON file and updates your source code.

The reason for the separate init_types_collection() call is that this should be called before starting any threads, at least if you're interested in getting results from other threads (at Dropbox we have a solid use case for this). I suppose we can rename this to init().

Doing the dump() in atexit() feels too late and too implicit.

Honestly I would rather (first) see a contribution that added good documentation, so we can stop referring to the blog post, instead of further changes to the API naming (which feels pretty shallow).

In terms of features I'd like to see a way of merging multiple type_info.json files, e.g. to combine results gathered by multiple tests run as separate processes (possibly on separate hosts), perhaps similar to the functionality in coverage.py for merging coverage results.
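
A rough sketch of what such a merge could look like, assuming type_info.json is a flat JSON list of per-function entries with path, line, func_name, type_comments and samples keys (both the format details and merge_type_info() itself are assumptions here, not an existing API):

```python
import json

def merge_type_info(input_paths, output_path):
    # Hypothetical helper: combine several type_info.json files into one,
    # unioning type comments and summing sample counts per function.
    merged = {}
    for path in input_paths:
        with open(path) as f:
            for entry in json.load(f):
                key = (entry['path'], entry['line'], entry['func_name'])
                if key not in merged:
                    merged[key] = dict(entry)
                    merged[key]['type_comments'] = list(entry.get('type_comments', []))
                else:
                    existing = merged[key]
                    for comment in entry.get('type_comments', []):
                        if comment not in existing['type_comments']:
                            existing['type_comments'].append(comment)
                    existing['samples'] = existing.get('samples', 0) + entry.get('samples', 0)
    with open(output_path, 'w') as f:
        json.dump(list(merged.values()), f, indent=2)
```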

bluebird75 commented 6 years ago

Thanks for the clear feedback.

I don't get the point about the threads, though. If you want to collect type information from other threads, you should call start() before any of those threads are started, and start() will implicitly call init(), so all is fine.

And if your threads are already active, even if you call start() (and thus init()) late in their lifetime, you should still capture their live type information. Am I missing something?

If you look at MonkeyType, their code instrumentation is much, much simpler to use: just run monkeytype run, or use a single context manager.
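
For comparison, a minimal MonkeyType session looks roughly like this (either the CLI -- monkeytype run script.py, then monkeytype apply some.module -- or a single context manager; main() is a placeholder):

```python
import monkeytype

from myapp import main  # placeholder for the code under observation

# One context manager is all the instrumentation MonkeyType needs;
# traces go into its call-trace store for the later stub/apply steps.
with monkeytype.trace():
    main()
```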

I'll try to work on the missing aspects. Python 3 type annotations are the top priority for me, because after 10 years of lobbying, it looks like the world has mostly switched to Python 3 (even though Dropbox and a few other codebases are resisting).

Improved documentation is indeed important as well and should not be that difficult to produce.

I also agree that merging multiple type_info files is a critical feature. It makes it possible to collect type information in different environments and benefit from all of it. In my opinion, pyannotate should even offer modes where existing function type information is enriched with external type information, to get finer granularity for incremental typing.

gvanrossum commented 6 years ago

The use case for threads is the Dropbox Client. We have a debug version that lets a developer start and stop type collection using a menu entry. This is handy so you can collect types for a specific sequence of operations in the app (but not for the rather long app startup sequence). The init() method needs to be called when the app initializes itself, because the app creates long-running threads. If init() was implicitly called by start(), we would collect no types from threads that have already started.

As a compromise, I think it's fine to have start() call init() if it hasn't been called yet -- we must make sure that calling init() multiple times is okay.
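
A minimal sketch of that compromise, with hypothetical internals (not the current implementation; the filter argument name is also made up):

```python
_initialized = False

def init(filter_filename=None):
    # Safe to call more than once; only the first call has an effect.
    global _initialized
    if _initialized:
        return
    _initialized = True
    # ... install the profiling hooks for current and future threads ...

def start():
    # Late callers still work: init() runs implicitly if it was never called,
    # at the cost of missing threads that started before this point.
    if not _initialized:
        init()
    # ... begin recording argument and return types ...
```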

Note that init() also has an optional argument used to filter filenames. This makes more sense here than for start(). Similarly, dump_stats() is separate because it takes an output filename. (You can use dumps_stats() if you want to do something else with the data -- though I think it's a shame there's no API to get the JSON as a dict instead of as a string.)
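
Until a dict-returning API exists, one workaround is to parse the string from dumps_stats() yourself (assuming, as described above, that it returns the same JSON text that dump_stats() writes):

```python
import json

from pyannotate_runtime import collect_types

# Round-trip through the JSON text to get plain Python objects.
data = json.loads(collect_types.dumps_stats())
```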

What we could also use is an API that resets all the global state -- similar to what collecting_types() in pyannotate_runtime/tests/test_collect_types.py does.
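
Something in this spirit, as a sketch of a public helper (isolated_collection() is an invented name, and the actual reset of the global state is exactly the missing piece):

```python
from contextlib import contextmanager

from pyannotate_runtime import collect_types

@contextmanager
def isolated_collection(output_path):
    # Hypothetical helper in the spirit of collecting_types() from the tests:
    # collect types only for the wrapped block and dump them when it exits.
    # A real version would also reset the collector's global state here.
    collect_types.init_types_collection()
    collect_types.start()
    try:
        yield
    finally:
        collect_types.stop()
        collect_types.dump_stats(output_path)
```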

So, all in all, I am open to API changes, but I want certain functionality to remain available.