[WIP] Adds type checking annotations auto-generated via Instagram's MonkeyType package

mfekadu commented 4 years ago

What's New?

Instagram made a cool tool called MonkeyType
MonkeyType automatically generated static type annotations by collecting runtime types on the nimbus.py script
type annotations added to the following modules
- QA
- database_wrapper
- nimbus_nlp.NIMBUS_NLP
- nimbus_nlp.question_classifier
- nimbus_nlp.save_and_load_model
- BUT not auto-generated on the following modules
- modules.validators
- modules.formatters
- flask_api
Pipfile && Pipfile.lock now have MonkeyType in the dev-dependencies

Awesome Code Talk

Carl Meyer - Type-checked Python in the real world - PyCon 2018

How did I do this?

Installing MonkeyType

$ pipenv install MonkeyType --dev
$ pipenv shell

Running MonkeyType, which generates a monkeytype.sqlite3

$ monkeytype --verbose run nimbus.py

Then I asked 2 simple questions "What is Foaad's email?" and "Who is the contact for Color Coded?"

Checking for what modules MonkeyType explored while the nimbus.py code was running

$ monkeytype list-modules
QA
database_wrapper
nimbus_nlp.NIMBUS_NLP
nimbus_nlp.question_classifier
nimbus_nlp.save_and_load_model

Telling MonkeyType to generate type annotations

$ monkeytype --verbose stub QA
$ monkeytype --verbose stub database_wrapper
$ monkeytype --verbose stub nimbus_nlp.NIMBUS_NLP
$ monkeytype --verbose stub nimbus_nlp.question_classifier
$ monkeytype --verbose stub nimbus_nlp.save_and_load_model

Telling MonkeyType to edit our code and apply the type annotations

$ monkeytype --verbose apply QA
$ monkeytype --verbose apply database_wrapper
$ monkeytype --verbose apply nimbus_nlp.NIMBUS_NLP
$ monkeytype --verbose apply nimbus_nlp.question_classifier
$ monkeytype --verbose apply nimbus_nlp.save_and_load_model

Addresses #14 but does not fix

Type of change (pick-one)

[x] Bug fix (non-breaking change which fixes an issue)
[ ] New feature (non-breaking change which adds functionality)
[x] Breaking change (fix or feature that would cause existing functionality to not work as expected)
- I am unsure if the functionality is the same. Hence the [WIP] (work-in-progress) tag.
[ ] This change requires a documentation update

How Has This Been Tested?

It has not. Let's not merge this until we have tests to verify the same functionality.

**However, the auto-generated type annotations can spark discussion on the functions we have written**

For example, the monkeytype annotations of these function arguments mismatch the docstring Args in this `init`

-    def __init__(self, q_format, db_query, format_answer):
+    def __init__(self, q_format: str, db_query: partial, format_answer: partial) -> None:
         """
         Args:
             q_format (str): Question format string
@@ -49,13 +51,13 @@ class QA:
         self.db_query = db_query
         self.format_answer = format_answer

https://github.com/calpoly-csai/api/blob/41945a06af197f68719799a6676578922d192ad3/QA.py#L38-L48

MonkeyType noticed a string returned from this function when called just once. Is the return value always a string?

-    def _get_data_from_db(self, extracted_vars):
+    def _get_data_from_db(self, extracted_vars: Dict[str, str]) -> str:
         return self.db_query(extracted_vars)

https://github.com/calpoly-csai/api/blob/41945a06af197f68719799a6676578922d192ad3/QA.py#L54-L55

Ah, nice! It is reassuring to see the docstring match the generated type annotation

-def create_qa_mapping(qa_list):
+def create_qa_mapping(qa_list: List[QA]) -> Dict[str, QA]:
     """
     Creates a dictionary whose values are QA objects and keys are the question
     formats of those QA objects.
@@ -146,18 +148,18 @@ def create_qa_mapping(qa_list):
 #     return functools.partial(_single_var_string_sub, a_format)

https://github.com/calpoly-csai/api/blob/41945a06af197f68719799a6676578922d192ad3/QA.py#L71-L74

hmm.. interesting type: `DUMMY_NAME`

+from google.cloud.automl_v1.types import PredictResponse
+from monkeytype.encoding import DUMMY_NAME
+from typing import Dict

-    def inline_text_payload(self, sent):
+    def inline_text_payload(self, sent: str) -> Dict[str, DUMMY_NAME]:
         '''
         Converts the input sentence into GCP's callable format

@@ -82,7 +85,7 @@ class Variable_Extraction:

         return {'text_snippet': {'content': sent, 'mime_type': 'text/plain'} }

https://github.com/calpoly-csai/api/blob/41945a06af197f68719799a6676578922d192ad3/nimbus_nlp/NIMBUS_NLP.py#L76-L86

Ah, nice! Another return type that seems legit. Or is it? Haha, I hope so!!

+from sklearn.neighbors.classification import KNeighborsClassifier

-def load_latest_model():
+def load_latest_model() -> KNeighborsClassifier:
     # https://stackoverflow.com/a/39327156
     train_path = PROJECT_DIR + '/models/classification/*'
     list_of_files = glob.glob(train_path)

https://github.com/calpoly-csai/api/blob/41945a06af197f68719799a6676578922d192ad3/nimbus_nlp/save_and_load_model.py#L33

There's more, check out the files changed. I have not gone through everything.

Checklist (check-all-before-merge)

formatting help: - [x] means "checked' and - [ ] means "unchecked"

[ ] I documented my code according to the Google Python Style Guide
[ ] I ran ./build_docs.sh and the docs look fine
[ ] I ran ./type_check.sh and got no errors
[ ] I ran ./format.sh because it automatically cleans my code for me 😄
[ ] I ran ./lint.sh to check for what "format" missed
[ ] I added my tests to the /tests directory
[ ] I ran ./run_tests.sh and all the tests pass

mfekadu commented 4 years ago

Will do!

By the way, what’s a rebase? @Jason-Ku

I’ve only ever done “resolve conflicts” That’s the extent of my git-knowledge

snekiam commented 4 years ago

@mfekadu Here's github's page on rebase The TL;DR for this situation is that it allows you to squash a bunch of commits into one, which will make the repo cleaner, because then if there's a bug we discover in this code its all in one commit.

There's another page here with some nice ASCII art of some other reasons you might want to use a rebase. Hope this helps 😀

calpoly-csai / api