Closed Mr0grog closed 4 years ago
I am not familiar with Python's DataSet object actually, hence dictionaries. They are also, I think?, easier to turn into Pandas dataframes, which is ultimately what happens in the notebooks.
Oh! DataSet
isn’t a Python built-in, it’s a class defined in this repo, here: https://github.com/edgi-govdata-archiving/ECHO_modules/blob/383e75c23fa8dba656590b18d177ea089d2b3889/DataSet.py#L49-L55
Pandas definitely has built-ins for turning lists and dictionaries into dataframes, but it looks like DataSet
has a few methods that create or do things with dataframes.
My concern is that a user could run CAA Violations for MA-4 then CAA Violations for MA-5. We would need an update - a new DataSet instance should be created.
Makes sense! Then I think the better approach is one that matches the example I wrote out, with a dictionary of dictionaries listing the constructor arguments for DataSet
(i.e. the “presets”). I’ll make a PR for that.
This adds a docstring for the module (really just turns the existing comment into a docstring) and one for the
make_data_sets()
function.Will add other docstrings as a separate PR. I wanted to do this one by itself so we can hash out any necessary discussion over the style (these use the Numpy style, which is a little verbose, but reads nicely in both code form and generated/parsed docs, and is what we use in all other EDGI python projects). Also, some open questions for discussion about language below.
I didn’t want to change the function or argument names, but looking at the implementation, I realized I had misunderstood what this was supposed to do based on the name and on the video tutorial I’d watched through a bit too quickly. To clarify, I described this as working with “preset configurations,” which seemed clearer to me. Happy to adjust if that doesn’t seem good to you.
In Google Colab, this displays like:
Separate but related: I didn’t make any implementation changes since I wanted to keep the scope narrowly focused on documentation here, but I noticed there’s a lot of repeated boilerplate here. I think it might be useful to rewrite this using a pattern like the following: