Update to Date README & CONTRIBUTING files

skishchampi commented 9 years ago

fixes #41 and #44

Adds a few comments to two R scripts in ./CODE

tomschenkjr commented 9 years ago

Some ideas for the CONTRIBUTING document; these would go under "Demonstrating model performance". Interested in feedback from @geneorama and @skishchampi.

Demonstrating Model Performance

We welcome improvements to the analytic model that creates predictions for the Department of Public Health. The City may adopt a pull request that sufficiently improves the accuracy of the predictions, allowing you to contribute to the City's inspection practice. If your pull request improves the model, please consider the following steps when submitting it.

Identify how your model is improving prior results

Run a test using the benchmark data provided in the repository

Create a pull request which describes those improvements in the description.

Work with the data science team to reproduce those results

Training your data

Train your food inspection model using data between January 2009 and 2012. Use these fits to generate a forecast of food inspections for the time period between September 2, 2014 and October 31, 2014.
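As a rough illustration of the split described above, here is a minimal Python sketch (the repository's own scripts are in R, so this is not the project's code). The record layout, the `inspection_date` field name, and the `split_by_date` helper are hypothetical. The test window comes from the text; the exact training end date is not stated there, so the value passed in the example is only an assumption.

```python
from datetime import date

# Test window stated in the text above.
TEST_START = date(2014, 9, 2)
TEST_END = date(2014, 10, 31)


def split_by_date(records, train_start, train_end, date_key="inspection_date"):
    """Split records into (train, test) lists by inspection date, bounds inclusive."""
    train = [r for r in records if train_start <= r[date_key] <= train_end]
    test = [r for r in records if TEST_START <= r[date_key] <= TEST_END]
    return train, test


# Toy records; the field names are hypothetical.
records = [
    {"id": 1, "inspection_date": date(2010, 5, 1)},
    {"id": 2, "inspection_date": date(2013, 6, 1)},
    {"id": 3, "inspection_date": date(2014, 9, 15)},
]

# ASSUMPTION: the text says only "January 2009 and 2012", so both the
# first day of January 2009 and the last day of 2012 are guesses here.
train, test = split_by_date(records, date(2009, 1, 1), date(2012, 12, 31))
```

Records that fall between the two windows (record 2 in the toy data) land in neither set, which keeps the forecast period strictly out of sample.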

Measuring improvement

The City sought to reduce the time to find critical violations. Thus, we are interested in a few key qualities in any improvements.

Your model reduces the average time to find critical violations (currently: 7.4 days)

Your model reduces the variance of the time to find critical violations (e.g., reduces the time by 7.5 days, but the standard deviation is lower)

Similarly, all restaurants were found earlier with no restaurants being found later, even if the average time remains the same

Your model increases the proportion of violations found in the first half of the pilot (e.g., percentage of critical violations found in September 2014).

The team has calculated metrics for each one of these measures. You can investigate how these measures were calculated by referring to "Forecasting Restaurants with Critical Violations". Let us know if there are other metrics that should be considered for model improvement.
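To make the measures above concrete, here is a minimal Python sketch of one way they could be computed for a proposed inspection order. It is not the team's calculation; see the referenced document for that. The `summarize` helper, the `(inspection_date, found_critical)` input shape, and the use of September 30 as the end of the "first half" are all assumptions made for illustration.

```python
from datetime import date
from statistics import mean, pstdev

PILOT_START = date(2014, 9, 2)      # start of the test window named in the text
FIRST_HALF_END = date(2014, 9, 30)  # ASSUMPTION: "first half" means September 2014


def summarize(schedule):
    """Summarize a schedule of (inspection_date, found_critical) pairs.

    Only inspections that found a critical violation count toward the
    measures. Raises statistics.StatisticsError if there are none.
    """
    days = [(d - PILOT_START).days for d, critical in schedule if critical]
    early = [d for d, critical in schedule if critical and d <= FIRST_HALF_END]
    return {
        "mean_days": mean(days),                   # average time to find
        "sd_days": pstdev(days),                   # spread of time to find
        "share_first_half": len(early) / len(days),
    }


# Toy schedules over the same three establishments (hypothetical data).
baseline = [
    (date(2014, 9, 5), False),
    (date(2014, 9, 12), True),
    (date(2014, 10, 2), True),
]
candidate = [
    (date(2014, 9, 4), True),
    (date(2014, 9, 12), True),
    (date(2014, 10, 2), False),
]

baseline_summary = summarize(baseline)
candidate_summary = summarize(candidate)
```

In this toy data the candidate order finds both critical violations sooner, with a smaller spread and a larger share in September, so it would count as an improvement on all of the measures listed.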

Ability to adopt model

If you would like to submit an improvement, please open a pull request that notes improvements to at least one of the aforementioned benchmarks. The data science team should be able to reproduce those results using your code.

Model improvements that include new data must use data that is freely available (gratis or libre) to the City of Chicago. There must not be any terms that would prohibit the City from storing the data on local servers.

Likewise, by submitting a pull request, you agree that the City of Chicago will be allowed to use your code for analytic purposes and that your software will be licensed under the terms found in LICENSE.md in this repository.

Thoughts?

geneorama commented 9 years ago

I think these changes are good. I wouldn't mind suggesting a few edits, but they are minor and I don't want to formally work them in right now because I think it would be complicated to track (and it's not critical).

Some notes to help me remember my thoughts:

I would change the wording a bit to make a few things sound more welcoming (IMO) in the beginning.
I would include a link to the GitHub guide and make it clear that we want to interact with people through GitHub, and let them see this short but helpful guide to the types (and purposes) of GitHub interactions: https://guides.github.com/activities/contributing-to-open-source/#contributing
I wouldn't say "work with the data science team to implement" but say something more like "include a reproducible example"
I might change the Training Data section to be a section on Train Data / Test Data, just to make it clear
I also would like to suggest that we invite other types of contributions; perhaps it's not just about "model performance". For example, someone who's a manager might say "we've done this type of optimization before, and we found that people got too burned out by the workflow" or maybe "this workflow lets bad actors anticipate the analysis". My main point: there could be completely "non model performance" suggestions that would be very important.

So, how about if we accept the merge and use these changes to contributing (I don't know the order) and then make other edits as needed?

geneorama commented 9 years ago

Also, I'm noticing a few little things that I'd change in the ReadMe.md, especially that the data.table mention is gone. (but I think this is fine for now and I would rather merge this issue then make edits in a separate issue / offline)

skishchampi commented 9 years ago

@geneorama The data.table related explanation is in the REQUIREMENTS section

tomschenkjr commented 9 years ago

Ok, sounds like we agree in principle, so I will accept the pull request and we can continue the conversation in the #41 and #44 threads. Let's keep those two issues open for that.

Chicago / food-inspections-evaluation

Update to Date README & CONTRIBUTING files #45
