skishchampi closed this 9 years ago
Some ideas around the CONTRIBUTING document, would go under "Demonstrating model performance". Interested in feedback @geneorama and @skishchampi
Demonstrating Model Performance
We welcome improvements to the analytic model that creates predictions for the Department of Public Health. The City may adopt a pull request that sufficiently improves the accuracy of the predictions, which would allow you to contribute to the City's inspection practice. If your pull request improves the model, please consider the following steps when submitting it.
- Identify how your model is improving prior results
- Run a test using the benchmark data provided in the repository
- Create a pull request that describes those improvements in the description
- Work with the data science team to reproduce those results
Training your model
Train your food inspection model using data between January 2009 and 2012. Use these fits to generate a forecast of food inspections for the time period between September 2, 2014 and October 31, 2014.
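As a rough illustration of the split described above, here is a minimal sketch in Python. The test window comes from the text; the exact training boundaries are an assumption, since the proposal only says "between January 2009 and 2012". The function name `split_by_date`, the `inspection_date` field, and the toy rows are all hypothetical and are not taken from the repository.

```python
from datetime import date

# Test window taken from the proposal text.
TEST_START = date(2014, 9, 2)
TEST_END = date(2014, 10, 31)

# Assumption: the proposal only says "between January 2009 and 2012".
# These exact boundaries are placeholders, not confirmed by the team.
TRAIN_START = date(2009, 1, 1)
TRAIN_END = date(2012, 12, 31)


def split_by_date(rows, date_key="inspection_date"):
    """Return (train, test) lists of rows using inclusive date windows."""
    train = [r for r in rows if TRAIN_START <= r[date_key] <= TRAIN_END]
    test = [r for r in rows if TEST_START <= r[date_key] <= TEST_END]
    return train, test


# Toy rows for illustration only; not real inspection data.
rows = [
    {"id": 1, "inspection_date": date(2010, 5, 14)},
    {"id": 2, "inspection_date": date(2013, 7, 1)},   # in neither window
    {"id": 3, "inspection_date": date(2014, 9, 2)},
    {"id": 4, "inspection_date": date(2014, 11, 1)},  # after the test window
]
train, test = split_by_date(rows)
print([r["id"] for r in train], [r["id"] for r in test])
```

Keeping the two windows disjoint means the forecast is scored only on inspections the model never saw during fitting.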
Measuring improvement
The City sought to reduce the time to find critical violations. Thus, we are interested in a few key qualities in any improvements.
- Your model reduces the average time to find critical violations (currently: 7.4 days)
- Your model reduces the variance of the time to find critical violations (e.g., the average reduction is still 7.5 days, but the standard deviation is lower)
- Similarly, your model finds no restaurant with critical violations later than before and finds some earlier, even if the average time barely changes
- Your model increases the proportion of violations found in the first half of the pilot (e.g., percentage of critical violations found in September 2014)
The team has calculated metrics for each one of these measures. You can investigate how these measures were calculated by referring to "Forecasting Restaurants with Critical Violations". Let us know if there are other metrics that should be considered for model improvement.
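To make the measures above concrete, here is a minimal sketch of how they could be computed. This is not the team's calculation (that is described in "Forecasting Restaurants with Critical Violations"); the function `summarize`, its inputs, and the toy dates are hypothetical. It assumes each critical violation has the date it was actually found and the date it would have been found under the model's ordering, and it treats September 2014 as the first half of the pilot, following the example in the text.

```python
from datetime import date
from statistics import mean, pstdev

# Assumption: "first half of the pilot" is read as September 2014,
# following the example given in the proposal.
FIRST_HALF_END = date(2014, 9, 30)


def summarize(actual_dates, model_dates):
    """Compare when each critical violation was found with when the
    model's ordering would have found it. Lists are aligned by index."""
    # Positive value = the model finds the violation that many days earlier.
    days_earlier = [(a - m).days for a, m in zip(actual_dates, model_dates)]
    in_first_half = sum(1 for m in model_dates if m <= FIRST_HALF_END)
    return {
        "mean_days_earlier": mean(days_earlier),
        "sd_days_earlier": pstdev(days_earlier),
        "none_found_later": all(d >= 0 for d in days_earlier),
        "share_first_half": in_first_half / len(model_dates),
    }


# Toy dates for illustration only; not real inspection results.
actual = [date(2014, 9, 20), date(2014, 10, 10),
          date(2014, 10, 25), date(2014, 9, 5)]
model = [date(2014, 9, 10), date(2014, 9, 28),
         date(2014, 10, 20), date(2014, 9, 5)]
result = summarize(actual, model)
print(result)
```

Reporting all four values together lets a reviewer see whether a lower average came at the cost of higher variability or of some restaurants being found later.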
Ability to adopt model
If you would like to submit an improvement, please open a pull request that notes improvements to at least one of the aforementioned benchmarks. The data science team should be able to reproduce those results using your code.
Model improvements that include new data must use data that is freely available (gratis or libre) to the City of Chicago. There must not be any terms that would prohibit the City from storing data on local servers.
Likewise, by submitting a pull request, you agree that the City of Chicago will be allowed to use your code for analytic purposes and that your software will be licensed under the license found in LICENSE.md in this repository.
Thoughts?
I think these changes are good. I wouldn't mind suggesting a few edits, but they are minor and I don't want to formally work them in right now because I think it would be complicated to track (and it's not critical).
Some notes to help me remember my thoughts:
So, how about if we accept the merge and use these changes to contributing (I don't know the order) and then make other edits as needed?
Also, I'm noticing a few little things that I'd change in the ReadMe.md, especially that the data.table mention is gone. (but I think this is fine for now and I would rather merge this issue then make edits in a separate issue / offline)
@geneorama The data.table related explanation is in the REQUIREMENTS section
Ok, sounds like we agree in principle, so I will accept the pull request and we can continue the conversation in the #41 and #44 threads. Let's keep those two issues open to continue the conversation.
fixes #41 and #44
Adds a few comments to 2 R scripts in ./CODE