nanounanue closed this pull request 4 years ago
Merging #799 into master will decrease coverage by 0.61%. The diff coverage is 66.15%.
```diff
@@            Coverage Diff             @@
##           master     #799      +/-   ##
==========================================
- Coverage   82.47%   81.86%   -0.62%
==========================================
  Files          98      102       +4
  Lines        6917     6980      +63
==========================================
+ Hits         5705     5714       +9
- Misses       1212     1266      +54
```
| Impacted Files | Coverage Δ | |
|---|---|---|
| ...e/component/catwalk/protected_groups_generators.py | 93.65% <0.00%> (-1.52%) | :arrow_down: |
| ...component/postmodeling/contrast/utils/aux_funcs.py | 0.00% <0.00%> (ø) | |
| ...t/results_schema/alembic/versions/4ae804cc0977_.py | 0.00% <0.00%> (ø) | |
| ...t/results_schema/alembic/versions/9bbfdcf8bab0_.py | 0.00% <0.00%> (ø) | |
| .../versions/a98acf92fd48_add_nuke_triage_function.py | 0.00% <0.00%> (ø) | |
| ...t/results_schema/alembic/versions/fa1760d35710_.py | 0.00% <0.00%> (ø) | |
| src/triage/tracking.py | 84.46% <ø> (ø) | |
| src/triage/cli.py | 80.43% <66.66%> (-1.24%) | :arrow_down: |
| src/triage/component/architect/utils.py | 42.50% <66.66%> (+1.47%) | :arrow_up: |
| src/triage/component/catwalk/model_trainers.py | 92.05% <66.66%> (ø) | |
| ... and 13 more | | |
Continue to review full report at Codecov.

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data

Powered by Codecov. Last update 21d99a0...a8f095c.
Looks pretty good overall, but see my one inline comment in predictors.py (possibly why Travis is sad as well?) and we should do a bit of testing before merging, of course.
The errors were related to the `sample` problem that you mentioned and to a new incompatibility between `fakeredis` and `rq` (already fixed; all tests pass now).
@nanounanue -- take another look at the `df.sample` logic I commented on, but looks good to me otherwise.
Oh boy, this PR tries to solve many issues, and I made several decisions in order to fix some of them.
Closes #791. There was a bug that caused evaluations to be calculated several times; the same happened with predictions, protected groups, and ranking. Now they are calculated only when needed.
Closes #768. For some reason, even with information about how to break ties, the ranking was calculated after the evaluation and prediction insertion, via a PostgreSQL UPDATE. Oddly, the predictions were queried from PostgreSQL, all the ranking calculations were executed in Python, and the rows were then updated. This obviously gets more expensive over time. Now the ranking happens during the calculation of the predictions and is inserted alongside them.
Closes #741. The ranking was using the `lexsort` method from `numpy`, which has no option for handling `None`: by default all the `None` labels were pushed to the top, independently of the tiebreaker. Now we use `pandas` methods (which use `numpy` under the hood, so there is no significant extra computation cost). Henceforth: for the *best* ranking we sort by score and then by label in descending order, nulls last; for the *worst* ranking we sort by score in descending order and then by label in ascending order, nulls first.

Closes #726. Closes #732. We were getting weird behaviour when not using `random_seed` in the experiment configuration file. This behaviour was introduced when `random_seed` was incorporated: the seed was used in the creation of the `experiment_hash`. So every time the experiment was executed without specifying a seed, everything was recalculated (except matrices). This was nonsensical. The experiment is not determined by the seed, but by its description: which models? which labels? which cohorts? which features? Given that, how can you explore how your results depend on the seed you choose? To make that possible, every time an experiment is executed, the random seed used is stored in the `experiment_run`. Every model now has a link to the `run` that created it, so you can trace which random seed influenced the training and evaluation of that model. If the random seed is specified in the configuration file, then no matter how many times you run the experiment, everything is executed just once. If you remove the random seed from the configuration file, you will still get a single experiment, but each run will give you different models, predictions, and evaluations. Note that you may end up with many models for the same as_of_date in a specific model group. This is intended: if you see a lot of variance in your metric or your list, your model is fragile, and maybe you don't want that. Changes to `Audition` still need to be made, though (see issue #800).
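
The tie-breaking rule from the #741 fix can be sketched with plain `pandas` (a minimal illustration; the column names `score` and `label` are hypothetical, not Triage's actual schema):

```python
import pandas as pd

# Toy predictions: two rows tie on score; one of the tied labels is missing.
df = pd.DataFrame({
    "score": [0.9, 0.9, 0.7],
    "label": [None, 1, 0],
})

# "Best" ranking: score descending, then label descending, nulls last.
best = df.sort_values(
    ["score", "label"], ascending=[False, False], na_position="last"
)

# "Worst" ranking: score descending, then label ascending, nulls first.
worst = df.sort_values(
    ["score", "label"], ascending=[False, True], na_position="first"
)
```

With `numpy.lexsort` there is no `na_position` equivalent, which is exactly why the null labels used to float to the top regardless of the tiebreaker.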
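
The `experiment_hash` point can be illustrated with a toy sketch (hypothetical code, not Triage's actual hashing logic): if the seed were part of the hashed config, every unseeded run would get a fresh hash and nothing would be reused, so the hash must cover only the experiment's description.

```python
import hashlib
import json
import random

def experiment_hash(config: dict) -> str:
    # Hypothetical: hash the experiment description only, never the seed,
    # so reruns of the same experiment share a hash and can reuse results.
    description = {k: v for k, v in config.items() if k != "random_seed"}
    payload = json.dumps(description, sort_keys=True).encode()
    return hashlib.sha256(payload).hexdigest()

config = {"label": "failure", "features": ["age"]}

# Two unseeded runs each draw their own seed...
run1 = dict(config, random_seed=random.randint(0, 2**31))
run2 = dict(config, random_seed=random.randint(0, 2**31))

# ...but still describe the same experiment, so they share a hash.
assert experiment_hash(run1) == experiment_hash(run2)
```

The per-run seed then lives in the run record instead, which is what linking models to their `experiment_run` gives you.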