cognoma / machine-learning

Machine learning for Project Cognoma

Revise parameter grid #114

Closed rdvelazquez closed 7 years ago

rdvelazquez commented 7 years ago

Builds on #113 and revises the parameter grid in n.mutation-classifier as follows:

This PR also added stratify=y to the train_test_split call and revised the markdown note about the gene (below cell 3) so that it is general rather than referring only to TP53.
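For readers unfamiliar with the stratify option, here is a minimal sketch of what it does, assuming scikit-learn's train_test_split. The toy X and y below are made up for illustration and are not the Cognoma data or the notebook's actual call.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Toy, made-up data: 100 samples, 10 of them positive (10% positives).
X = np.arange(200).reshape(100, 2)
y = np.array([1] * 10 + [0] * 90)

# stratify=y keeps the share of positives the same in both partitions.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.1, stratify=y, random_state=0
)

train_positives = int(y_train.sum())  # positives in the 90 training samples
test_positives = int(y_test.sum())    # positives in the 10 test samples
```

Without stratify, a small test partition can end up with very few or no positives by chance, which makes the test metrics unstable for rarely mutated genes.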

patrick-miller commented 7 years ago

This looks good to me.

I'm not sure if parameterizing n_components by the number of positives as opposed to the % of positives is better. This only really matters if we obtain more data, which would probably lead to other design changes as well anyway.

rdvelazquez commented 7 years ago

Thanks for reviewing this @patrick-miller!

I'm not sure if parameterizing n_components by the number of positives as opposed to the % of positives is better.

I think it's only better (or different at all) when there are queries that don't use all the samples (that are subset by disease). For example:

I think Query B should use more components than Query A because Query B will likely need more components to capture a similar amount of the variance and Query B will be less prone to over-fitting than Query A. Let me know if that made sense.
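A rough sketch of the difference between the two parameterizations. Everything here is hypothetical: the function names, the thresholds, the component counts, and the two query sizes are invented for illustration and are not the values used in this PR.

```python
def n_components_by_count(n_positives):
    """Hypothetical rule keyed on the absolute number of positives."""
    if n_positives < 50:
        return 20
    if n_positives < 500:
        return 50
    return 100


def n_components_by_percent(n_positives, n_samples):
    """Hypothetical rule keyed on the fraction of positives."""
    fraction = n_positives / n_samples
    if fraction < 0.05:
        return 20
    if fraction < 0.25:
        return 50
    return 100


# Two made-up queries with the same 8% positives but different sizes.
query_a = {"n_samples": 500, "n_positives": 40}    # subset by disease
query_b = {"n_samples": 5000, "n_positives": 400}  # all samples

by_count = [n_components_by_count(q["n_positives"]) for q in (query_a, query_b)]
by_percent = [
    n_components_by_percent(q["n_positives"], q["n_samples"])
    for q in (query_a, query_b)
]
```

Under the percent-based rule both queries get the same number of components, while the count-based rule gives the larger query more, which is the behavior argued for above.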

patrick-miller commented 7 years ago

I'm not positive, but I think you are right.

rdvelazquez commented 7 years ago

I'm not positive, but I think you are right

Positive... I love a good pun 😃 (I'm terrible I know)

I'll give @dhimmel a chance to look at this if he wants before we merge it.

dhimmel commented 7 years ago

@rdvelazquez or @patrick-miller someone squash merge this!