Open shutUpAndCode opened 7 years ago
I think regression is a winner - less feature engineering required than an image-based task.
The possible downside I can see is that it might be easier to get an edge in image processing, where we have both technique AND feature engineering to play with; with regression we only really have technique...
I think our priority for the first run at this should be picking a simple problem, and working through the logistics (technology, collaboration, infrastructure, etc), rather than jumping into something unfamiliar and grappling with the two problems at the same time.
Not to say that we shouldn't look at other problems, just that there's a question of sequencing.
Regression sounds good to me, though I don't have a strong preference at this stage. I agree with msmoore that we should focus on the simple problem first and then decide. We may change our minds about the prize money problem for some unforeseen reason after doing the practice run.
ok well by the wonders of democracy it appears regression is the problem we should begin with (basically ben your vote is now meaningless, don't you love universal suffrage!).
https://www.kaggle.com/c/house-prices-advanced-regression-techniques
How do we proceed? Maybe we each take a technique, decide how we will measure success, code for a couple of weeks and see whose technique works best?
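On "decide how we will measure success": one option is to score everyone's predictions the same way the linked competition describes its evaluation, i.e. root-mean-squared error between the log of the predicted price and the log of the actual price. Below is a minimal sketch of that idea in plain Python. The function name `rmse_of_logs` and the sample prices are made up for illustration, and the team could just as well agree on a different metric.

```python
import math


def rmse_of_logs(actual, predicted):
    """Root-mean-squared error between log(actual) and log(predicted).

    Hypothetical helper, not part of any library. Prices must be > 0.
    """
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    if not actual:
        raise ValueError("need at least one price")
    squared_errors = [
        (math.log(p) - math.log(a)) ** 2
        for a, p in zip(actual, predicted)
    ]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Made-up prices purely for illustration, not competition data.
actual_prices = [200000.0, 150000.0, 320000.0]
perfect = list(actual_prices)
ten_percent_high = [price * 1.1 for price in actual_prices]

print(rmse_of_logs(actual_prices, perfect))  # 0.0
print(rmse_of_logs(actual_prices, ten_percent_high))
```

Working on logs means being 10% out on a cheap house costs the same as being 10% out on an expensive one, so lower is better and each person's technique can be compared on one number.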
also - and more importantly - can we have a cooler team/repo name?
I'm shit with "cool" team names so I'll leave that job to someone more creative. Yeah I think it would be good if we each have a crack at doing it on our own so we sort out any teething problems to do with submitting code, making a simple ML algorithm etc. and then we can review.
Generally my preference would be for a hilariously geeky pun, but I imagine many people do that on Kaggle, so I say we just pick a word like the famous tech giants "Google", "Apple", but in preparation for when we rule the world of tech it needs to be easily googlable ;)
So, how we deal with line endings: https://help.github.com/articles/dealing-with-line-endings/
Python coding to PEP 8 (which effectively ensures readability and little more). For more detailed conventions I suggest that we play it by ear, as we'll all learn something from studying other people's code in detail and gain little from speculating too much at the beginning.
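On the line-endings point: the linked GitHub article covers Git's own settings for this, which is probably the right place to fix it. Purely as an illustration of what the problem looks like, here is a small sketch that counts Windows-style CRLF endings in some bytes and converts them to LF. The helper names `count_crlf` and `normalise_to_lf` are hypothetical, not from any library.

```python
def count_crlf(data):
    """Return the number of Windows-style CRLF line endings in a bytes object."""
    return data.count(b"\r\n")


def normalise_to_lf(data):
    """Return a copy of the bytes with every CRLF replaced by a bare LF."""
    return data.replace(b"\r\n", b"\n")


# Made-up file contents, as if saved on Windows.
sample = b"import pandas\r\nprint('hi')\r\n"

print(count_crlf(sample))  # 2
print(normalise_to_lf(sample))
```

Reading the file in binary mode (`open(path, "rb")`) matters if anyone tries this on real files, since text mode translates line endings by default.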
Sounds good - just to clarify, we're all using Python 3, right?
Ben suggested 3.5, so I'm going to make sure I have that installed over the weekend. I'm also going to give this a bit of a go then; my plan is to try some of the basic suggestions and see where that gets me.
This seems like a fairly easy place to have a general discussion.
So before we do any analysis of data, we need to analyse some data.
The following are the competitions currently on Kaggle:
Zillow - house price prediction
Intel - Cervical Cancer screening image processing
Google & YouTube - video processing
Planet - image processing
Instacart - predicting future buying habits
Mercedes - regression from binary variables
Sberbank - predicting price
NOAA - image processing
Quora - NLP
So this means that, out of the competitions with prizes, there are:
4 image processing ones
4 regression ones
1 NLP one
If the aim is to enter a competition I think we should pick regression or image processing. This comes with the normal caveat you see on investment adverts that "past performance may not be indicative of future performance" :)
I'm happy with either image processing or regression - does anyone have any strong preference?