BenChehade / datasciences

attempt at data science competitions - mostly kaggle
MIT License

General Discussion #1

Open shutUpAndCode opened 7 years ago

shutUpAndCode commented 7 years ago

This seems like a fairly easy place to have a general discussion.

So before we do any analysis of data, we need to analyse some data.

The following are the competitions currently on Kaggle:

Zillow - house price prediction

Intel - Cervical Cancer screening image processing

Google & YouTube - video processing

Planet - image processing

Instacart - predicting future buying habits

Mercedes - regression from binary variables

Sberbank - predicting price

NOAA - image processing

Quora - NLP

So this means that, out of the competitions with prizes, there are:

4 image processing ones

4 regression ones

1 NLP one

If the aim is to enter a competition I think we should pick regression or image processing. This comes with the normal caveat you see on investment adverts that "past performance may not be indicative of future performance" :)

I'm happy with either image processing or regression - does anyone have any strong preference?

msmoore commented 7 years ago

I think regression is a winner - less feature engineering required than an image based task.

shutUpAndCode commented 7 years ago

The possible downside I can see there is that it might be easier to have an edge in image processing, where we have both technique AND feature engineering to play with. With regression we only really have technique...

msmoore commented 7 years ago

I think our priority for the first run at this should be picking a simple problem, and working through the logistics (technology, collaboration, infrastructure, etc), rather than jumping into something unfamiliar and grappling with the two problems at the same time.

Not to say that we shouldn't look at other problems, just that there's a question of sequencing.

DataMonsterBoy commented 7 years ago

Regression sounds good to me. I don't have a strong preference at this stage though. I agree with msmoore that we should focus on the simple problem first and then decide. We may change our minds about the prize money problem, for some unforeseen reason, after doing the practice run.

shutUpAndCode commented 7 years ago

OK, well, by the wonders of democracy it appears regression is the problem we should begin with (basically, Ben, your vote is now meaningless; don't you love universal suffrage!).

https://www.kaggle.com/c/house-prices-advanced-regression-techniques

How do we suggest we proceed? Maybe we each take a technique, decide how we will measure success, code for a couple of weeks and see whose technique works best?
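On measuring success: as far as I can tell, the House Prices competition scores submissions by root mean squared error between the log of the predicted price and the log of the actual sale price, so a shared scoring function could look something like the sketch below. The function name `rmse_log` and all the example prices are made up for illustration, and we'd want to check the competition's evaluation page before relying on it.

```python
import math


def rmse_log(actual, predicted):
    """Root mean squared error between log(actual) and log(predicted).

    Hypothetical helper: the name and signature are our own, not Kaggle's.
    Assumes both sequences have the same length and all prices are > 0.
    """
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    if not actual:
        raise ValueError("need at least one price to score")
    squared_errors = [
        (math.log(p) - math.log(a)) ** 2 for a, p in zip(actual, predicted)
    ]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Made-up prices, purely to show the call; lower scores are better.
held_out_prices = [200000.0, 150000.0, 320000.0]
technique_a = [210000.0, 140000.0, 300000.0]
technique_b = [250000.0, 100000.0, 400000.0]

scores = {
    "technique_a": rmse_log(held_out_prices, technique_a),
    "technique_b": rmse_log(held_out_prices, technique_b),
}
best = min(scores, key=scores.get)
print(scores, "best:", best)
```

If we all score against the same held-out rows with the same function, "whose technique works best" becomes a single number per person rather than a matter of opinion.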

shutUpAndCode commented 7 years ago

also - and more importantly - can we have a cooler team/repo name?

DataMonsterBoy commented 7 years ago

I'm shit with "cool" team names so I'll leave that job to someone more creative. Yeah I think it would be good if we each have a crack at doing it on our own so we sort out any teething problems to do with submitting code, making a simple ML algorithm etc. and then we can review.

shutUpAndCode commented 7 years ago

Generally my preference would be for a hilariously geeky pun, but I imagine many people do that on Kaggle. So I say we just pick a word, like the famous tech giants "Google" and "Apple", but in preparation for when we rule the world of tech it needs to be easily googlable ;)

BenChehade commented 7 years ago

So, this is how we deal with line endings: https://help.github.com/articles/dealing-with-line-endings/
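In case it helps, here's a rough sketch of a check we could run before committing to spot files that still have Windows (CRLF) line endings. The function names are made up for this example, and it's only a stopgap next to the Git settings described in the article above.

```python
def has_crlf(data):
    """Return True if the given bytes contain a Windows-style CRLF line ending."""
    return b"\r\n" in data


def to_lf(data):
    """Convert CRLF line endings to LF, leaving everything else untouched."""
    return data.replace(b"\r\n", b"\n")


def find_crlf_files(paths):
    """Return the subset of paths whose contents contain CRLF.

    Hypothetical helper: reads each file in binary mode so Python does not
    translate the line endings before we look at them.
    """
    offenders = []
    for path in paths:
        with open(path, "rb") as handle:
            if has_crlf(handle.read()):
                offenders.append(path)
    return offenders


sample = b"import this\r\nprint('hello')\r\n"
print(has_crlf(sample))         # True
print(has_crlf(to_lf(sample)))  # False
```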

BenChehade commented 7 years ago

Python coding to PEP 8 (which effectively ensures readability and little more). For more detailed ideas I suggest that we play it by ear, as we'll all learn something from studying other people's code in detail and gain little from speculating too much at the beginning.

msmoore commented 7 years ago

Sounds good - just to clarify, we're all using Python 3, right?

shutUpAndCode commented 7 years ago

Ben suggested 3.5, so I'm going to make sure I have that over the weekend. I'm also going to give this a bit of a go at the weekend, I think. My plan is to try some of the basic suggestions and see where that gets me.
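A small guard like the one below, at the top of a shared script, would catch anyone accidentally running on an older interpreter. The 3.5 minimum comes from Ben's suggestion; the helper name `version_ok` is made up for this sketch.

```python
import sys

MIN_VERSION = (3, 5)  # Minimum suggested in this thread; adjust if we change our minds.


def version_ok(version_info=None, minimum=MIN_VERSION):
    """Return True if the interpreter's (major, minor) is at least the minimum."""
    if version_info is None:
        version_info = sys.version_info
    return tuple(version_info[:2]) >= tuple(minimum)


# Stop early with a readable message instead of a confusing syntax error later.
if not version_ok():
    sys.exit("Python %d.%d or newer is required" % MIN_VERSION)
```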