ataki / hospitalfinder

Use ML to predict best hospitals for patients

Datasets do not have exactly the same feature mappings from around index 800 onward #1

Closed: ataki closed this 10 years ago

ataki commented 10 years ago

Tagging @scottcheng @petousis so you guys can see this.

I discovered this bug earlier tonight; it's kind of painstaking to fix.

Basically, the 2009 and 2010 datasets aren't exactly the same; a few fields are missing from 2009 which are present in 2010, and this causes errors in field translations.

As an example, the field at indices 857-859 holds revenue from Medicare in 2010, but in 2009 the same field sits at indices 856-858.

This only happens starting at around index 800, so fields before that are still ok.
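To make the off-by-one concrete, here is a minimal sketch of a per-year field lookup for fixed-width records. It assumes positions are 1-based and inclusive (the usual codebook convention), and the `FIELD_POSITIONS` table, the `medicare_revenue` name and `extract_field` helper are hypothetical; they are not the actual contents of `mapping.py`. The records are synthetic, built only to show that reading 2009 data with 2010 positions shifts the value by one character.

```python
# Hypothetical per-year layout: field name -> (start, end), 1-based and
# inclusive, as fixed-width codebooks usually list positions (assumption).
FIELD_POSITIONS = {
    2009: {"medicare_revenue": (856, 858)},
    2010: {"medicare_revenue": (857, 859)},
}


def extract_field(record: str, year: int, name: str) -> str:
    """Return the raw text of field `name` from one fixed-width record."""
    start, end = FIELD_POSITIONS[year][name]
    # Convert 1-based inclusive positions to a 0-based half-open slice.
    return record[start - 1:end]


# Synthetic records with the value "042" placed at each year's position.
record_2010 = " " * 856 + "042"
record_2009 = " " * 855 + "042 "

print(extract_field(record_2010, 2010, "medicare_revenue"))  # 042
print(extract_field(record_2009, 2009, "medicare_revenue"))  # 042
# Reusing the 2010 positions on a 2009 record reads the wrong bytes.
print(repr(record_2009[856:859]))  # '42 '
```

Keying the lookup by year keeps a single code path for parsing while letting each year's layout differ, which is exactly where the 2009 and 2010 files disagree.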

As a result, I need to correct mapping.py :frowning: However, this only affects feature selection (FS) and shouldn't block you guys; I just wanted to make you aware in case you were starting on FS.

The fix should be in by early tomorrow afternoon.

ataki commented 10 years ago

commit 3b964d0a182d5dafd5fbb457a579a18b9f020298 fixes this

scottcheng commented 10 years ago

This is bad... We'll need to look at the docs for each year and write mappings.
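When writing a mapping for each year from its docs, one cheap sanity check is to diff the new layout against an already-verified year and list every field that is missing or has moved. This is a sketch only: the layouts below, including the field names `age` and `new_field` and their positions, are made up for illustration, and the `diverging_fields` helper is hypothetical rather than part of the repo.

```python
from typing import Dict, List, Tuple

# Field name -> (start, end) positions, 1-based and inclusive (assumption).
Layout = Dict[str, Tuple[int, int]]


def diverging_fields(old: Layout, new: Layout) -> List[str]:
    """List fields missing from either layout or placed at different positions."""
    # Order by position (preferring the newer layout) so the first
    # divergence shows up first in the result.
    names = sorted(set(old) | set(new), key=lambda n: new.get(n, old.get(n)))
    return [n for n in names if old.get(n) != new.get(n)]


# Hypothetical layouts: a one-character field added in 2010 shifts later fields.
layout_2009 = {"age": (8, 10), "medicare_revenue": (856, 858)}
layout_2010 = {
    "age": (8, 10),
    "new_field": (800, 800),
    "medicare_revenue": (857, 859),
}

print(diverging_fields(layout_2009, layout_2010))  # ['new_field', 'medicare_revenue']
```

Sorting by position means the first entry points at roughly where the two years stop lining up, which is the place to start reading the docs.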

ataki commented 10 years ago

Yeah, this takes A WHILE. It also discourages us from using super old data, which I don't mind. I think the farthest back we should go is 4 years, and we already have 60K+ rows from 2009 / 2010 alone, which should be good enough for this milestone.