citp / fertility-prediction-challenge-2024

Fertility prediction challenge
MIT License

How to handle features about expectations for having kids? #17

Open emilycantrell opened 2 months ago

emilycantrell commented 2 months ago

Question

@HanzhangRen How did you choose the number 31 in "If no expected kids, then a lower-bound estimate for the number of years within which to have kids is 31?"

Proposed future steps (for the June 3 submission)

I'd like to use the earlier-wave versions of cf20m128, cf20m129, and cf20m130, since many people have missing data in 2020 but may have answered these questions in earlier years. Here are some possible things to do; I'll explore these options before the June 3 submission deadline.

Do you think you will have (more) kids in the future?

How many (more) children do you think you will have in the future?

Within how many years do you expect to have your first/next child?

I initially thought we should definitely combine answers to these questions across waves. But after thinking through all the problems mentioned above, I'm less certain.

Even if we don't combine data across waves, I still think it's worth including earlier versions of these features (going back at least a couple of years, or maybe all the way to 2008) alongside the 2020 versions. That should help for people who have missing values in 2020.
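A minimal sketch of pulling every wave's version of these three questions into the feature set, next to the 2020 columns. It assumes the naming pattern visible in cf20m128 (two-digit year, a one-letter wave code, then the question number) also holds in earlier waves, e.g. a hypothetical cf19l128; check the codebook before relying on it. The function name `select_intention_columns` is hypothetical.

```python
import re

import pandas as pd

# Assumed pattern: "cf" + two-digit year + one-letter wave code + question
# number, generalized from cf20m128 / cf20m129 / cf20m130. Verify in the codebook.
INTENTION_PATTERN = re.compile(r"^cf(\d{2})[a-z](128|129|130)$")


def select_intention_columns(df: pd.DataFrame, first_year: int = 8) -> list[str]:
    """Return intention columns from two-digit year `first_year` onward,
    sorted by question number and then by year."""
    matches = []
    for col in df.columns:
        m = INTENTION_PATTERN.match(col)
        if m and int(m.group(1)) >= first_year:
            matches.append((m.group(2), int(m.group(1)), col))
    return [col for _, _, col in sorted(matches)]


# Usage with an empty frame that only has (hypothetical) column names
df = pd.DataFrame(columns=["nomem_encr", "cf08a128", "cf19l128", "cf20m128", "cf20m130", "cf20m131"])
print(select_intention_columns(df))                 # ['cf08a128', 'cf19l128', 'cf20m128', 'cf20m130']
print(select_intention_columns(df, first_year=18))  # ['cf19l128', 'cf20m128', 'cf20m130']
```

Keeping each wave as its own column (rather than merging them) lets the model decide how much weight older answers deserve.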

One other thought: in 2020, people's plans for having children (or for when to have them) may have changed because of the pandemic. I think the survey was fielded in Sept/Oct 2020. That is an extra reason to think that answers from prior years may not carry over to 2020.

HanzhangRen commented 2 months ago

@HanzhangRen How did you choose the number 31 in "If no expected kids, then a lower-bound estimate for the number of years within which to have kids is 31?"

30 is the largest non-missing response, so I went one year beyond that and picked 31. I didn't want to mean-impute the number of years within which to have kids for people who do not plan to have kids, because that would make them look much more eager to have kids than they really are.
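The rule described above can be sketched in pandas roughly as follows. It assumes cf20m128 is coded so that 2 means "no, I don't expect (more) kids" and that cf20m130 holds the number of years (0 to 30 among non-missing answers); both codings are assumptions to check against the codebook, and the helper name is hypothetical.

```python
import numpy as np
import pandas as pd

NO_MORE_KIDS = 2         # assumed code for "no" in cf20m128; check the codebook
MAX_OBSERVED_YEARS = 30  # largest non-missing answer to cf20m130, per this thread


def impute_years_to_next_child(df: pd.DataFrame) -> pd.Series:
    """Set cf20m130 to 31 for respondents who expect no (more) kids.

    Other missing values stay NaN instead of being mean-imputed, so people
    who do not plan to have kids are not made to look eager to have them.
    """
    years = df["cf20m130"].copy()
    no_kids_expected = df["cf20m128"] == NO_MORE_KIDS  # NaN == 2 evaluates to False
    years.loc[no_kids_expected & years.isna()] = MAX_OBSERVED_YEARS + 1
    return years


df = pd.DataFrame({"cf20m128": [1, 2, 3, np.nan], "cf20m130": [5, np.nan, np.nan, np.nan]})
print(impute_years_to_next_child(df).tolist())  # [5.0, 31.0, nan, nan]
```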

emilycantrell commented 2 months ago

Combine the features above to calculate the difference between the expected number of kids and the actual number of kids?

Before working on this, examine how consistent answers are from 2019 to 2020, both within individuals and in overall rates.
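The consistency check could look roughly like this: within-person agreement among respondents who answered in both waves, each wave's overall answer distribution, and a cross-tab of how answers moved. The 2019 column name cf19l128 is assumed from the naming pattern of cf20m128, and the helper name is hypothetical.

```python
import numpy as np
import pandas as pd


def answer_consistency(df: pd.DataFrame, col_2019: str, col_2020: str) -> dict:
    """Compare answers to the same question in two waves."""
    # Only respondents with a non-missing answer in both waves
    both = df[[col_2019, col_2020]].dropna()
    return {
        "n_answered_both": len(both),
        # Share of those respondents giving the identical answer (NaN if none)
        "within_person_agreement": (both[col_2019] == both[col_2020]).mean(),
        # Overall answer distribution per wave, as shares of non-missing answers
        "rates_2019": df[col_2019].value_counts(normalize=True).sort_index(),
        "rates_2020": df[col_2020].value_counts(normalize=True).sort_index(),
        # Rows: 2019 answer, columns: 2020 answer
        "transitions": pd.crosstab(both[col_2019], both[col_2020]),
    }


df = pd.DataFrame({"cf19l128": [1, 1, 2, 3, np.nan], "cf20m128": [1, 2, 2, np.nan, 1]})
summary = answer_consistency(df, "cf19l128", "cf20m128")
print(summary["n_answered_both"])  # 3
print(summary["transitions"])
```

Low within-person agreement with stable overall rates would suggest individual answers are noisy even if the population picture is steady, which bears on whether carrying 2019 answers forward is safe.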

emilycantrell commented 2 months ago

I started on code that combines 2020 answers with 2019 answers for all three fertility intention questions. It's not perfect, but it's at least a good starting point. I'll post details, along with results for #21, tomorrow.
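A minimal sketch of one way to combine the two waves (not the code mentioned in the comment): use the 2020 answer where present, fall back on the 2019 answer otherwise, and keep a flag recording which rows were filled from 2019. Column names follow the assumed naming pattern, and the helper name is hypothetical.

```python
import numpy as np
import pandas as pd


def combine_waves(df: pd.DataFrame, col_2020: str, col_2019: str) -> pd.DataFrame:
    """Prefer the 2020 answer; fill only missing 2020 values from 2019."""
    combined = df[col_2020].combine_first(df[col_2019])
    # 1 where the value was carried forward from 2019, else 0
    from_2019 = (df[col_2020].isna() & df[col_2019].notna()).astype(int)
    return pd.DataFrame({
        f"{col_2020}_combined": combined,
        f"{col_2020}_from_2019": from_2019,
    })


df = pd.DataFrame({"cf19l129": [2, 1, np.nan], "cf20m129": [1, np.nan, np.nan]})
out = combine_waves(df, "cf20m129", "cf19l129")
print(out)
```

The flag lets a model treat carried-forward answers differently, which matters given the concern that plans may have shifted during the pandemic. For the "within how many years" question, a 2019 answer might arguably be shifted down by a year before reuse; this sketch does not do that.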

emilycantrell commented 1 month ago

The feature engineering I did made almost no difference, so I don't think we should do any more feature engineering on the fertility intention features. I recorded the results here.

emilycantrell commented 1 month ago

I previously said the feature engineering wasn't worth pursuing because it made "almost no difference." However, after seeing that other changes which made "almost no difference" individually seemed to add up to put us in first place, I now think the feature engineering is worth testing a bit more, so I'm reopening the issue.