citp / fertility-prediction-challenge-2024

Fertility prediction challenge
MIT License

Create variables with the most up-to-date information on partners #25

Closed: emilycantrell closed this issue 4 months ago

emilycantrell commented 4 months ago

Questions about the partner were asked only if the partner was new since the prior wave, so I will combine data across waves. (These are features in which the ego reports information about the partner.)


Variables (one per wave) | Number of waves | Years |  | Question
-- | -- | -- | -- | --
cf08a024; cf09b024; cf10c024; cf11d024; cf12e024; cf13f024; cf14g024; cf15h024; cf16i024; cf17j024; cf18k024; cf19l024; cf20m024 | 13 | 2008, 2009, 2010, 2011, 2012, 2013, 2014, 2015, 2016, 2017, 2018, 2019, 2020 | 1 | Do you currently have a partner?
cf08a025; cf09b025; cf10c025; cf11d025; cf12e025; cf13f025; cf14g025; cf15h025; cf16i025; cf17j025; cf18k025; cf19l025; cf20m025 | 13 | 2008, 2009, 2010, 2011, 2012, 2013, 2014, 2015, 2016, 2017, 2018, 2019, 2020 | 1 | Do you live together with this partner?
cf08a026; cf09b026; cf10c026; cf11d026; cf12e026; cf13f026; cf14g026; cf15h026; cf16i026; cf17j026; cf18k026; cf19l026; cf20m026 | 13 | 2008, 2009, 2010, 2011, 2012, 2013, 2014, 2015, 2016, 2017, 2018, 2019, 2020 | 1 | What is his or her year of birth?
cf08a027; cf09b027; cf10c027; cf11d027; cf12e027; cf13f027; cf14g027; cf15h027; cf16i027; cf17j027; cf18k027; cf19l027; cf20m027 | 13 | 2008, 2009, 2010, 2011, 2012, 2013, 2014, 2015, 2016, 2017, 2018, 2019, 2020 | 1 | In which country was your partner born?
cf08a028; cf09b028; cf10c028; cf11d028; cf12e028; cf13f028; cf14g028; cf15h028; cf16i028; cf17j028; cf18k028; cf19l028; cf20m028 | 13 | 2008, 2009, 2010, 2011, 2012, 2013, 2014, 2015, 2016, 2017, 2018, 2019, 2020 | 1 | In what year did the relationship with your partner begin?
cf08a029; cf09b029; cf10c029; cf11d029; cf12e029; cf13f029; cf14g029; cf15h029; cf16i029; cf17j029; cf18k029; cf19l029; cf20m029 | 13 | 2008, 2009, 2010, 2011, 2012, 2013, 2014, 2015, 2016, 2017, 2018, 2019, 2020 | 1 | In what year did you start living together with your partner?
cf08a030; cf09b030; cf10c030; cf11d030; cf12e030; cf13f030; cf14g030; cf15h030; cf16i030; cf17j030; cf18k030; cf19l030; cf20m030 | 13 | 2008, 2009, 2010, 2011, 2012, 2013, 2014, 2015, 2016, 2017, 2018, 2019, 2020 | 1 | Are you married to this partner?
cf08a031; cf09b031; cf10c031; cf11d031; cf12e031; cf13f031; cf14g031; cf15h031; cf16i031; cf17j031; cf18k031; cf19l031; cf20m031 | 13 | 2008, 2009, 2010, 2011, 2012, 2013, 2014, 2015, 2016, 2017, 2018, 2019, 2020 | 1 | In what year did you marry?
cf08a032; cf09b032; cf10c032; cf11d032; cf12e032; cf13f032; cf14g032; cf15h032; cf16i032; cf17j032; cf18k032; cf19l032; cf20m032 | 13 | 2008, 2009, 2010, 2011, 2012, 2013, 2014, 2015, 2016, 2017, 2018, 2019, 2020 | 1 | What is your partner's gender?
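The variable names in the table follow a regular pattern. Each name is "cf", then the two-digit survey year, then a wave letter (a for 2008 through m for 2020), then a three-digit question number. The following is a minimal sketch of building the per-wave column list for one question. It assumes this pattern holds for every wave, and `wave_columns` is a hypothetical helper name:

```python
import string

# Assumption: the naming pattern seen in the table above holds for every wave.
FIRST_WAVE_YEAR = 2008  # wave letter "a"
LAST_WAVE_YEAR = 2020   # wave letter "m"


def wave_columns(question: str) -> list[str]:
    """Hypothetical helper: per-wave column names for one question, oldest wave first."""
    columns = []
    for year in range(FIRST_WAVE_YEAR, LAST_WAVE_YEAR + 1):
        letter = string.ascii_lowercase[year - FIRST_WAVE_YEAR]  # 2008 -> "a", 2020 -> "m"
        columns.append(f"cf{year % 100:02d}{letter}{question}")
    return columns


birth_year_cols = wave_columns("026")  # partner year of birth
print(birth_year_cols[0], birth_year_cols[-1])  # cf08a026 cf20m026
print(len(birth_year_cols))  # 13
```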

emilycantrell commented 4 months ago

After investigation, the only sets of variables that needed to be coalesced across waves were partner birth year and relationship start year. I implemented this in the commit above.
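One way to take the most up-to-date value across waves is to scan a respondent's per-wave columns from newest to oldest and keep the first non-missing answer. The sketch below illustrates that idea only. It is not the code from the commit referenced above. It assumes one record per respondent stored as a dict, with missing answers stored as `None` or `NaN`. The function names are hypothetical.

```python
import math
from typing import Optional


def is_missing(value) -> bool:
    """Treat None and float NaN as missing answers."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def latest_non_missing(record: dict, columns: list[str]) -> Optional[float]:
    """Hypothetical helper: return the answer from the most recent wave that is not missing.

    `columns` must be ordered from the oldest wave to the newest.
    """
    for col in reversed(columns):  # newest wave first
        value = record.get(col)
        if not is_missing(value):
            return value
    return None  # the question was never answered in any wave


# Hypothetical respondent whose partner was first reported in 2010,
# so the partner birth-year question was skipped in the following wave.
cols = ["cf08a026", "cf09b026", "cf10c026", "cf11d026"]
respondent = {"cf08a026": None, "cf09b026": None, "cf10c026": 1985.0, "cf11d026": None}
print(latest_non_missing(respondent, cols))  # 1985.0
```

With pandas, a column-wise equivalent is `df[cols].ffill(axis=1).iloc[:, -1]`, provided `cols` is ordered oldest to newest. One caveat applies to either version. If a respondent no longer has a partner, the latest non-missing value may describe a former partner.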

This change did not improve our cross-validated F1 score; in fact, it made the score slightly worse. See the comparison of runs here.
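For reference, F1 is the harmonic mean of precision P and recall R, F1 = 2PR / (P + R). The following is a minimal sketch of a cross-validated F1. It assumes binary labels with the positive class coded 1, and it averages the score over folds. The challenge's own evaluation code may aggregate differently, and the function names here are hypothetical.

```python
# Assumption: binary labels, positive class coded 1, per-fold F1 averaged over folds.


def f1_score(y_true: list[int], y_pred: list[int]) -> float:
    """F1 for the positive class (coded 1): harmonic mean of precision and recall."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0  # precision or recall is zero (or undefined), so F1 is taken as 0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def mean_cv_f1(folds: list[tuple[list[int], list[int]]]) -> float:
    """Hypothetical helper: average per-fold F1; each fold is (true labels, predicted labels)."""
    scores = [f1_score(y_true, y_pred) for y_true, y_pred in folds]
    return sum(scores) / len(scores)


folds = [
    ([1, 0, 1, 1], [1, 0, 0, 1]),
    ([0, 1, 0, 1], [0, 1, 1, 1]),
]
print(round(mean_cv_f1(folds), 3))  # 0.8
```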