University of Rochester Biomedical Data Science Hackathon Summer 2020
Data are now live.
Logistics
- Registration is open until 5PM Sunday. Teams can consist of up to 4 people. Register using the Google form.
- You may add team members until noon EDT on 8/11 by editing your response to the Google form or emailing the organizers.
- Teams made up entirely of undergraduates will compete in the undergraduate division; all other teams will compete in the open division.
- Predictions on the test data set are submitted by pushing to GitHub. A repository named Hackathon-Summer-2020, owned by the team captain, will be queried for a file named prediction/prediction.csv. If the team captain forks this repository and writes predictions there, everything should work (as long as the predictions are formatted correctly).
- Predictions will be scored at least once daily, starting 8/8, with scores posted by noon. At the organizers' option, predictions may be scored more frequently.
- General questions/problems can be directed to the issues page. We encourage other hackathon participants to respond to issues.
- The scoreboard is located here and will be updated starting at noon on 8/8. If an error is encountered in scoring your predictions, we cannot provide support beyond the diagnostic output included on the scoreboard.
- The competition runs through 11:59 PM EDT on 12-August-2020. The predictions each team has committed to their repository at that time will be used to determine their final score.
Data
- Training data are here. These data include the labels that you need to predict.
- Test data are here. Your predictions must be in the same order as the subject_ids listed here; no join is performed on the subject_id column.
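Because no join is performed, a prediction file whose rows are in a different order than the test data would be scored against the wrong subjects. The sketch below shows one way to put predictions into test order before writing them. The function names, the `subject_id`/`prediction` header, and the two-column layout are assumptions for illustration; the required file format is not described here, so check it against the repository before relying on this.

```python
import csv
from pathlib import Path


def align_to_test_order(test_ids, preds_by_id):
    """Return (subject_id, prediction) pairs in exactly the order of test_ids."""
    missing = [sid for sid in test_ids if sid not in preds_by_id]
    if missing:
        raise ValueError(f"no prediction for subject_id(s): {missing}")
    return [(sid, preds_by_id[sid]) for sid in test_ids]


def write_predictions(rows, out_path="prediction/prediction.csv"):
    """Write the aligned rows to a CSV file, creating the folder if needed."""
    out_path = Path(out_path)
    out_path.parent.mkdir(parents=True, exist_ok=True)
    with out_path.open("w", newline="") as fh:
        writer = csv.writer(fh)
        # Assumed header and column layout; not confirmed by the organizers.
        writer.writerow(["subject_id", "prediction"])
        writer.writerows(rows)
```

Here `test_ids` would be the subject_id column read from the test data in file order, and `preds_by_id` a dictionary mapping each subject_id to its predicted GRSS.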
Data Description
These data are from a prospective multi-year clinical translational study including three cohorts of term infants experiencing their first Respiratory Syncytial Virus (RSV) season. All infants were nine months of age or younger at study entry. The three subject cohorts represent the full spectrum of RSV disease severity and include a birth cohort, a cohort of infants hospitalized for RSV disease, and a cohort of infants evaluated in ambulatory settings for RSV infection. All infants were followed longitudinally and evaluated at recognition of acute RSV infection and twice during convalescence. Genome-wide expression was assessed in the nasal airways (nasal_gene_expr) and in sorted peripheral blood lymphocytes (cd4_gene_expr, cd8_gene_expr, and cd19_gene_expr). Additionally, the microbiome of the nasal airway was measured (nasal_microbiome). All five of these data modalities are hypothesized to contribute to the severity of the disease, measured by a Global Respiratory Severity Score (GRSS), which is the prediction target for this challenge.
Prizes
- First place in each division: $300 + $50*(team size)
- Second place: 0 + $50*(team size)
- In addition, members of teams whose submissions outperform random guessing will be entered into a lottery for $20 Grubhub gift cards. Up to 30 will be awarded.
- Predictions will be scored based on mean squared error (MSE); lower values are better.
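For checking a model locally on held-out training data, the standard definition of mean squared error can be sketched as below. This is a minimal illustration, not the organizers' scoring script (which is not shown here); the function name and argument names are assumptions.

```python
def mean_squared_error(y_true, y_pred):
    """Average of squared differences between true and predicted values."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if len(y_true) == 0:
        raise ValueError("at least one value is required")
    # Pairs are matched by position, so both sequences must be in the same order.
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return sum(squared_errors) / len(squared_errors)
```

Since the values are paired by position, the same ordering caveat applies as for the submitted prediction file.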
Final Scoreboard
Per request, a bonus round of scoring (doesn't count for final results)
Concluding remarks
Thank you to all who participated in the first GIDS biomedical data science hackathon!
If you participated, we would appreciate your feedback.
Winning teams open division
- 1st - Data Wizard - Zhijie Ji, Hanjia Lyv, Yizhi Lan, Trang Nguyen
- 2nd - Supergene - Sherif Negm, Xiaolu Wei, Lucas Hemmer, John Sproul
Winning teams undergrad division
- 1st - Random Guess - Khoa Hoang, Ha Nguyen, Tuan Pham
- 2nd - Pineapple - Xiaobo Luo, Chuqin Wu
Congratulations to the winners and all of the teams that participated and beat random guessing.
Please e-mail the organizers if your team submitted predictions and you would like a certificate of participation.