SocialComplexityLab / life2vec

MIT License

Data format #6

Closed: hajerkr closed this issue 3 months ago

hajerkr commented 5 months ago

Hello,

Thanks for making your work open source; it's much appreciated. I'm a researcher at Imperial College London, and I think your tool could be helpful for a project of mine. The following points are not entirely clear to me, so I would appreciate some guidance:

  1. In which format do the life events need to be stored? JSON? Is there a naming convention to follow? Mine are currently in a CSV with roughly the following headers: subj_id, event, date, comments
  2. How do I indicate the target value for prediction, for instance a diagnosis?
  3. I see specific .py files under data_new/sources, such as education, labour, and health. Are those used for partitioning the dataset? If they don't apply to my dataset, would I need to create other files?
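To make points 1 and 2 concrete, here is a minimal Python sketch of one way such a CSV could be turned into per-subject, date-ordered event sequences with a binary prediction label. This is not life2vec's own loader or input format; it assumes the headers listed above, ISO-formatted dates, and a hypothetical outcome event named `diagnosis_X`. The function names `load_sequences` and `split_input_and_label` are likewise hypothetical.

```python
import csv
import io
from collections import defaultdict
from datetime import date

# Hypothetical sample in the layout described above: subj_id, event, date, comments
SAMPLE_CSV = """subj_id,event,date,comments
p1,gp_visit,2019-03-02,
p1,diagnosis_X,2021-07-15,confirmed
p2,hospital_admission,2020-01-10,
p1,prescription,2018-11-20,
"""

TARGET_EVENT = "diagnosis_X"  # hypothetical name of the outcome to predict


def load_sequences(text):
    """Group rows by subject and sort each subject's events chronologically."""
    sequences = defaultdict(list)
    for row in csv.DictReader(io.StringIO(text)):
        # Assumes dates are in ISO format (YYYY-MM-DD)
        sequences[row["subj_id"]].append((date.fromisoformat(row["date"]), row["event"]))
    for events in sequences.values():
        events.sort()  # (date, event) tuples sort by date first
    return dict(sequences)


def split_input_and_label(events, target=TARGET_EVENT):
    """Return the events before the first target event and a 0/1 label.

    Truncating at the target keeps the outcome itself out of the model input.
    """
    for i, (_, name) in enumerate(events):
        if name == target:
            return events[:i], 1
    return events, 0


seqs = load_sequences(SAMPLE_CSV)
for subj, events in seqs.items():
    inputs, label = split_input_and_label(events)
    print(subj, [name for _, name in inputs], label)
```

Cutting each sequence at the first occurrence of the outcome is one common choice for avoiding label leakage; a fixed cutoff date per subject would be an alternative. The structure life2vec itself expects would still need to be checked against the project's own code.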

I'm happy to hop on a call as well if that is easier for you, as I may have more questions. Thanks for any clarification you can provide!

carlomarxdk commented 3 months ago

Hi there,

I'm sorry for not getting back to you sooner!

  1. I suggest checking the life2vec-light version (the one with the dummy data) to see what the post-processed data structure looks like :)
  2. I will also update the life2vec-light version to include the prediction tasks.

We can also have a call if you still have questions. Reach out to me at germans@savcisens.com