EHDEN / ETL-UK-Biobank

ETL UK-Biobank
https://ehden.github.io/ETL-UK-Biobank/

Death fields from baseline #356

Closed MaximMoinat closed 2 years ago

MaximMoinat commented 2 years ago

Currently, date field 53 is used, but we need to use the death register date (40000).

https://biobank.ctsu.ox.ac.uk/crystal/field.cgi?id=40018 https://biobank.ctsu.ox.ac.uk/crystal/field.cgi?id=40020

spiros commented 2 years ago

I think it's safe to ignore all death data in the baseline file and only use the data in the separate death table.

MaximMoinat commented 2 years ago

@spiros Thanks for your comment.

Is secondary cause of death (field 40002) also already included in the separate death tables?

spiros commented 2 years ago

Yep, you can ignore this one as well.

All death data can be sourced from the death files: there's one called death.txt and another called death_cause.txt, IIRC. The first one only has the patient identifier (eid) and a date of death. The second one has the same patient identifier and the causes of death (recorded as ICD-10 codes). There's also a separate column called level, which marks primary vs. non-primary causes of death.
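A rough Python sketch of how the two files could be combined per participant, in case it helps. This is not the ETL's actual code: the column names `date_of_death` and `cause_icd10`, the tab delimiter, the sample rows, and the assumption that level `1` means primary are all illustrative and should be checked against the death linkage documentation.

```python
import csv
import io
from collections import defaultdict

# Tiny in-memory stand-ins for death.txt and death_cause.txt.
# Column names, delimiter, values and level coding are assumptions.
DEATH_TXT = "eid\tdate_of_death\n1001\t2020-03-14\n1002\t2021-11-02\n"
DEATH_CAUSE_TXT = (
    "eid\tlevel\tcause_icd10\n"
    "1001\t1\tI21\n"
    "1001\t2\tE11\n"
    "1002\t1\tC34\n"
)


def read_rows(text):
    """Parse tab-separated text with a header row into a list of dicts."""
    return list(csv.DictReader(io.StringIO(text), delimiter="\t"))


def combine_deaths(death_rows, cause_rows, primary_level="1"):
    """Attach primary and non-primary causes to each death record by eid."""
    causes = defaultdict(lambda: {"primary": [], "secondary": []})
    for row in cause_rows:
        kind = "primary" if row["level"] == primary_level else "secondary"
        causes[row["eid"]][kind].append(row["cause_icd10"])

    combined = {}
    for row in death_rows:
        eid = row["eid"]
        found = causes.get(eid, {"primary": [], "secondary": []})
        combined[eid] = {
            "date_of_death": row["date_of_death"],
            "primary": list(found["primary"]),
            "secondary": list(found["secondary"]),
        }
    return combined


deaths = combine_deaths(read_rows(DEATH_TXT), read_rows(DEATH_CAUSE_TXT))
for eid, record in deaths.items():
    print(eid, record)
```

Keying on eid keeps the date from the first file and the causes from the second in one record, so nothing from the baseline file is needed.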

spiros commented 2 years ago

This is the documentation which explains it much better than myself: https://biobank.ctsu.ox.ac.uk/crystal/crystal/docs/DeathLinkage.pdf

The reason behind this confusion is that UKB realized it was taking a long time to integrate deaths into the main "baseline" file, and researchers do not update it all that often, so they decided to switch to sending death data in separate files (that can be downloaded directly).

MaximMoinat commented 2 years ago

Ah, so the death tables actually contain more up-to-date data (and more complete) than the baseline registry fields?

The field 40002 was highlighted as a prioritised field for this ETL iteration. So we implemented that mapping.

spiros commented 2 years ago

I screwed up, sorry, this is my fault. When I saw the list @vpapez sent me, I thought we were literally missing the secondary cause of death from the ETL and marked it. I did not think of confirming with him on the external file mapping. Really sorry ... :(

MaximMoinat commented 2 years ago

No problem Spiros, thanks for checking this.

Is this separate death file available for everyone? i.e. could it be that some people have to rely on the death fields from baseline? I'm asking to decide whether this mapping might be useful in some other cases.

spiros commented 2 years ago

I don't think so; we can ignore it entirely.

MaximMoinat commented 2 years ago

With PR #360 we now skip the mappings of death and death cause from the baseline fields.

Fields 40018 and 40020 are mapped using the death date.