airr-community / airr-standards

AIRR Community Data Standards
https://docs.airr-community.org
Creative Commons Attribution 4.0 International

Does having an actual date for sequencing open the potential for subject identification? #798

Open schristley opened 4 months ago

schristley commented 4 months ago

My understanding is that one technique to avoid subject identification is to replace dates with relative times from an arbitrary t0 time point, and that's what we have done for most of the AIRR standards. However, I noticed that sequencing_date is an actual date. Does it make sense to convert this to a relative time as well?
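As a rough sketch of what "relative times from an arbitrary t0" means in practice: the event name `sample_collection`, the helper `to_relative_days`, and all of the dates below are made up for illustration and are not taken from the AIRR schema. Only `sequencing_date` is a real field name.

```python
from datetime import date


def to_relative_days(events, t0):
    """Replace absolute ISO dates with integer day offsets from t0."""
    origin = date.fromisoformat(t0)
    return {
        name: (date.fromisoformat(iso_date) - origin).days
        for name, iso_date in events.items()
    }


# Hypothetical timeline for one subject; the dates are invented.
events = {
    "sample_collection": "2020-03-01",
    "sequencing_date": "2020-06-09",
}

# t0 is arbitrary; here it is simply set to the sampling date.
relative = to_relative_days(events, t0="2020-03-01")
print(relative)
```

After the conversion only the intervals between events remain, not the calendar dates themselves.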

scharch commented 4 months ago

sequencing_date is when you turn on the NextSeq, right? That could be arbitrarily long after sampling or even after the conclusion of the study. So I'm not sure relative time would make sense...

schristley commented 4 months ago

Yeah, that's my understanding, though it should be before the conclusion of the study: if the study involves analyzing AIRR-seq data, then you need to sequence first. But yes, it could be arbitrarily long after sampling, or very soon after.

I don't have a clear use case in mind for what might go wrong. I was mainly wondering whether, if you knew the sequencing date, there might be the potential to backtrack and roughly determine other dates. As always, you would need other information to attempt identification.
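To make the backtracking idea concrete, here is a minimal sketch under a strong assumption: that someone knows both the actual date of one event and that same event's relative offset. Whether that would ever be the case for sequencing is exactly the open question. The helper `recover_dates`, the event names, and the numbers are hypothetical.

```python
from datetime import date, timedelta


def recover_dates(relative_days, anchor_event, anchor_date):
    """Turn day offsets back into calendar dates using one known date."""
    # The known date minus its offset gives the calendar date of t0.
    t0 = date.fromisoformat(anchor_date) - timedelta(days=relative_days[anchor_event])
    return {
        name: (t0 + timedelta(days=offset)).isoformat()
        for name, offset in relative_days.items()
    }


# Invented offsets (days from t0) and one invented actual date.
relative_days = {"sample_collection": 0, "sequencing": 100}
recovered = recover_dates(relative_days, "sequencing", "2020-06-09")
print(recovered)
```

Without an offset for the event whose actual date is known, this calculation is not possible, which is part of why other information would be needed.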

Thinking about it another way: with only an actual date, you don't really know when the sequencing was done relative to the sampling. Did the sample sit in the freezer for 20 years? Maybe that's a useful tidbit?