Sample identifiers - Githubissues

omegatro commented 8 months ago

Note by @vangravs in #33: seq_batch should ideally have 3 timestamp columns:

date from the input/output folder timestamp, indicating the start datetime of the
date when the output files were added to the history, indicating the end datetime of the analysis. Probably the most convenient option would be to store the date when it first appears in
date when the tag has been added/updated, if no tag present then tag date = date when the output files were added to the history

Uniqueness should be determined by sample_idl, analysis_batch_id and the analysis start datetime. The key can also remain just sample_idl, analysis_batch_id if analysis_batch_id is supplemented with a timestamp. Folders that lack a timestamp would then either get a dummy date (1900-01-01) or, for the retrospective samples, have the real date retrieved and entered manually.
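A minimal sketch of the composite key described above, assuming the three fields are available as plain Python values. The names `SampleKey`, `make_sample_key` and `DUMMY_START` are hypothetical and not taken from update_seq_batches.py; `sample_id` stands in for the sample identifier column.

```python
from datetime import datetime
from typing import NamedTuple, Optional

# Dummy date for folders that lack a timestamp, as suggested in the comment.
DUMMY_START = datetime(1900, 1, 1)


class SampleKey(NamedTuple):
    """Hypothetical composite key: sample, analysis batch, analysis start."""
    sample_id: str
    analysis_batch_id: str
    analysis_start: datetime


def make_sample_key(
    sample_id: str,
    analysis_batch_id: str,
    analysis_start: Optional[datetime] = None,
) -> SampleKey:
    # Fall back to the dummy date when no start datetime is known.
    return SampleKey(sample_id, analysis_batch_id, analysis_start or DUMMY_START)


# Illustrative records: the first and third are exact duplicates.
records = [
    ("S1", "batch_A", datetime(2023, 12, 28, 11, 56, 59)),
    ("S1", "batch_A", None),
    ("S1", "batch_A", datetime(2023, 12, 28, 11, 56, 59)),
]
unique_keys = {make_sample_key(*record) for record in records}
```

Note that with the dummy date, two untimestamped runs of the same sample in the same batch collapse into one key, which is why entering real dates for retrospective samples may be preferable.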

Refers to update_seq_batches.py

omegatro commented 8 months ago

Currently timestamps are being trimmed by the aggregation scripts: https://github.com/NMRL/Ardetype/blob/bcc3b263efa5faea0f9e7515f870615ac7b3f81a/subscripts/downstream/update_utilities.py#L112C1-L114C79. Need access to Airflow to test further.

omegatro commented 8 months ago

seq_batch should ideally have 3 timestamp columns:

date from the input/output folder timestamp, indicating the start datetime of the - implemented in #37 and included in the database infrastructure - see this commit
date when the output files were added to the history, indicating the end datetime of the analysis. Probably the most convenient option would be to store the date when it first appears in - to be done. Currently partially covered by the import date, which is updated when a record is added to the database (this happens upon import by the fetch_microbials DAG) - this column
- Also, an empty tag (no data) now indicates that the sample has not been altered since it was first added. The timestamp associated with such a tag indicates when the sample was first added, with no changes made through CRUD operations since then, as this is the only way to get an empty tag (apart from manual editing of the history file) - see this doc
date when the tag has been added/updated, if no tag present then tag date = date when the output files were added to the history - implemented in #37 and included in the database infrastructure - see this commit
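The fallback rule for the tag date could be sketched roughly as follows. This is only an illustration of the rule as stated in this thread, not the code from #37; the function and parameter names are hypothetical.

```python
from datetime import datetime
from typing import Optional


def resolve_tag_timestamp(
    tag_updated_at: Optional[datetime],
    history_added_at: datetime,
) -> datetime:
    """Return the tag date, falling back to the history date if there is no tag."""
    # No tag present: tag date = date the output files were added to the history.
    if tag_updated_at is None:
        return history_added_at
    return tag_updated_at


added = datetime(2023, 12, 28, 11, 56, 59)
edited = datetime(2024, 1, 16, 14, 2, 49)
untouched_sample = resolve_tag_timestamp(None, added)
edited_sample = resolve_tag_timestamp(edited, added)
```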

omegatro commented 8 months ago

Tag timestamp as history import Timestamp
- Apply ISO 8601 (timestamp to milliseconds)
Import timestamp - keep as is
- Verify that it is conserved after db reconstruction from backup
Seq_batch timestamp should be added as separate column for robustness
- keep up to seconds (NO ISO)
Column type - date - where possible

omegatro commented 8 months ago

Date format conventions:
- files: yyyymmdd_hhmmss
- db timestamp: ISO 8601
- custom: yyyy-mm-dd
- custom2: yyyy-mm-dd_hh-mm-ss
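For reference, the four conventions map onto the Python standard library roughly as below. The helper names are hypothetical. The db timestamp uses a space separator and millisecond precision to match the tag_timestamp example given later in this thread; strict ISO 8601 would use `T` as the separator.

```python
from datetime import datetime


def as_file_stamp(dt: datetime) -> str:
    # files: yyyymmdd_hhmmss
    return dt.strftime("%Y%m%d_%H%M%S")


def as_db_timestamp(dt: datetime) -> str:
    # db timestamp: ISO 8601 date and time, truncated to milliseconds
    return dt.isoformat(sep=" ", timespec="milliseconds")


def as_custom(dt: datetime) -> str:
    # custom: yyyy-mm-dd
    return dt.strftime("%Y-%m-%d")


def as_custom2(dt: datetime) -> str:
    # custom2: yyyy-mm-dd_hh-mm-ss
    return dt.strftime("%Y-%m-%d_%H-%M-%S")


def file_stamp_to_custom2(stamp: str) -> str:
    # Convert a folder-style stamp (yyyymmdd_hhmmss) to the custom2 format.
    return as_custom2(datetime.strptime(stamp, "%Y%m%d_%H%M%S"))


moment = datetime(2024, 1, 16, 14, 2, 49, 539000)
print(as_file_stamp(moment))
print(as_db_timestamp(moment))
print(as_custom(moment))
print(as_custom2(moment))
print(file_stamp_to_custom2("20231228_115659"))
```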

omegatro commented 8 months ago

Load tag_timestamps manually to psql
Null to default values > 'None' for empty!
seq-batch-map - to 2023-12-28_11-56-59 format upon upload
tag_timestamp - to 2024-01-16 14:02:49.539 format upon generation

omegatro commented 8 months ago

add seq_batch_timestamps from map

omegatro commented 7 months ago

Normalize NULL as 'None' str in the database and upload
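One possible way to do this before upload, as a sketch only. `normalize_null` and `normalize_row` are hypothetical helpers, and treating NaN and blank strings as NULL is an assumption not stated in this thread.

```python
import math
from typing import Any, Dict


def normalize_null(value: Any) -> Any:
    """Map missing values to the string 'None'; leave everything else as is."""
    if value is None:
        return "None"
    # Assumption: NaN (e.g. from a dataframe export) counts as missing.
    if isinstance(value, float) and math.isnan(value):
        return "None"
    # Assumption: blank strings count as missing.
    if isinstance(value, str) and value.strip() == "":
        return "None"
    return value


def normalize_row(row: Dict[str, Any]) -> Dict[str, Any]:
    # Apply the same rule to every column of a record before upload.
    return {column: normalize_null(value) for column, value in row.items()}


# Illustrative record; the column names mirror terms used in this thread.
row = {"tag": None, "seq_batch": "", "tag_timestamp": "2024-01-16 14:02:49.539"}
clean = normalize_row(row)
```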

NMRL / Ardetype

Sample identifiers #36