NMRL / Ardetype

Pipeline for reference-guided de novo assembly of bacterial genomes, starting from Illumina PE150 FASTQ files.

Sample identifiers #36

Open omegatro opened 8 months ago

omegatro commented 8 months ago

Note by @vangravs in #33: seq_batch should ideally have 3 timestamp columns.

Uniqueness should be determined by sample_idl, analysis_batch_id and the analysis start datetime. Alternatively, the key can remain sample_idl plus analysis_batch_id, provided that analysis_batch_id is supplemented with a timestamp. In that case, folders that lack a timestamp are either populated with dummy dates (1900-01-01), or the dates are retrieved and entered manually for the retrospective samples.
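A minimal sketch of the proposed uniqueness check, assuming records are plain dicts. The field name `analysis_start` and both helper functions are hypothetical and are not taken from update_seq_batches.py; only the dummy date 1900-01-01 comes from the note above.

```python
from datetime import datetime

# Placeholder for folders that lack a timestamp (date taken from the note above).
DUMMY_START = datetime(1900, 1, 1)


def composite_key(record):
    """Build (sample_idl, analysis_batch_id, analysis start datetime) for one record.

    'analysis_start' is a hypothetical field name; a missing or None value
    falls back to the dummy date.
    """
    start = record.get("analysis_start") or DUMMY_START
    return (record["sample_idl"], record["analysis_batch_id"], start)


def find_duplicates(records):
    """Return the set of composite keys that occur more than once."""
    seen, dupes = set(), set()
    for rec in records:
        key = composite_key(rec)
        if key in seen:
            dupes.add(key)
        else:
            seen.add(key)
    return dupes
```

With this key, the same sample re-analysed in the same batch at a different start time is treated as a distinct row, while two retrospective rows that both fall back to the dummy date would collide and need manual dates.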

Refers to update_seq_batches.py

omegatro commented 8 months ago

Currently, timestamps are being trimmed by the aggregation scripts: https://github.com/NMRL/Ardetype/blob/bcc3b263efa5faea0f9e7515f870615ac7b3f81a/subscripts/downstream/update_utilities.py#L112C1-L114C79. Need access to Airflow to test further.

omegatro commented 8 months ago

seq_batch should have ideally 3 timestamp

omegatro commented 8 months ago

Date format conventions:
files: yyyymmdd_hhmmss
db timestamp: ISO 8601
custom: yyyy-mm-dd
custom2: yyyy-mm-dd_hh-mm-ss
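The conventions above could be expressed with Python's standard `datetime` format codes roughly as follows. This is a sketch only: the `FORMATS` mapping, the key names and both helpers are hypothetical, and reading "mm" as month in the date part and as minute in the time part is an assumption.

```python
from datetime import datetime

# Assumed strftime equivalents of the conventions listed above.
FORMATS = {
    "files": "%Y%m%d_%H%M%S",          # yyyymmdd_hhmmss
    "custom": "%Y-%m-%d",              # yyyy-mm-dd
    "custom2": "%Y-%m-%d_%H-%M-%S",    # yyyy-mm-dd_hh-mm-ss
}


def format_timestamp(dt, convention):
    """Render a datetime in one of the conventions; 'db' means ISO 8601."""
    if convention == "db":
        return dt.isoformat()
    return dt.strftime(FORMATS[convention])


def parse_file_timestamp(text):
    """Parse a file-style timestamp (yyyymmdd_hhmmss) without dropping the time part."""
    return datetime.strptime(text, FORMATS["files"])
```

Parsing with the full format string keeps the time component, which matters if uniqueness depends on the analysis start datetime rather than the date alone.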

omegatro commented 8 months ago

add seq_batch_timestamps from map

omegatro commented 7 months ago

Normalize NULL as 'None' str in the database and upload
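One possible shape for this normalization, as a sketch: the function name and the set of values treated as NULL are assumptions, not the project's actual rules.

```python
# Hypothetical set of string spellings treated as NULL (compared case-insensitively).
NULL_MARKERS = {"", "null", "nan", "none"}


def normalize_null(value):
    """Map missing-value markers to the literal string 'None'; pass other values through."""
    if value is None:
        return "None"
    # NaN is the only float that is not equal to itself.
    if isinstance(value, float) and value != value:
        return "None"
    if isinstance(value, str) and value.strip().lower() in NULL_MARKERS:
        return "None"
    return value
```

Applying the same function both when reading from the database and when preparing the upload would keep the two sides comparable.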