artic-network / civet

Cluster Investigation & Virus Epidemiology Tool
https://cov-lineages.org/resources/civet.html
GNU General Public License v3.0

background data curation feature #129

Closed aineniamh closed 3 years ago

aineniamh commented 3 years ago

- create CSV and FASTA file from GISAID
- create the alignment
- run the background checks
- create the mutation file

aineniamh commented 3 years ago

There is now a custom pipeline to generate civet background data. The default settings assume a GISAID-style FASTA download and generate an alignment, SNP file and metadata file from it.

    dc_group = parser.add_argument_group('Background data curation')
    dc_group.add_argument("-bd","--generate-civet-background-data",dest="generate_civet_background_data",action="store",help="A sequence file to create background metadata, alignment and SNP file from.")
    dc_group.add_argument("--background-data-checks",dest="debug",action="store_true",help="Run checks on custom background data files, not run by default")
    dc_group.add_argument("--background-data-outdir",dest="background_data_outdir",action="store",help="Directory to output the civet background data. Default: `civet_data`")
    dc_group.add_argument("--primary-field-delimiter",dest="primary_field_delimiter",action="store",help="Primary sequence header field delimiter to create metadata file from. Default: `|`")
    dc_group.add_argument("--primary-metadata-fields",dest="primary_metadata_fields",action="store",help="Primary sequence header fields to create metadata file from. Default: `sequence_name,gisaid_id,sample_date`")
    dc_group.add_argument("--secondary-field-delimiter",dest="secondary_field_delimiter",action="store",help="Secondary sequence header field delimiter to create metadata file from. Default: `/`")
    dc_group.add_argument("--secondary-field-location",dest="secondary_field_location",action="store",help="Secondary sequence header location within primary field list. Default: `0` (i.e. the first field)")
    dc_group.add_argument("--secondary-metadata-fields",dest="secondary_metadata_fields",action="store",help="Secondary sequence header fields to create metadata file from. Default: `virus,country,sequence_id,year`")
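
To illustrate how the delimiter and field options above fit together, here is a minimal sketch of splitting one sequence header into a metadata row using the documented defaults. The function name `parse_header` and the example header are hypothetical and only for illustration; this is not civet's actual implementation.

```python
# Illustrative sketch only: parse_header is a hypothetical helper, not civet code.
# Defaults mirror the help text of the options above.
def parse_header(header,
                 primary_delimiter="|",
                 primary_fields=("sequence_name", "gisaid_id", "sample_date"),
                 secondary_delimiter="/",
                 secondary_location=0,
                 secondary_fields=("virus", "country", "sequence_id", "year")):
    """Split a FASTA header into a dict of metadata fields."""
    # Drop the leading '>' (if present) and split on the primary delimiter.
    primary_values = header.lstrip(">").strip().split(primary_delimiter)
    row = dict(zip(primary_fields, primary_values))
    # Split the chosen primary field again on the secondary delimiter.
    secondary_values = primary_values[secondary_location].split(secondary_delimiter)
    row.update(zip(secondary_fields, secondary_values))
    return row


# Made-up header in the GISAID download style (not a real record).
example = ">hCoV-19/England/EXAMPLE-1/2020|EPI_ISL_000000|2020-03-01"
row = parse_header(example)
print(row)
```

With the defaults, the first `|`-separated field is kept whole as `sequence_name` and is also split on `/` to fill `virus`, `country`, `sequence_id` and `year`.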

aineniamh commented 3 years ago

To do: add QC checks on the input FASTA file. Currently the pipeline just aligns it and makes the SNP file from it.
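
As a rough idea of what QC on an input FASTA could look like, the sketch below filters records by length and by proportion of `N` bases. The helper names `read_fasta` and `passes_qc`, the two criteria, and the threshold values are all assumptions for illustration; the checks actually added to civet may differ.

```python
# Illustrative sketch only: helper names, criteria and thresholds are assumptions,
# not the checks implemented in civet.
def read_fasta(lines):
    """Yield (name, sequence) pairs from an iterable of FASTA lines."""
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            name, chunks = line[1:], []
        elif line:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def passes_qc(sequence, min_length, max_ambiguity):
    """Return True if the sequence is long enough and not too ambiguous."""
    seq = sequence.upper()
    if not seq or len(seq) < min_length:
        return False
    # Fraction of the sequence made up of ambiguous 'N' bases.
    n_fraction = seq.count("N") / len(seq)
    return n_fraction <= max_ambiguity


# Toy records: one acceptable, one too short, one mostly N.
records = [">good", "ACGT" * 5, ">short", "ACG", ">mostly_n", "N" * 18 + "AC"]
kept = [name for name, seq in read_fasta(records)
        if passes_qc(seq, min_length=10, max_ambiguity=0.5)]
print(kept)
```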

aineniamh commented 3 years ago

QC checks added in, closing issue