Wytamma / GISAIDR

Programmatically interact with the GISAID database.

Is it possible to change the sequence header format #45

Closed: dawnmy closed this issue 1 year ago

dawnmy commented 1 year ago

Thank you for developing this useful tool for working with GISAID data. By default, export_fasta seems to generate a FASTA file with sequence headers of the form <country>|<lineage>|<accession>|<submission date>. Is it possible to customize the header format, for example to <accession> <country>|<lineage>|<collection date>? I would like to use the accession IDs to select and split the FASTA file later.

Wytamma commented 1 year ago

Hi @dawnmy! I’ll have to modify the export function to allow this. In the meantime, you could copy the export_fasta function and modify it yourself.

https://github.com/Wytamma/GISAIDR/blob/master/R/export_fasta.R

Wytamma commented 1 year ago

I've added the option to choose which columns are included in the export using the columns argument (cb8da27).

export_fasta(df, file, columns = c("accession_id", "country", "pangolin_lineage", "date"))

You could also construct your own header:

df$header <- paste(df$accession_id, df$country)
export_fasta(df, file, columns = c("header"))
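For the header format asked about in the original question, a minimal sketch of the same approach might look like the following. The data frame is a made-up stand-in for a real GISAIDR download, the column names are the ones used in the columns example above, and treating the date column as the collection date is an assumption.

```r
# Toy data frame standing in for a GISAIDR download (all values are made up).
df <- data.frame(
  accession_id     = c("EPI_ISL_0000001", "EPI_ISL_0000002"),
  country          = c("Australia", "Germany"),
  pangolin_lineage = c("B.1.1.7", "BA.2"),
  date             = c("2021-01-15", "2022-03-02"),  # assumed to be the collection date
  stringsAsFactors = FALSE
)

# Build "<accession> <country>|<lineage>|<date>":
# the accession comes first, separated by a space, and the rest is joined with "|".
df$header <- paste0(
  df$accession_id, " ",
  paste(df$country, df$pangolin_lineage, df$date, sep = "|")
)

print(df$header)

# With a real download (including the sequences), the custom header could then
# be passed as the only column:
# export_fasta(df, file, columns = c("header"))
```

Since many FASTA tools treat the text up to the first space as the record ID, putting the accession first should make it straightforward to select or split records by accession later.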