barricklab / breseq

breseq is a computational pipeline for finding mutations relative to a reference sequence in short-read DNA resequencing data. It is intended for haploid microbial genomes (<20 Mb). breseq is a command line tool implemented in C++ and R.
http://barricklab.org/breseq
GNU General Public License v2.0

Add 'name' metadata to already complete run? #341

Closed by dannagifford 1 year ago

dannagifford commented 1 year ago

Hello,

A very minor query: is there any way to add a 'name' to an output.gd file after the run has completed? I ran some samples but forgot to specify --name in the breseq call. Now when I use gdtools ANNOTATE to compare different strains, they are all labelled output, output_1, etc.

I haven't been able to figure it out by looking at past output.gd files or in the GenomeDiff specification.

Best wishes,

Danna

danieldeatherage commented 1 year ago

Hi Danna,

Thanks for reaching out. There are a few different ways you could do this:

  1. Manually edit each output.gd file to include a #=TITLE\tDESIRED_NAME line (note that TITLE and the name you want must be separated by a tab (\t), not a space).
  2. Rename the output.gd file to DESIRED_NAME.gd.
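
For anyone who wants to script option 1 rather than edit by hand, here is a minimal shell sketch. It assumes the first line of output.gd is the GenomeDiff version header and that the #=TITLE line can go directly after it. add_gd_title is a hypothetical helper name, not something breseq or gdtools provides.

```shell
# Hypothetical helper (not part of breseq or gdtools): set the #=TITLE line of a .gd file.
# Assumption: line 1 of the file is the GenomeDiff version header.
add_gd_title() {
  gd="$1"
  title="$2"
  tmp="$(mktemp)" || return 1
  # Drop any existing #=TITLE line, then print the new one right after line 1.
  # TITLE and the name are separated by a tab, not a space.
  awk -v t="$title" '
    /^#=TITLE[ \t]/ { next }
    { print }
    NR == 1 { printf "#=TITLE\t%s\n", t }
  ' "$gd" > "$tmp" && mv "$tmp" "$gd"
}

# Example usage (paths are placeholders):
# add_gd_title Output/Sample1/output/output.gd Sample1
```

Because the name is passed with awk -v, backslash sequences in it would be interpreted, so this is only suitable for plain sample names. It is worth trying it on a copy of one file before running it across a whole set.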

Personally, I use option 2 so that I can put all the .gd files in a single directory. I find this more helpful when manually curating unassigned evidence into mutations across multiple samples, because it makes it easier to check that the correct edits are made to the correct samples. There are drawbacks, such as ending up with two different .gd files for the same sample that carry different edits.

Assuming your breseq commands use the -o option with a unique name for each sample (i.e. -o Output/Sample1, -o Output/Sample2, etc.), running the following command inside the Output directory will copy each output.gd file into a new folder, naming each copy after the unique part of its -o path:

mkdir -p all_gds; for d in */; do s=${d%/}; [ -f "$s/output/output.gd" ] && cp "$s/output/output.gd" "all_gds/$s.gd"; done

dannagifford commented 1 year ago

  1. Amazing, thank you! Perhaps this could be a useful addition to the Genome Diff specification documentation?
  2. I think I've actually used this feature without realising it.

jeffreybarrick commented 1 year ago

Yes, good idea. Added this so it will be in the docs next time they are generated.