barricklab / breseq

breseq is a computational pipeline for finding mutations relative to a reference sequence in short-read DNA resequencing data. It is intended for haploid microbial genomes (<20 Mb). breseq is a command line tool implemented in C++ and R.
http://barricklab.org/breseq
GNU General Public License v2.0
137 stars 21 forks source link

Generating new consensus sequence with variants #360

Closed james-weger closed 8 months ago

james-weger commented 8 months ago

Hello,

Thanks for creating this useful tool. I'd like to call variants in a mixed viral population and have breseq apply the variants above a certain threshold to a new .gbk or .fasta file. Is this possible with breseq currently?

thanks in advance

jeffreybarrick commented 8 months ago

Sure, maybe with a little intervention.

If you run breseq in --polymorphism mode, you will have an output.gd file in the output directory.

You can then use the gdtools FILTER command to filter it to just the mutations you want to keep. The command would be something along the lines of this (but change the 0.8 to whatever your threshold would be).

gdtools FILTER -c frequency>=0.8 -o filtered.gd output.gd

Here's where some manual intervention is needed. You will need to edit filtered.gd and either remove the parts of the lines that are frequency=XXX or set them to be frequency=1 for the next step to work. (You are converting the polymorphic calls to consensus ones.) A search and replace should be able to get the job done.

Then, you can use gdtools APPLY to generate the updated genome

gdtools APPLY -o updated.gbk -r reference.gbk filtered.gd

Use the command line --help for these commands to see the parameters. I didn't test what I wrote, but it should be able to help you make any needed correction.

james-weger commented 8 months ago

Thank you! I just had to update the FILTER step to

gdtools FILTER -c "frequency <= 0.5" -o filtered.gd output.gd

and it worked. I appreciate your help!

Humbly, James -- James Weger-Lucarelli, PhD Assistant Professor Dept. of Biomedical Sciences and Pathobiology College of Veterinary Medicine Virginia Tech Fralin Life Sciences Building 3 https://maps.google.com/?q=1981+Kraft+Drive,+Room+2033+Blacksburg,+VA+24061&entry=gmail&source=g60 W Campus Dr. Blacksburg, VA 24061 https://maps.google.com/?q=1981+Kraft+Drive,+Room+2033+Blacksburg,+VA+24061&entry=gmail&source=g (540) 231-6594

https https://www.thewegerlucarellilab.com/://w https://www.thewegerlucarellilab.com/ww.thewegerlucarellilab.com https://www.thewegerlucarellilab.com// https://www.thewegerlucarellilab.com/

On Fri, Oct 13, 2023 at 9:09 AM Jeffrey Barrick @.***> wrote:

Sure, maybe with a little intervention.

If you run breseq in --polymorphism mode, you will have an output.gd file in the output directory.

You can then use the gdtools FILTER command to filter it to just the mutations you want to keep. The command would be something along the lines of this (but change the 0.8 to whatever your threshold would be).

gdtools FILTER -c frequency>=0.8 -o filtered.gd output.gd

Here's where some manual intervention is needed. You will need to edit filtered.gd and either remove the parts of the lines that are frequency=XXX or set them to be frequency=1 for the next step to work. (You are converting the polymorphic calls to consensus ones.) A search and replace should be able to get the job done.

Then, you can use gdtools APPLY to generate the updated genome

gdtools APPLY -o updated.gbk -r reference.gbk filtered.gd

Use the command line --help for these commands to see the parameters. I didn't test what I wrote, but it should be able to help you make any needed correction.

— Reply to this email directly, view it on GitHub https://github.com/barricklab/breseq/issues/360#issuecomment-1761495394, or unsubscribe https://github.com/notifications/unsubscribe-auth/AM4R7JHNPZ2HMKL5QIV7HJTX7E4RLAVCNFSM6AAAAAA54EPZLKVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMYTONRRGQ4TKMZZGQ . You are receiving this because you authored the thread.Message ID: @.***>