Closed loukesio closed 2 years ago
Yes, this is expected. breseq performs reference-based mapping and analysis. This means that it only maps reads to the reference sequences that you provide. If there is a "novel" insertion in a genome, such as your antibiotic resistance cassette, it doesn't know about that. The cassette's presence might reveal itself as some unassigned missing coverage at the insertion site (if some bases of the genome were deleted during the insertion). However, the reads that map there will be thrown into data/*unmatched.fastq
files. This file could be used to assemble any novel sequences that are present, but that's not the best way to get the output that you want.
For genomes that include an insertion of a cassette of some sort, you should be able to determine where the cassette was inserted by providing the sequence of the cassette itself or a plasmid that contains the cassette as an additional reference argument. Then, breseq will output unassigned junctions showing you where they are connected (based on split-read mapping). It's best to provide the sequence of the cassette as another -r
argument.
So, your command would become something like this
/data/biosoftware/breseq/breseq -j 8 -r .../REFSBW25.gbk -r .../YOURCASSETTE.gbk /.../test/12_4_c1_leaky_S1P_1.fastq /.../test/12_4_c1_leaky_S1P_2.fastq
If you want to have breseq resolve these further and make your life easier, you can also add a repeat_region
annotation that matches the range of bases in the cassette (or from the plasmid) that gets inserted by your modification procedure. Then, breseq would output a MOB type mutation with whatever the name of that cassette is. That is, it will resolve the two new unassigned junctions to one fully predicted mutation.
I've been thinking about making an example of this procedure. Maybe that would help explain things? We use it ourselves fairly often...
Dear Jeff Barrick, Thank you for the prompt reply I highly appreciate. Its great and you saved the Friday. An example of this procedure indeed would be great
thank you for your time
Dear all,
I have another question on the topic. In order to detect mutations, in a casstte, I have edited my genome in geneious, then save it and export is as a gb file. The annotations, the gff file, all looks good, but when I run Breseq I get this error
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!> FATAL ERROR <!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! Error in reference file. Feature on sequence REFSBW25_barcoded has coordinates (6699684-6701516) that are both outside of the circular sequence bounds (1-6699662). FILE: reference_sequence.cpp LINE: 1764
Are you aware of how can someone edit the circular sequence bounds?
thank you for your time and I am looking forward to your reply
Any comment or direction are highly appreciated
That's odd that Geneious apparently saves a GenBank file with a feature with coordinates fully outside of the sequence bounds. If you could share the file with me by email, I can take a look to see if this is a problem with the file or possibly with breseq misinterpreting it.
A Genbank file records the length of the sequence in several places. One the first LOCUS line, as the source feature, and (implicitly) as the length of the nucleotide sequence at the end of the file. breseq will usually give you a warning if these disagree and tell you which one it is using. I don't suppose this happened before the FATAL ERROR?
You can see these in the example GenBank record here.
Dear Jeff,
Thank you so much for the reply, and for sending me the example. Indeed its really odd to me as well. I just you an email with my modified reference genome. We highly appreciate that you take the time
OK. Two observations from the file you sent.
First, in breseq v0.36.1, this input runs file with a warning because we added some stricter checks and robust fallback procedures when different sequence lengths in the GenBank file don't agree. Yay!
Second, the five offending lines in your GenBank file are these:
source 6697563..6699662
source <6697563..>6699662
source complement(<6697564..6698026)
and later in the file
source complement(6698027..6699293)
and later in the file
source complement(6699294..>6699662)
These are additional copies of what should be a special feature that occurs once per GenBank file. They've become duplicated and mangled sometime during editing in Geneious (probably from pasting together different sequences).
(Basically, there should only be one source
feature at the top of the file, and its coordinates should match the length of the entire sequence, which is 1..6724639
in this case.)
If I delete these lines, it runs fine with earlier breseq version too.
Thank you! It works perfectly!
Dear all,
I have constructed a bacterial colony with an antibiotic resistance marker which is 2000bp long. For each of my colonies I have done Sanger - Sequencing of the area to confirm its existence. Overall my sequencing went well. I have enough data and of good quality. I see this issue in multiple samples and numerous bacteria species. I do not see the insertion neither on the Unassigned missing coverage evidence nor on the Unassigned new junction evidence area.
When I do WGS and run breseq I can not detect the insertion of this area? Do you have any idea why? Am I doing something wrong with the parameters I set? Shall I change my reference and add the antibiotic resistance marker? Any help is appreciated.
this is my command btw