bhattlab / MGEfinder

A toolbox for identifying mobile genetic element (MGE) insertions from short-read sequencing data of bacterial isolates.
MIT License

Request for additional information on question #30 regarding "reference" #34

Closed: dbayles closed this issue 2 years ago

dbayles commented 2 years ago

In the response to question #30 (also referenced in the response to question #33), you mention that '--filter-clusters-inferred-assembly' "removes clusters that were never identified from an assembly, meaning they were only found in the reference."

Does the term "reference" in the question #30 response refer only to the genome defined as the reference when the pipeline was originally run? Or can it refer to any single-genome-only cluster (i.e., any cluster originally identified in only a single genome)? I want to be sure I'm understanding this correctly. In my analysis, I'm using only assembled genomes (albeit most are draft assemblies), and I'm seeking clarification on whether self-only clusters (i.e., clusters originally identified in only a single genome) would be removed under the same rules as clusters found only in the explicitly defined reference used when the pipeline was run. Under these conditions, every genome assembly in turn might be construed as a "reference" for the purposes of filtering, as explained in the response to question #30.

durrantmm commented 2 years ago

Yes, "reference" refers to the genome identified as the reference when the pipeline was originally run. This is the sequence that you aligned all of your reads to in order to produce the BAM files.
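
For illustration only, here is a minimal Python sketch of the filtering rule as it is described in this thread. It is not MGEfinder's actual code: the `Cluster` class, the `sources` field, and the label strings "assembly" and "reference" are all hypothetical names chosen for the example. The assumption is that each cluster records where it was inferred from, and that "reference" means only the single genome the reads were aligned to.

```python
from dataclasses import dataclass, field


@dataclass
class Cluster:
    """Hypothetical record: one element cluster and where it was inferred from."""
    cluster_id: str
    # Hypothetical labels, e.g. {"assembly", "reference"}; not MGEfinder's own names.
    sources: set = field(default_factory=set)


def filter_inferred_assembly(clusters):
    """Keep only clusters that were identified from an assembly at least once.

    Clusters seen only in the run's reference genome are dropped.
    """
    return [c for c in clusters if "assembly" in c.sources]


clusters = [
    Cluster("c1", {"assembly", "reference"}),  # seen in an assembly and the reference
    Cluster("c2", {"reference"}),              # reference-only: removed
    Cluster("c3", {"assembly"}),               # assembly-only: kept
]

kept_ids = [c.cluster_id for c in filter_inferred_assembly(clusters)]
print(kept_ids)
```

Under this reading, a cluster found in only one isolate's assembly (like `c3`) is kept, because it was still identified from an assembly. Only clusters tied solely to the run's reference genome are removed.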