Plasmid Seeker interpretation?

Hi!

1) The 2 FASTQ files are probably the 2 runs of the same sample (r1, r2) and should indeed be given together as input. Whether the filtered reads can be used depends on what kind of filtering was done. If only adapters, low-quality sequences etc. were removed, the filtered files should be fine. If, on the other hand, only some specific sequences were kept (16S rRNA, plasmid ori etc.), the unfiltered sequences are the way to go, since PlasmidSeeker assumes the reads were drawn from the whole sequence present in the sample (even if some of it was not captured because of low coverage).
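If the reads need to end up in one file before being given as input, a minimal sketch of joining the two files could look like this. The function name `concatenate_fastq` and the file names are made up for illustration, and the sketch assumes plain-text (uncompressed) FASTQ files; it is not part of PlasmidSeeker.

```python
import shutil
from pathlib import Path


def concatenate_fastq(inputs, output):
    """Append the given uncompressed FASTQ files, in order, into one file.

    Hypothetical helper for illustration only; not part of PlasmidSeeker.
    """
    with open(output, "wb") as out:
        for path in inputs:
            with open(path, "rb") as src:
                shutil.copyfileobj(src, out)
                # If a non-empty file does not end with a newline, add one so
                # the next file's first record starts on its own line.
                if src.tell() > 0:
                    src.seek(-1, 2)
                    if src.read(1) != b"\n":
                        out.write(b"\n")
    return Path(output)


# Example call (file names are placeholders):
# concatenate_fastq(["sample_r1.fastq", "sample_r2.fastq"], "sample_all.fastq")
```

Copying in binary mode keeps the records byte-for-byte identical; only a missing final newline is patched.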

2) Plasmid clustering is used because many of the plasmids in the database are very similar to each other: they share many or almost all of their k-mers, which makes it hard to tell which of them was in the sample. Furthermore, due to biological variance, the sequenced plasmid usually differs somewhat from the database sequence. The percentage of found k-mers for each plasmid in the cluster (by which the table is sorted) might give a clue. If the first plasmid has 100% of its k-mers found (this might be lower due to filtering) and the following plasmids have lower percentages, the best guess is that the first plasmid is the one in the sample.
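The rule of thumb above can be sketched in a few lines. The tuple layout, the plasmid names and the function `best_guess` are hypothetical and only illustrate the reasoning; they do not reflect PlasmidSeeker's actual output format.

```python
def best_guess(cluster):
    """Pick the most likely plasmid from one cluster.

    cluster: list of (plasmid_name, percent_kmers_found) tuples
             (an assumed layout, not PlasmidSeeker's real output format).
    Returns (name, ambiguous), where ambiguous is True when another plasmid
    in the cluster has the same percentage as the top one.
    """
    ranked = sorted(cluster, key=lambda row: row[1], reverse=True)
    top = ranked[0]
    ambiguous = len(ranked) > 1 and ranked[1][1] == top[1]
    return top[0], ambiguous


# Made-up example values:
cluster = [("plasmid_B", 97.2), ("plasmid_A", 100.0), ("plasmid_C", 91.5)]
name, ambiguous = best_guess(cluster)
print(name, ambiguous)
```

When two plasmids tie at the top, the percentages alone cannot separate them, which is exactly the situation clustering is meant to flag.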

The copy number column shows the number of plasmids per bacterial genome.

Note that for a single sample, PlasmidSeeker expects only 1 bacterial genome to be present and uses it to estimate the plasmid copy number. If there are multiple bacterial genomes in the sample (in different amounts), there is no way to say which plasmid comes from which bacterium, and therefore we cannot know which of the possible copy numbers is correct (if, for example, there are 2 different bacteria in the sample).
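To see why a mixed sample is a problem, here is a small sketch. It assumes the copy number is estimated as plasmid coverage divided by bacterial genome coverage; that ratio, the function name and all numbers are illustrative assumptions, not PlasmidSeeker's exact calculation.

```python
def copy_number(plasmid_coverage, genome_coverage):
    """Plasmid copies per bacterial genome, as a simple coverage ratio.

    Assumed formula for illustration; not PlasmidSeeker's exact calculation.
    """
    if genome_coverage <= 0:
        raise ValueError("genome coverage must be positive")
    return plasmid_coverage / genome_coverage


# Single isolate: one genome, so one unambiguous estimate.
single = copy_number(120.0, 40.0)

# Mixed sample: two bacteria at different coverages (made-up numbers).
# The same plasmid coverage gives a different copy number for each possible
# host, and the reads alone do not say which host carries the plasmid.
hosts = {"bacterium_A": 40.0, "bacterium_B": 10.0}
candidates = {host: copy_number(120.0, cov) for host, cov in hosts.items()}

print(single)
print(candidates)
```

With these numbers the plasmid would be at 3 copies if it belongs to the first bacterium and 12 copies if it belongs to the second, so the reported value is only meaningful for a single-genome sample.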
