RyanCook94 / inphared

Providing up-to-date phage genome databases, metrics and useful input files for a number of bioinformatic pipelines.
GNU Affero General Public License v3.0

Inconsistency in database #29

Closed by valentynbez 3 months ago

valentynbez commented 3 months ago

Hello. According to the paper, INPHARED should include only genomes from phages shown to produce virions:

We also assume the genomes are from phages that have been shown to produce virions and are not predictions of prophages, a requirement of submitting phage genomes.

However, I saw that the entry under ID MK250017 is the Lak 1 phage, which was only predicted in Devoto et al. 2019 and hasn't been isolated yet. Could you clarify this? Thanks!

RyanCook94 commented 3 months ago

Hi Valentyn,

Great question! Yes, we did originally exclude all metagenome-derived sequences (i.e. uncultured viruses), and by default we still exclude these. However, there's a small number of ecologically important viruses that we have deliberately added, such as the Lak phages.

These are thought to be phages rather than prophages and likely do exist as virions (but are presumably hard to isolate with traditional techniques).

In the TSV file, I include a column that indicates the GenBank designation: phages = PHG and metagenomic sequences = ENV. If you'd like to exclude any sequences, I would do so using this column.
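As a rough illustration of that filtering step, here is a minimal Python sketch. The column names `Accession` and `Genbank_Division` and the sample rows are assumptions made up for the example, not the actual INPHARED headers or records, so check the header line of the real TSV and adjust `column` accordingly.

```python
import csv
import io


def filter_by_division(tsv_text, keep=("PHG",), column="Genbank_Division"):
    """Return the rows whose division code (e.g. PHG or ENV) is in `keep`.

    `column` is a hypothetical header name; replace it with the one
    actually used in the TSV file.
    """
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    if reader.fieldnames is None or column not in reader.fieldnames:
        raise KeyError(f"column {column!r} not found in TSV header")
    return [row for row in reader if row[column] in keep]


# Made-up sample data, only to show the shape of the input.
SAMPLE_TSV = (
    "Accession\tGenbank_Division\n"
    "EXAMPLE_1\tPHG\n"
    "EXAMPLE_2\tENV\n"
    "EXAMPLE_3\tPHG\n"
)

# Keep only entries designated as phages, dropping metagenomic (ENV) ones.
phage_only = filter_by_division(SAMPLE_TSV)
print([row["Accession"] for row in phage_only])
```

To work on a real file, read it with `open(path, newline="").read()` and pass the text in; passing `keep=("ENV",)` instead selects only the metagenomic entries.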

Hope this is helpful!

All the best, Ryan