chg60 / DEPhT

Fast and sensitive discovery of complete phages and prophages in bacterial genome sequences
GNU General Public License v3.0

There is no .html file in result #37

Closed: Mr-Wornock closed this issue 2 years ago

Mr-Wornock commented 2 years ago

Dear chg60, I have a problem with the results. I installed DEPhT with conda and downloaded the model to our server, then searched for prophages in my data. However, the results contain only .gbk and .csv files, and I can't find the .html file. What could be causing this?

Best regards,

Tianqi

chg60 commented 2 years ago

Hello tianqi,

Thanks for your interest in using DEPhT, and for your inquiry!

The .html output is only made if one or more prophages are identified in the input genome, and if the -n/--no-draw flag is not present in your command. There are a number of valid reasons this file might be missing from the output, so your answers to these questions will help me diagnose the problem!
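The two conditions above can be checked mechanically. This is a minimal Python sketch, not part of DEPhT: the helper names `html_expected` and `find_html_reports` are hypothetical, the prophage count is assumed to be known from the other outputs, and the flag check does not recognize bundled short flags such as `-vn`.

```python
from __future__ import annotations

from pathlib import Path


def html_expected(argv: list[str], num_prophages: int) -> bool:
    """Hypothetical helper: True if an .html report should be written.

    Encodes the two stated conditions: at least one prophage was found,
    and -n/--no-draw was not passed. Bundled short flags ("-vn") are
    NOT detected by this simple scan.
    """
    no_draw = any(arg in ("-n", "--no-draw") for arg in argv)
    return num_prophages > 0 and not no_draw


def find_html_reports(output_dir: str) -> list[Path]:
    """Hypothetical helper: list every .html file under an output directory."""
    return sorted(Path(output_dir).rglob("*.html"))


if __name__ == "__main__":
    # Hypothetical input/output names, for illustration only.
    cmd = ["depht", "--model", "Mycobacterium", "genome.fasta", "depht-out"]
    print(html_expected(cmd, num_prophages=0))  # False: nothing was found
    print(html_expected(cmd + ["-n"], num_prophages=2))  # False: drawing disabled
    print(html_expected(cmd, num_prophages=2))  # True
```

If `html_expected` returns True but `find_html_reports` comes back empty, the cause lies somewhere other than these two conditions.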

  1. Which model are you using, and what is the taxonomy of the genome(s) you are trying to run?
  2. Are the genome(s) in question completed assemblies (1 contig per replicon), or fragmented?
  3. What was the command you used to run DEPhT?
  4. If your genomes are not private, would you mind sharing one so I can attempt to reproduce this outcome?

Best,

Christian

Mr-Wornock commented 2 years ago

Thanks for your response, Christian! Here are my answers:

  1. I used the Mycobacterium model. It is worth mentioning that I didn't run "depht_train" after running "unzip ~/Mycobacterium.zip -d ~/.depht/models/". Do I need to train on this dataset?
  2. The genomes are assemblies downloaded from NCBI; some of them are contigs, others are complete genomes.
  3. I used the command "depht --model 'Mycobacterium' -c 24 -m normal -s 10 -d -v -l 5000 putative_provirus_10kb.fasta depht-result1", after first running "conda activate depht".
  4. My data are all archaeal genomes. Attached: test.txt
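Before asking about training, one quick sanity check is whether the unzip step put the model somewhere `--model 'Mycobacterium'` can find it. A minimal Python sketch, assuming (not confirmed in this thread) that each model unzips into its own folder named after it under ~/.depht/models; the helper names are hypothetical:

```python
from __future__ import annotations

from pathlib import Path

# Assumed default location, taken from the unzip command above.
DEFAULT_MODELS_DIR = "~/.depht/models"


def installed_models(models_dir: str = DEFAULT_MODELS_DIR) -> list[str]:
    """Hypothetical helper: names of subdirectories in the models folder.

    Assumes each unzipped model sits in its own subdirectory named after
    the model (e.g. ~/.depht/models/Mycobacterium). Returns [] if the
    folder does not exist.
    """
    root = Path(models_dir).expanduser()
    if not root.is_dir():
        return []
    return sorted(p.name for p in root.iterdir() if p.is_dir())


def model_installed(name: str, models_dir: str = DEFAULT_MODELS_DIR) -> bool:
    """True if a subdirectory called `name` exists in the models folder."""
    return name in installed_models(models_dir)


if __name__ == "__main__":
    print(installed_models())
    print(model_installed("Mycobacterium"))
```

A model that is present in the expected place is still no guarantee that it suits the genomes being searched.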

Best,

Tianqi

chg60 commented 2 years ago

Hi Tianqi,

Thanks for your prompt reply. Unfortunately, I think I may have bad news for you: I don't expect any of the three models in our OSF repository to be able to identify prophages in archaea. In our publication, we posit that DEPhT's success is likely tied to its genus-specific approach: organisms in the same genus have evolved along a similar trajectory, and thus frequently have similar genome architecture and gene repertoires that DEPhT can exploit to improve accuracy.

Because the models we've trained are genus-specific, when they are used with archaea (or any other genomes too distantly related to the genus the model is named after), the other components of DEPhT's workflow will be severely compromised, even if genome architecture alone produces some prophage-like signal. The pangenome used to trim prophage regions back from obviously bacterial regions is genus-specific. The phage gene annotations are made using HMMs derived from conserved gene phamilies in phages infecting the genus in question. The blastn database used to attempt to find attB is genus-specific.
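Since every component listed above is tied to the genus a model is named after, a cheap guard before a run is to compare the genome's genus with the model's. The sketch below is illustrative only: as far as this thread indicates, DEPhT does not perform such a check itself, and `model_matches_genome` is a hypothetical helper.

```python
def model_matches_genome(model_genus: str, organism_name: str) -> bool:
    """Hypothetical pre-flight check: does the genome's genus match the model?

    Takes the genus as the first word of a binomial organism name, e.g.
    "Mycobacterium smegmatis MC2 155" -> "Mycobacterium". Names that do
    not start with the genus (e.g. "Candidatus ...") are not handled.
    """
    words = organism_name.split()
    genus = words[0] if words else ""
    return genus.casefold() == model_genus.casefold()


if __name__ == "__main__":
    print(model_matches_genome("Mycobacterium", "Mycobacterium smegmatis MC2 155"))  # True
    print(model_matches_genome("Mycobacterium", "Haloferax volcanii DS2"))  # False
```

A match is necessary but not sufficient: it says nothing about how well a given model performs, only that the genome is not obviously outside the model's genus.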

Assuming that DEPhT's workflow translates to archaea, you will likely need to train one or more models for your archaeal genera of interest. If you have the time and motivation to pursue this option, directions are available here, and I'm happy to provide further guidance as needed.