gem-pasteur / macsyfinder

MacSyFinder - Detection of macromolecular systems in protein datasets using systems modelling and similarity search.
GNU General Public License v3.0

[BUG] GA bit thresholds unavailable on profile T6SSiii_tssO/Q. Switch to e-value threshold (-E 0.100000) #74

Closed: goldenmole1 closed this issue 2 months ago

goldenmole1 commented 4 months ago

Describe the bug

I see these errors:

GA bit thresholds unavailable on profile T6SSiii_tssO. Switch to e-value threshold (-E 0.100000)
GA bit thresholds unavailable on profile T6SSiii_tssQ. Switch to e-value threshold (-E 0.100000)

To Reproduce

Steps to reproduce the behavior:

  1. The exact command lines you use:

    #!/bin/bash
    #SBATCH --mem=500G
    #SBATCH -c 16

    source ~/anaconda3/etc/profile.d/conda.sh
    conda activate macsyfinder_env
    macsyfinder --e-value-search 0.1 --sequence-db img_data_12990-26/2636416040/2636416040.genes.fna -o macsyfinder_test_2636416040 --models-dir TXSScandir/ --models TXSScan all --db-type ordered_replicon -w 16

Expected behavior

I ran the exact same script on hundreds of bacterial genomes with reported T6SSs, but MacSyFinder failed to find any T6SS genes in any of these genomes.

Screenshots

Macsyfinder 2.1.3 using:

MacsyFinder is distributed under the terms of the GNU General Public License (GPLv3). See the COPYING file for details.

If you use this software please cite: Néron, Bertrand; Denise, Rémi; Coluzzi, Charles; Touchon, Marie; Rocha, Eduardo P.C.; Abby, Sophie S. MacSyFinder v2: Improved modelling and search engine to identify molecular systems in genomes. Peer Community Journal, Volume 3 (2023), article no. e28. doi : 10.24072/pcjournal.250. https://peercommunityjournal.org/articles/10.24072/pcjournal.250/ and don't forget to cite models used: macsydata cite

command used: /clusterfs/jgi/groups/science/homes/heejungcho/anaconda3/envs/macsyfinder_env/bin/macsyfinder --e-value-search 0.1 --sequence-db img_data_12990-26/2636416040/2636416040.genes.fna -o macsyfinder_test_2636416040 --models-dir TXSScandir/ --models TXSScan all --db-type ordered_replicon -w 16

models used: TXSScan-1.1.3

######################### Searching systems ##########################
Models Parsing
MacSyFinder's results will be stored in working_dirmacsyfinder_test_2636416040
Analysis launched on img_data_12990-26/2636416040/2636416040.genes.fna for model(s):

###################### Computing best solutions ######################

####### Writing down results in 'macsyfinder_test_2636416040' ########
No Systems found in this dataset.
END

Please complete the following information:

- OS: Linux
- MacSyFinder Version: 2.1.3



bneron commented 4 months ago

Can you provide the data you are working on (--sequence-db img_data_12990-26/2636416040/2636416040.genes.fna)? If it is too big, just the beginning of the file is enough.

bneron commented 4 months ago

The authors of the TXSScan models (https://github.com/macsy-models/TXSScan) do not provide GA thresholds for these two profiles, but MacSyFinder has a mechanism to fall back on the hmmsearch e-value in that case. What you see in the log is just a warning, not an error.
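The fallback described above can be illustrated with a short shell sketch. It is not MacSyFinder's actual code: it assumes each profile is an HMMER3 text file in which the optional GA line holds the gathering thresholds, that a models directory keeps one .hmm file per profile, and the helper names threshold_opts and missing_ga are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch (not MacSyFinder's code): choose hmmsearch threshold options for one
# HMMER3 profile. A profile with a "GA" (gathering) line can use --cut_ga;
# a profile without one falls back to an e-value cutoff (default 0.1 here,
# matching --e-value-search 0.1 in the command above).
threshold_opts() {
  local hmm=$1 evalue=${2:-0.1}
  if grep -q '^GA ' "$hmm"; then
    printf '%s\n' "--cut_ga"
  else
    printf '%s\n' "-E $evalue"
  fi
}

# List the profiles in a directory that lack a GA line.
# Assumption: one profile per .hmm file in that directory.
missing_ga() {
  local hmm
  for hmm in "$1"/*.hmm; do
    [ -e "$hmm" ] || continue            # the glob matched nothing
    grep -q '^GA ' "$hmm" || basename "$hmm" .hmm
  done
}
```

For example, hmmsearch $(threshold_opts tssO.hmm) tssO.hmm proteins.faa (hypothetical file names) would run with -E 0.1 when the profile has no GA line; the substitution is left unquoted on purpose so that "-E 0.1" splits into two arguments.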

goldenmole1 commented 2 months ago

Thank you so much. The file looks like this:

    head img_data_12990-26/2636416040/2636416040.genes.fna

    >2639022136 Ga0070510_0001 1..190(-)(Ga0070510_11) [Chitinophaga arvensicola DSM 3695]
    TTACAATGGAGAGTTTGATCCTGGCTCAGGATGAACGCTAGCGGCAGGCC
    TAATACATGCAAGTCGAGGGGCAGCACAGGTAGCAATACTGGGTGGCGAC
    CGGCAAACGGGTGCGGAACACGTACGCAACCTTCCTTCAAGCGGGGAATA
    GCCCAGAGAAATTTGGATTAATACCCCATAAGAATGTGGA
    >2639022137 Ga0070510_0002 730..1494(-)(Ga0070510_11) [Chitinophaga arvensicola DSM 3695]
    ATGAACAGGTACTTTATAGAAGTAGGATATAAGGGGGCGCAGTACAGCGG
    GTTCCAGGTACAGGAAAATGCACATTCCGTACAGGCGGAGATTGACAGGG
    CGCTGGGTATATTATTCCGGTCGCCCATAGAAACTACGGGATCCAGCAGA

bneron commented 2 months ago

The problem comes from your input data: MacSyFinder works on protein sequences, not on genomic (nucleotide) data such as this .fna file. See https://macsyfinder.readthedocs.io/en/latest/user_guide/input.html#input-dataset
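One way to catch this before submitting a long batch of jobs is to check whether the sequences in the --sequence-db file look like nucleotides. The function below is a hypothetical helper, not part of MacSyFinder. The 90% cutoff on A/C/G/T/N letters is an arbitrary heuristic assumption: protein sequences also use those letters (alanine, cysteine, glycine, threonine, asparagine), but rarely that densely.

```shell
#!/usr/bin/env bash
# Hypothetical helper: looks_like_nucleotide FILE
# Exit status: 0 if at least 90% of the residue letters in the FASTA file are
# A/C/G/T/N (likely nucleotide), 1 otherwise (likely protein), 2 if the file
# contains no sequence letters at all. The 0.9 cutoff is an assumption.
looks_like_nucleotide() {
  awk '
    /^>/ { next }                      # skip FASTA header lines
    {
      seq = toupper($0)
      gsub(/[^A-Z]/, "", seq)          # keep letters only (drops *, -, \r)
      total += length(seq)
      nuc = seq
      gsub(/[^ACGTN]/, "", nuc)        # keep nucleotide letters only
      acgtn += length(nuc)
    }
    END {
      if (total == 0) exit 2           # no sequence data found
      exit (acgtn / total >= 0.9 ? 0 : 1)
    }
  ' "$1"
}
```

In a submission script it could guard the macsyfinder call, e.g. looks_like_nucleotide "$db" && { echo "$db looks like nucleotide data; MacSyFinder expects proteins" >&2; exit 1; }, where $db is a hypothetical variable holding the --sequence-db path.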