Closed lguy closed 3 years ago
Dear Lionel,
Thanks a lot for your feedback, and for your interest in our tools! Yes, this helps!
We are aware of several bugs/unexpected behaviors of MacSyFinder (including the verbosity issue and the output file inconsistency you've reported in issues #13 and issues #14), we are currently working on a new version of MacSyFinder that should address the issues you have raised. We don't have a timeline yet but expect to provide soon with a new release that fixes (at least some) the issues you've reported.
As for the special issue you report here, I will look into it. Please be aware that the unordered issue can be tricky to deal with when analyzing secretion systems whose components can be from several systems, as it cannot decipher which are part of which system for sure, without genomic context. The "ordered" mode is much more powerful, but of course, one cannot always have access to the gene order... Also, if you are particularly interested in a special T4SS type, it could worth it for you to re-design a bit the XML model files so that it fits better your needs.
Thanks again,
Best wishes,
Sophie Abby
On Wed, Nov 22, 2017 at 2:08 PM, Lionel Guy notifications@github.com wrote:
Hi, First, thanks a lot for providing and maintaining macsyfinder, it's a very useful resource! I stumbled upon what I think is a bug. I was running macsyfinder with unordered and ordered_replicon modes on the same genome. Genes that were in a referred system were not picked up (marked as absent) by macsyfinder in unordered mode, whereas they were present in the ordered one. The hmmer files were identical. I'm screening Legionella pneumophila with the pT4SSi model (TXSS). Here is some code to reproduce the issue:
################################################################################
Testing macsyfinder, db-types
################################################################################ mkdir test; cd test; mkdir fasta TXSSPATH=/usr/local/bin/macsy/Macsyfinder_models/Data/TXSS
Get test data
FASTA=GCA_000008485.1_ASM848v1_protein.faa wget ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/000/008/485/GCA_000008485.1_ASM848v1/$FASTA.gz gunzip $FASTA.gz mv $FASTA fasta/
Create a fake conf file
echo -e "[general]\nlog_level = 10\nworker_nb = 10" > log10.conf echo -e "[general]\nlog_level = 20\nworker_nb = 10" > log20.conf
Run with ordererd_replicon
for TYPE in ordered_replicon unordered_replicon unordered; do macsyfinder pT4SSi --db-type $TYPE -d $TXSSPATH/DEF -p $TXSSPATH/HMM --profile-suffix .hmm --sequence-db fasta/$FASTA --replicon-topology circular --coverage-profile 0.33 --i-evalue-select 0.00001 --out-dir $TYPE --config log20.conf done
Then the log files show for the ordered_replicon mode:
====> Decision rule for putative system pT4SSi: Mandatory genes: T4SS_virb4 1 Accessory genes: T4SS_t4cp1 1 T4SS_I_traY 0 T4SS_I_traW 0 T4SS_I_traT 0 T4SS_I_traR 0 T4SS_I_traQ 0 T4SS_I_traP 1 T4SS_I_traO 1 T4SS_I_traN 1 T4SS_I_traM 1 T4SS_I_traL 0 T4SS_I_traK 1 T4SS_I_traI 0 T4SS_I_trbA 1 T4SS_I_trbB 0 T4SS_I_traE 0 Forbidden genes: T4SS_MOBB 0
For the unordered/unordered_replicon one:
pT4SSi
Mandatory genes: T4SS_virb4 0 Accessory genes: T4SS_t4cp1 0 T4SS_I_traY 1 T4SS_I_traW 0 T4SS_I_traT 0 T4SS_I_traR 0 T4SS_I_traQ 0 T4SS_I_traP 1 T4SS_I_traO 1 T4SS_I_traN 1 T4SS_I_traM 4 T4SS_I_traL 0 T4SS_I_traK 1 T4SS_I_traI 1 T4SS_I_trbA 1 T4SS_I_trbB 0 T4SS_I_traE 0 Forbidden genes: T4SS_MOBB 0
My python skills are quite basic, but I could narrow the problem to the search_systems function in search_systems.py:
if k in systems: # SO new: get the list of forbidden genes... Then have from hits # Should better rewrite this part of the code to have a single process of the hits... #forbidden_genes = k.forbidden_genes # unused_var forbidden_hits = [] for h in hits: if h.gene.is_forbidden(k): forbidden_hits.append(h) sub_hits = list(g) + forbidden_hits so = SystemOccurence(k) #resy=so.fill_with_hits(sub_hits) # does not return anything #so.fill_with_hits(sub_hits) so.fill_with_hits(sub_hits, True) # SO new parameter to say wether forbidden genes should be included or not. _log_out.info("******************************************") _log_out.info(k.name) _log_out.info("******************************************") _log_out.info(so) systems_occurences_list.append(so)
There, it seems to me (but again, I'm a fool in python) that filling the so with sub_hits will not include the hits from the referred system (CONJ here). Using an ugly hack:
so.fill_with_hits(hits, True)
temporarily fixed the problem by including all hits in the so, but that's not (I think) a good solution. Hope that helps...
— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/gem-pasteur/macsyfinder/issues/15, or mute the thread https://github.com/notifications/unsubscribe-auth/AFpVUvhOVxVTlvcjHZpQ7ZtvrPemvrqoks5s5BzLgaJpZM4QnXw_ .
-- Sophie ABBY
Hi, First, thanks a lot for providing and maintaining macsyfinder, it's a very useful resource! I stumbled upon what I think is a bug. I was running macsyfinder with
unordered
andordered_replicon
modes on the same genome. Genes that were in a referred system were not picked up (marked as absent) by macsyfinder in unordered mode, whereas they were present in the ordered one. The hmmer files were identical. I'm screening Legionella pneumophila with the pT4SSi model (TXSS). Here is some code to reproduce the issue:Then the log files show for the ordered_replicon mode:
For the unordered/unordered_replicon one:
My python skills are quite basic, but I could narrow the problem to the search_systems function in search_systems.py:
There, it seems to me (but again, I'm a fool in python) that filling the so with sub_hits will not include the hits from the referred system (CONJ here). Using an ugly hack:
temporarily fixed the problem by including all hits in the so, but that's not (I think) a good solution. Hope that helps...