BIONF / fDOG

Feature-aware Directed OrtholoG search
GNU General Public License v3.0

About fDOG running issues #32

Closed majssssa closed 4 months ago

majssssa commented 1 year ago

I ran into the following problem when running fDOG (HaMStR):

sxq020@masterv2:~/fDOG/output39_1$ fdog.run --seqFile /media/ym/desk16/sxq020/fDOG/output39_1/431143.fa --jobName A_test --refspec ORYSA@4530@230330 --corepath /media/ym/desk16/sxq020/fDOG/output39_1/A39_1_39525/coreTaxa_dir --searchpath /media/ym/desk16/sxq020/fDOG/output39_1/A39_1_39525/searchTaxa_dir --annopath /media/ym/desk16/sxq020/fDOG/output39_1/A39_1_39525/annotation_dir

Identified seed ID: 4530_1
Compiling core set for A_test
Traceback (most recent call last):
  File "/media/ym/desk16/sxq020/.local/bin/fdog.run", line 8, in <module>
    sys.exit(main())
  File "/media/ym/desk16/sxq020/.local/lib/python3.8/site-packages/fdog/runSingle.py", line 225, in main
    core_runtime = core_fn.run_compile_core([seqFile, seqName, refspec, seed_id, reuseCore,
  File "/media/ym/desk16/sxq020/.local/lib/python3.8/site-packages/fdog/libs/corecompile.py", line 417, in run_compile_core
    compile_core([seqFile, seqName, refspec, seed_id, coreArgs, pathArgs,
  File "/media/ym/desk16/sxq020/.local/lib/python3.8/site-packages/fdog/libs/corecompile.py", line 178, in compile_core
    tree = ncbi.get_topology(tax_ids.keys(), intermediate_nodes = True)
  File "/media/ym/desk16/sxq020/.local/lib/python3.8/site-packages/ete3/ncbi_taxonomy/ncbiquery.py", line 442, in get_topology
    lineage = id2lineage[sp]
KeyError: 39525383

Because the species I added to the taxa has no taxonomy ID in NCBI, I assigned it the ID 39525383 myself.

trvinh commented 1 year ago

Hi,

as written at https://github.com/BIONF/fDOG/wiki/FAQ#can-fdog-work-with-non-ncbi-taxa, the core compilation step of fDOG cannot work with non-NCBI taxa. I suppose that you have placed your new species, with the ID 39525383, in the BLAST databases folder (named coreTaxa_dir by default).

Best,
Vinh

P.S.: I see that you have opened multiple issues that all concern the usage of fDOG, i.e. they always have the same topic. It would be more convenient for us if you posted them in ONE issue entry. Many thanks!
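The KeyError in the traceback is raised while looking up the lineage of taxon ID 39525383. A pre-flight check along the following lines could flag such IDs before a run. This is only a minimal sketch: it assumes taxon names follow the NAME@TAXID@VERSION pattern seen in ORYSA@4530@230330, the name NEWSP@39525383@1 is hypothetical, and the set of known IDs is a toy stand-in for a real NCBI taxonomy lookup.

```python
from typing import Iterable, List, Set


def parse_taxon_id(taxon_name: str) -> int:
    """Extract the numeric ID from a name such as 'ORYSA@4530@230330'."""
    parts = taxon_name.split("@")
    if len(parts) != 3 or not parts[1].isdigit():
        raise ValueError(f"unexpected taxon name: {taxon_name!r}")
    return int(parts[1])


def find_unknown_taxa(taxon_names: Iterable[str], known_ids: Set[int]) -> List[str]:
    """Return the taxon names whose ID is missing from known_ids."""
    return [name for name in taxon_names if parse_taxon_id(name) not in known_ids]


# Toy input: in practice the names would be the sub-folders of coreTaxa_dir,
# and known_ids would come from an NCBI taxonomy lookup (assumption).
core_taxa = ["ORYSA@4530@230330", "NEWSP@39525383@1"]  # second name is hypothetical
known_ids = {4530}

unknown = find_unknown_taxa(core_taxa, known_ids)
print(unknown)
```

Any name reported here would need to be moved out of the core taxa folder before the core compilation step is run.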

majssssa commented 1 year ago

I am very sorry for the inconvenience caused to you. If you have any questions in the future, I will do as you said.

majssssa commented 1 year ago

I am very sorry for the inconvenience caused to you. If I have any questions in the future, I will do as you said. I'm sorry for the typo in my last answer. I hope it did not cause any misunderstanding.

majssssa commented 1 year ago

Hi, there was a strict option in an earlier version of HaMStR. I wonder whether fDOG has an option with similar functionality. In other words, how can I make the software run more accurately?

trvinh commented 1 year ago

The -strict option is no longer available in fDOG. I would suggest choosing as the reference species the core taxon that is most closely related to your search taxa.
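One way to act on this suggestion is to compare taxonomic lineages and pick the core taxon that shares the deepest lineage with the search taxon. The sketch below illustrates that idea only; it is not part of fDOG, and the taxon names and clade labels are toy placeholders, not real taxonomy data.

```python
from typing import Dict, List


def shared_depth(lineage_a: List[str], lineage_b: List[str]) -> int:
    """Count how many leading ranks two root-to-leaf lineages have in common."""
    depth = 0
    for rank_a, rank_b in zip(lineage_a, lineage_b):
        if rank_a != rank_b:
            break
        depth += 1
    return depth


def closest_reference(search_lineage: List[str],
                      core_lineages: Dict[str, List[str]]) -> str:
    """Return the core taxon whose lineage overlaps most with the search taxon."""
    return max(core_lineages,
               key=lambda name: shared_depth(search_lineage, core_lineages[name]))


# Toy lineages (placeholders), ordered from root to leaf.
core_lineages = {
    "coreTaxonA": ["root", "clade1", "clade2", "clade3"],
    "coreTaxonB": ["root", "clade1", "clade4"],
}
search_lineage = ["root", "clade1", "clade2", "clade5"]

print(closest_reference(search_lineage, core_lineages))
```

Shared lineage depth is a coarse proxy for relatedness; a published phylogeny of the group is a better guide where one exists.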

majssssa commented 1 year ago

Ok, thank you very much for your reply

majssssa commented 1 year ago

Sorry to bother you again, but should the value for the --evalHmmer parameter be written in decimal format (e.g. 0.00005) or in scientific notation (e.g. 1e-6)?

trvinh commented 1 year ago

You must use the decimal format (e.g. 0.00005 instead of 5e-5).
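If the cutoffs are kept in scientific notation elsewhere, they can be converted to plain decimal strings before being passed on the command line. The helper name below is hypothetical; the conversion itself uses only Python's standard decimal module.

```python
from decimal import Decimal


def to_plain_decimal(value: str) -> str:
    """Render a number given as text (e.g. '5e-5') without an exponent."""
    # Decimal parses the text exactly, so no binary floating-point
    # rounding creeps into the digits.
    return format(Decimal(value), "f")


for cutoff in ["5e-5", "1e-6", "0.00005"]:
    print(cutoff, "->", to_plain_decimal(cutoff))
```

Going through Decimal rather than float avoids output such as 4.9999999999999996e-05 turning into a long string of digits.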

trvinh commented 1 year ago

> Hi, there was a strict option in an earlier version of HaMStR. I wonder whether fDOG has an option with similar functionality. In other words, how can I make the software run more accurately?

Hi @majssssa, one more piece of information: we found that fDOG already performs with high specificity. The -strict option would only reduce the sensitivity, so it is not needed. You will find this benchmark in the fDOG manuscript, which will be published soon. Regards, Vinh

majssssa commented 1 year ago

Hi, when running fDOG, I added the --rbh option instead of the --rsp option. I found that only one gene was selected for each species. Is this because the conditions are more stringent with --rbh, so that only one gene meets the criteria, or does the --rbh option select only the single best candidate from the output?

ebersber commented 1 year ago

Dear Junpeng,

Thank you very much for your interest in fDOG; we very much appreciate your feedback! Regarding the option "rbh", please keep the following in mind: rbh is the abbreviation for Reciprocal Best Hit. Since there is only ever one "best" hit, fDOG with this option can return at most one ortholog per core group and species.
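The "at most one" behaviour follows directly from how a reciprocal best hit is defined. The sketch below shows the generic criterion on toy similarity scores. It is not fDOG's implementation, and all gene names and score values are made up for illustration.

```python
from typing import Dict, Optional

Scores = Dict[str, Dict[str, float]]


def best_hit(query: str, scores: Scores) -> Optional[str]:
    """Return the highest-scoring hit for query, or None if it has no hits."""
    hits = scores.get(query, {})
    if not hits:
        return None
    return max(hits, key=hits.get)


def reciprocal_best_hit(seed: str, forward: Scores, backward: Scores) -> Optional[str]:
    """Return the single candidate that is the seed's best hit and vice versa."""
    candidate = best_hit(seed, forward)
    if candidate is None:
        return None
    # Keep the candidate only if searching back recovers the seed.
    return candidate if best_hit(candidate, backward) == seed else None


# Toy scores: seed searched against the target species (forward),
# target genes searched back against the reference species (backward).
forward = {"seed": {"geneA": 250.0, "geneB": 180.0}}
backward = {
    "geneA": {"seed": 240.0, "paralog": 90.0},
    "geneB": {"paralog": 200.0, "seed": 170.0},
}

print(reciprocal_best_hit("seed", forward, backward))
```

Because only the top forward hit is ever tested, the function returns either one gene or none, regardless of how many further copies the target species carries.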

From the conversation thus far, I take it that specificity of the ortholog assignment is essential for your analyses. If you want to share some further details about the project in which you use fDOG and the detected orthologs, we might be able to help you in finding the best parameter setting.

Kind regards,

Ingo

--
Prof. Dr. Ingo Ebersberger
Applied Bioinformatics Group
Institute of Cell Biology and Neuroscience
Goethe University
Max-von-Laue Str. 13
D-60438 Frankfurt
Germany
Phone: +49 69 798 42112
Fax: +49 69 798 42111
Web: http://www.bio.uni-frankfurt.de/43045195/ak-ebersberger


majssssa commented 1 year ago

Dear Ingo,

Thank you very much for your answer. My current analysis mainly involves constructing phylogenetic trees from low-copy orthologous genes to infer the phylogenetic relationships of plants. The plants I study have experienced complex hybridization and polyploidy events, and many genes have been lost, so I need relatively accurate orthogroups for my later analysis. However, if each reference gene returns only one candidate gene, I cannot tell how many copies of that gene my species has, or whether it fits the concept of a low-copy gene.
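Copy number per species could be tallied from the ortholog FASTA output of a run that does not restrict results to one hit. The sketch below assumes that each header is pipe-delimited with the species name in the second field; that layout is an assumption to verify against the actual output files, and the headers shown are toy examples.

```python
from collections import Counter
from typing import Iterable


def count_orthologs_per_species(fasta_lines: Iterable[str]) -> Counter:
    """Count sequences per species from headers like '>group|species|gene|flag'."""
    counts: Counter = Counter()
    for line in fasta_lines:
        if not line.startswith(">"):
            continue  # skip sequence lines
        fields = line[1:].strip().split("|")
        if len(fields) >= 2:
            counts[fields[1]] += 1  # second field assumed to be the species
    return counts


# Toy records (hypothetical gene and species names).
records = [
    ">A_test|SPEC1@111@1|gene_1|1",
    "MKV",
    ">A_test|SPEC1@111@1|gene_2|0",
    "MKI",
    ">A_test|SPEC2@222@1|gene_9|1",
    "MRV",
]

for species, copies in count_orthologs_per_species(records).items():
    print(species, copies)
```

Species with a count above a chosen threshold could then be excluded from the low-copy set for that gene.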
