JLSteenwyk / orthosnap

a tree splitting and pruning algorithm for retrieving single-copy orthologs from gene family trees
https://jlsteenwyk.com/orthosnap/
MIT License

Single-copy orthologous genes identified is 0 #4

Closed XXboop closed 1 year ago

XXboop commented 1 year ago

Hi Jacob, I want to use OrthoSNAP to identify single-copy orthologous genes nested within larger multi-copy gene families, but the number of single-copy orthologous genes identified is 0. My orthogroup tree is below. I think this orthogroup should contain multiple subsets of single-copy orthologous genes, so I'm not sure whether I made a mistake in preparing my data. The sequences are in FASTA format: test.faa.zip

(species0|01G00667-RA:0.5279117043,(((species4|01G04915-RA:0.3408345014,((((((((((((((((((species0|02G00068-RA:0.2833922109,species0|03G01048-RA:0.2202978172)87:0.0438419260,species0|06G05308-RA:0.0342599303)89:0.0234458576,((species0|08G00253-RA:0.0300651474,species0|01G06890-RA:0.1020280754)93:0.0297929956,species0|05G03661-RA:0.0292380589)81:0.0003640542)68:0.0148926744,(species0|10G05798-RA:0.0100306324,species0|10G05809-RA:0.0099046340)100:0.0425698431)65:0.0239536424,species0|03G03143-RA:0.0037587110)64:0.0901618950,((species0|06G04570-RA:0.2009566336,(species0|09G00242-RA:0.0276064627,species0|09G00252-RA:0.0000010000)98:0.0444388887)95:0.0272777858,species0|02G00069-RA:0.0184186553)92:0.0247560096)65:0.0684129707,(species0|03G02980-RA:0.0311118160,species0|01G01632-RA:0.0081907706)48:0.0068875751)30:0.0000022508,species0|06G06430-RA:0.0171662576)47:0.0824734516,species0|08G01583-RA:0.0582321366)74:0.0698105372,(((((((((species0|06G04577-RA:0.1029693336,species0|08G01179-RA:0.0058083040)65:0.0037224542,species0|06G06159-RA:0.0062634697)18:0.0000010000,species0|02G02446-RA:0.0031141986)9:0.0000010000,species0|02G04676-RA:0.0000010000)13:0.0000010000,species0|05G06142-RA:0.0031219956)34:0.0000023735,species0|05G00002-RA:0.0000010000)66:0.0176742730,species0|05G02195-RA:0.0000010000)67:0.0270298851,species0|07G00780-RA:0.0111600524)93:0.0810354911,species0|05G06683-RA:0.0094745401)56:0.0033010953)75:0.0139675175,species3|04G02997-RA:0.0479693279)54:0.0070020470,((((species0|UnG00061-RA:0.1019755829,((species0|05G04281-RA:0.0100712598,species0|UnG01151-RA:0.0000010000)63:0.0000010000,species0|08G00680-RA:0.0033416155)97:0.0075728491)76:0.0047777879,species0|07G05284-RA:0.0205874047)34:0.0000022133,((((species0|11G02225-RA:0.0000010000,species0|11G02226-RA:0.0034223709)100:0.0206299805,species0|UnG00947-RA:0.0101420749)53:0.0000024204,species0|05G06682-RA:0.0050443888)54:0.0000010000,species0|03G03240-RA:0.0151959726)73:0.0025157288)62:0.0224812810,species0|03G01047-RA:0.0274408894)31:0.0000444198)61:0.0209780864,(species3|09G02646-RA:0.0263452359,species0|06G04032-RA:0.0190750798)96:0.0027947135)44:0.0000024207,species0|07G03150-RA:0.0339043024)65:0.0126331495,(((((species3|03G03823-RA:0.1104765017,species3|07G00184-RA:0.0282350932)68:0.0021200555,species3|UnG04073-RA:0.0334773575)63:0.0049922998,(((species0|05G00255-RA:0.0717264427,species3|01G02394-RA:0.2180804066)93:0.0232292895,species3|07G03562-RA:0.0328082244)12:0.0000022505,species3|02G01645-RA:0.0456063875)9:0.0000024289)15:0.0000028490,species3|08G02640-RA:0.1005701721)72:0.0080842576,species0|09G00240-RA:0.0608569853)76:0.0151201690)61:0.0085221777,species2|09G00394-RA:0.1317027250)66:0.0083495796,species0|03G03342-RA:0.0000010000)67:0.0112977953,((((((species2|11G04129-RA:0.1107995970,species2|11G04120-RA:0.0000010000)74:0.0000024273,species2|11G04119-RA:0.0000010000)81:0.0000027880,(species2|11G04138-RA:0.0000022138,species2|11G04146-RA:0.0164280934)100:0.0500509920)100:0.0318182024,species2|07G02618-RA:0.0105404848)96:0.0180265075,(((species2|11G03854-RA:0.0286863546,(((species2|06G01852-RA:0.0708707538,species2|11G00501-RA:0.0364722483)87:0.0123542412,species2|11G02708-RA:0.0300289749)56:0.0000027713,species2|05G01547-RA:0.0300265612)70:0.0000024696)90:0.0257159554,species2|04G04476-RA:0.0156910819)52:0.0000026998,species2|05G02497-RA:0.0301598075)72:0.0000024693)98:0.0200837784,((((species2|10G02227-RA:0.3108728301,species2|05G03233-RA:0.0000026898)97:0.0464486734,species2|10G04334-RA:0.0489527240)91:0.0170403849,(species2|02G02902-RA:0.0602162416,(species2|06G02624-RA:0.0000010000,species2|06G02625-RA:0.0000010000)96:0.0018256185)99:0.0294009743)17:0.0000023342,((species2|08G02827-RA:0.0711202157,species2|01G04476-RA:0.0898182083)97:0.0348380375,species2|01G05041-RA:0.0233416284)49:0.0000027600)23:0.0000028939)100:0.1365387571)66:0.0777546471)61:0.1920394558,((species1|04G01317-RA:0.1160132618,species1|02G00465-RA:0.1715145619)99:0.1552966767,species1|11G02518-RA:0.1294596501)94:0.1533573617)61:0.0288696434,species1|05G02443-RA:0.3565432133)78:0.0900322084,(species3|07G04113-RA:0.4880719806,(((((species0|09G00325-RA:0.0689512854,species0|09G00326-RA:0.0829119289)100:0.3091446375,species0|05G03566-RA:0.3080354823)82:0.0870903929,(species0|08G05223-RA:0.0376953437,species0|05G03094-RA:0.0244950842)62:0.0000025151)82:0.0439946415,species0|01G06839-RA:0.0354282611)47:0.0322824766,species3|05G04758-RA:0.0950203399)36:0.0000024398)98:0.2712607015);

JLSteenwyk commented 1 year ago

Hi Xiao Xu,

After looking at the phylogeny, I don't see any instances where OrthoSNAP should be finding a subgroup of single-copy orthologous genes (or SNAP-OGs). To test the behavior of OrthoSNAP, I ran your files but changed the occupancy threshold to have at least two sequences. As expected, SNAP-OGs were identified. Based on the tree topology, are you expecting specific SNAP-OGs?
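When judging which SNAP-OGs to expect from a tree topology, it can help to first tally how many gene copies each taxon contributes. The following is a minimal sketch, not part of OrthoSNAP: it assumes leaf labels follow the `taxon|gene` form used in the tree above and contain no parentheses, commas, or colons. The function name `copies_per_taxon` and the toy tree are made up for illustration.

```python
import re
from collections import Counter

# Matches leaf labels of the form "<taxon>|<gene>" that directly follow "(" or ",".
# Assumption: labels contain no parentheses, commas, colons, or whitespace.
LEAF_PATTERN = re.compile(r"[(,]\s*([^(),:;|\s]+)\|([^(),:;\s]+)")


def copies_per_taxon(newick: str) -> Counter:
    """Count how many leaves each taxon contributes to a Newick tree string."""
    return Counter(taxon for taxon, _gene in LEAF_PATTERN.findall(newick))


# Toy tree with made-up labels (not the tree from this issue).
toy_tree = "((spA|g1:0.1,spB|g2:0.2)90:0.05,(spA|g3:0.1,spC|g4:0.3)75:0.05);"
counts = copies_per_taxon(toy_tree)
for taxon, n in sorted(counts.items()):
    print(taxon, n)
```

A taxon that appears only once, or clades dominated by copies from a single taxon, give a quick indication of how many taxa a subgroup could cover at most.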

Assuming that no SNAP-OGs is the correct output, there are some clear cases in which this can happen. For example, some multi-copy gene families have unclear evolutionary histories that make confident SNAP-OG detection difficult. Unclear evolutionary histories may be driven by complex patterns of duplication and loss, or by a lack of phylogenetic information, either of which makes tree inference challenging.

Hope this helps!

All the best,

Jacob

XXboop commented 1 year ago

Hi Jacob! Thank you very much for your answer, which helped me a lot. I still have a few questions. My ultimate goal is to use PhyKIT to test whether two different multi-copy orthogroups are coevolving, so I would like to use OrthoSNAP to identify single-copy orthologous genes first. If a multi-copy orthogroup yields many subsets, which subset should I choose for the coevolution analysis? In addition, my data contain five species, but the subset includes genes from only two of them. Can such an incomplete SNAP-OG represent its multi-copy orthogroup?

Best regards,
Xiaoxu

JLSteenwyk commented 1 year ago

Hi Xiaoxu,

Thank you so much for adopting the software we have developed. Perhaps others may also be of use to you. A complete list of software our team has engineered can be viewed here: https://jlsteenwyk.com/software.html.

If you are running coevolutionary calculations with five species, which is relatively few, I would recommend using the occupancy argument when running OrthoSNAP; specifically, set this parameter to five. In doing so, OrthoSNAP will only report SNAP-OGs with full taxon occupancy (N=5).
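The effect of an occupancy threshold can be illustrated with a small sketch. It is a simplified stand-in, not OrthoSNAP's tree splitting and pruning: it only checks whether a set of leaf labels has one gene per taxon and covers at least `min_taxa` taxa. The helper names and the candidate subgroups are hypothetical.

```python
from collections import Counter


def taxon_of(label: str, delimiter: str = "|") -> str:
    """Return the taxon part of a "<taxon>|<gene>" label."""
    return label.split(delimiter, 1)[0]


def is_single_copy_with_occupancy(labels, min_taxa: int) -> bool:
    """True if every taxon appears exactly once and at least min_taxa taxa are present."""
    counts = Counter(taxon_of(label) for label in labels)
    single_copy = all(n == 1 for n in counts.values())
    return single_copy and len(counts) >= min_taxa


# Hypothetical candidate subgroups (made-up labels, not from the issue's data).
candidates = {
    "full": ["sp0|a", "sp1|b", "sp2|c", "sp3|d", "sp4|e"],
    "partial": ["sp0|f", "sp3|g"],
    "duplicated": ["sp0|h", "sp0|i", "sp1|j", "sp2|k", "sp3|l", "sp4|m"],
}

# With a threshold of five, only the subgroup covering all five taxa once each passes.
kept = [name for name, labels in candidates.items()
        if is_single_copy_with_occupancy(labels, min_taxa=5)]
print(kept)  # ['full']
```

Lowering `min_taxa` to 2 in this sketch would also keep the two-taxon subgroup, which mirrors why subgroups appeared once the threshold was relaxed earlier in the thread.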

To directly address your question, if multiple SNAP-OGs are identified in a larger multi-copy gene family, use them all instead of just one. This can substantially increase the gene space examined.

Please feel free to comment with any other questions you may have. I will close the ticket since the main issue has been addressed.

Thank you again for choosing to use OrthoSNAP and PhyKIT.

best,

Jacob