Closed by matrs 3 years ago
Hi Jose Luis
That's very strange: these *.tsv files don't seem to correspond at all to the SpeciesTree_rooted_node_labels.txt file. E.g.
N12.tsv contains genes from 4 species: MGYG-HGUT-04532, bin3c.184.contigs, X355_Hoffmanns_Two_toed_Sloth__metabat2_high_PE.021.contigs & GCF_001683795.1_ASM168379v1_genomic, but these species are distributed quite widely across the attached species tree.
The same is true for N29.tsv.
Could you describe the steps taken in OrthoFinder to produce these? Was it a single run from the start, and what commands did you use?
All the best David
Hello David, thank you very much for your prompt answer.
The previous files come from a run which builds on previous OrthoFinder runs (I tested a few options). To help find what the problem is, I'm attaching another related run which has this exact same problem but uses the "original run" directly. The original run here is Jul21, which doesn't appear to have this problem. That run was:
orthofinder -f faas -t 28 -a 8
Then, using those results I ran:
orthofinder -b Results_Jul21 -f extra_faas -M msa -y -t 28 -a 8
Which created files that have the problem (I'm attaching them with the log, Jul29). This last run added 3 genomes and removed one, genome 36 in the log file. (The files attached in the original post come from this run, but specifying a tree with -ft -s.)
For example, when looking at the N3 node in this jul29 tree:
tree.search_nodes(name='N3')[0].get_leaf_names()
[ ]: ['MGYG-HGUT-04532', 'DGYMR06203__metabat2_low_PE.047.contigs']
tree.search_nodes(name='N3')[0].get_ancestors()
[ ]: [Tree node 'N1' (0x7f07b1822cd), Tree node 'N0' (0x7f07b25387f)]
Then, looking at the MGYG-HGUT-04532 column in the Nx.tsv files, I get genes in N4, N7 and N11 too:
N4.tsv
['' 'GFNMCGMP_00924' 'GFNMCGMP_01074' ... 'GFNMCGMP_01376'
'GFNMCGMP_01174' 'GFNMCGMP_00331']
N5.tsv
['']
N6.tsv
['']
N7.tsv
['' 'GFNMCGMP_00924' 'GFNMCGMP_01074' ... 'GFNMCGMP_01376'
'GFNMCGMP_01174' 'GFNMCGMP_00331']
N8.tsv
['']
N9.tsv
['']
N10.tsv
['']
N11.tsv
['' 'GFNMCGMP_00924' 'GFNMCGMP_01074' ... 'GFNMCGMP_01376'
'GFNMCGMP_01174' 'GFNMCGMP_00331']
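A scan like the one above can be sketched with the standard library alone. This is a minimal sketch, not the code used in this thread: `nodes_with_genes` is a hypothetical helper, and it assumes each `N<number>.tsv` file is tab-separated with one header row that holds one column per species.

```python
import csv
import re
from pathlib import Path

# Matches hierarchical orthogroup tables named N0.tsv, N1.tsv, ...
NODE_FILE = re.compile(r"^N(\d+)\.tsv$")


def nodes_with_genes(hog_dir, species):
    """Return the node names (e.g. "N4") whose table lists at least one gene for `species`."""
    paths = [p for p in Path(hog_dir).iterdir() if NODE_FILE.match(p.name)]
    # Sort numerically so that N4 comes before N11.
    paths.sort(key=lambda p: int(NODE_FILE.match(p.name).group(1)))

    hits = []
    for path in paths:
        with open(path, newline="") as handle:
            reader = csv.DictReader(handle, delimiter="\t")
            # Missing columns or short rows yield None, which counts as "no gene".
            if any((row.get(species) or "").strip() for row in reader):
                hits.append(path.stem)
    return hits
```

Calling `nodes_with_genes(some_directory, "MGYG-HGUT-04532")` on the directory that holds the N*.tsv files would return the nodes that carry genes for that species, which can then be compared with the nodes expected from the species tree.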
I'm attaching a few files; some of the results for both runs are also in this Drive folder: https://drive.google.com/drive/folders/1CELoUvE1w87FmFNXzos1__GFHpNDN_f1?usp=sharing
I hope this helps. Let me know if any other file or information is needed.
Log_jul29.txt SpeciesTree_rooted_node_labels_jul29.txt Log_jul21.txt
Hi Jose Luis
This should now be fixed. You can regenerate the correct results just by running with the 'from trees' option on the final results directory, the one which had the added species: "-ft Results_Jul29/". Thanks again for reporting this.
All the best David
Hi David, I tried the latest code and it seems to work as expected. Thanks!
Hello, I'm trying to define single-copy orthogroups from the Nx.tsv files. I'm getting results that I find confusing, so I wrote a couple of lines to check whether a specific Nx.tsv has only genes belonging to its descendant species, which is what I'm expecting. Let's say I take N11.tsv: I look up the descendant species of this node in the species tree and I see two:
Then I loop over all the Nx.tsv files and check the MGYG-HGUT-04532 column every time. I'm expecting to get genes only in the N11.tsv file and its ancestors:
Which produces:
So N12.tsv, N20.tsv and N29.tsv show genes for MGYG-HGUT-04532, although none of these nodes are descendants or ancestors of N11. I tried with other species and nodes, but it's always the same. Maybe I'm misunderstanding how this works, and I'd appreciate any help. I'm attaching the tree file and a couple of Nx.tsv files.
I'm running orthofinder 2.5.2
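For reference, the expectation behind this check (genes of a species should only appear at nodes on the path from that species to the root) can be sketched without extra dependencies. This is a minimal sketch under assumptions, not the code originally posted: `parent_map`, `expected_nodes` and `unexpected_nodes` are hypothetical helpers, and the species tree is assumed to be an unquoted Newick string whose internal nodes carry labels such as N0, N1, ...

```python
def parent_map(newick):
    """Map every labelled node in a Newick string to its parent's label (root maps to None)."""
    text = newick.strip().rstrip(";")
    parents = {}
    pos = 0

    def parse_node():
        nonlocal pos
        children = []
        if text[pos] == "(":
            pos += 1  # consume "("
            while True:
                children.append(parse_node())
                if text[pos] == ",":
                    pos += 1  # another sibling follows
                    continue
                pos += 1  # consume ")"
                break
        # Read "label[:branch_length]" up to the next structural character.
        start = pos
        while pos < len(text) and text[pos] not in ",()":
            pos += 1
        label = text[start:pos].split(":")[0].strip()
        for child in children:
            parents[child] = label
        return label

    root = parse_node()
    parents[root] = None
    return parents


def expected_nodes(parents, species):
    """Internal nodes on the path from a species leaf up to the root."""
    path = []
    node = parents[species]
    while node is not None:
        path.append(node)
        node = parents[node]
    return path


def unexpected_nodes(observed, parents, species):
    """Nodes that list genes for `species` although they are not among its ancestors."""
    allowed = set(expected_nodes(parents, species))
    return [node for node in observed if node not in allowed]
```

Passing the list of nodes whose tables contain genes for a species as `observed` returns the nodes that should not contain them; an empty list means the tables agree with the tree for that species.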
Jose Luis
SpeciesTree_rooted_node_labels.txt
Ns.zip