OpenTreeOfLife / treemachine

Source tree graph database
Other
16 stars 6 forks source link

LSIntersection appears to be called alot and is not efficient #175

Closed mtholder closed 9 years ago

mtholder commented 9 years ago

https://github.com/OpenTreeOfLife/treemachine/blob/rootward-synth/src/main/java/org/opentree/tag/treeimport/BipartOracle.java#L856-L865

Set intersections should not be O(MN).

chinchliff commented 9 years ago

Agreed, should be an iteration over the elements of one set to check if they are in the other, so O(min(M,N)), correct?

I think there is another problem here: the use of the '==' operator could be doing some weird stuff. That should only return true when the two elements being compared are pointers to the same object in memory, which does not seem like what we want here. I wonder if that is affecting the unsupported nodes?

blackrim commented 9 years ago

Agreed on the first part. The second would be surprising considering the verbose output we had in testing this and the number of contrived tests that require a functioning version of this.

chinchliff commented 9 years ago

Maybe for some reason they always end up being the same object anyway, so it works as expected? But that is the behavior of the == operator as far as I know. On Wed, May 6, 2015 at 9:26 AM Stephen Smith notifications@github.com wrote:

Agreed on the first part. The second would be surprising considering the verbose output we had in testing this and the number of contrived tests that require a functioning version of this.

— Reply to this email directly or view it on GitHub https://github.com/OpenTreeOfLife/treemachine/issues/175#issuecomment-99458216 .

chinchliff commented 9 years ago

Should be fixed in c17c70257a9a5e67d0febe3956ea7f18061ff868

chinchliff commented 9 years ago

Also, no longer uses an equality comparison so the potential issue there is now moot. Likely that bit was not causing problems anyway because (1) if it were, we should have seen a lot of failures so (2) it is likely that the Long objects being compared were in fact references to the same objects in memory.