OpenTreeOfLife / reference-taxonomy

Open Tree Reference Taxonomy (OTT) tools
BSD 2-Clause "Simplified" License

Unify the two conflict methods #341

Status: Open. jar398 opened this issue 7 years ago.

jar398 commented 7 years ago

I'm just recording this here for the sake of institutional memory...

Smasher has two conflict methods, with different strengths and weaknesses. The code could be cleaned up, simplified, and clarified quite a bit if these two methods were merged.

I made a start on this in a branch (work-on-merge, I believe): I replaced the method in Alignment.java (the witness and antiwitness methods, based on ranges) with the one in ConflictAnalysis.java (based on MRCAs). After that change the OTT build was significantly slower, and other tasks were more pressing at the time, so I put this aside.

I think a single scalable, well-performing conflict method (the merger I'm talking about) should be built on the Alignment range method, perhaps in combination with the ConflictAnalysis MRCA method. The range approach is heuristic and somewhat unprincipled, but it seems to be fast in practice and not wrong.

Alternatively, remove ConflictAnalysis and clean up Alignment. ConflictAnalysis is currently needed by the conflict analysis web service, but that function is supposed to be taken over by otcetera.
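For reference, here is a minimal sketch of how a range-based test and an MRCA-based test can be combined. It is not the code in Alignment.java or ConflictAnalysis.java; the class `RangeConflict` and its methods are hypothetical. It assumes tips are numbered in preorder, so each node's tips occupy a contiguous interval `[lo, hi]`, and that both trees have already been restricted to the same tips. It finds the MRCA of a tip set by descending through the ranges, then applies the standard compatibility condition: every child of the MRCA must lie entirely inside or entirely outside the set. Recursion is used for brevity and would not suit a tree the size of OTT.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch of a range-based conflict test (not smasher's actual code). */
public class RangeConflict {
    /** Minimal tree node; lo/hi are the preorder indices of its first and last tip. */
    static final class Node {
        final String name;
        final List<Node> children = new ArrayList<>();
        int lo, hi;

        Node(String name, Node... kids) {
            this.name = name;
            children.addAll(Arrays.asList(kids));
        }
    }

    final Node root;
    final Map<String, Integer> tipIndex = new HashMap<>();

    RangeConflict(Node root) {
        this.root = root;
        number(root);
    }

    /** Number tips in preorder so that each node's tips form the contiguous range [lo, hi]. */
    private void number(Node n) {
        if (n.children.isEmpty()) {
            n.lo = n.hi = tipIndex.size();
            tipIndex.put(n.name, n.lo);
            return;
        }
        for (Node c : n.children) number(c);
        n.lo = n.children.get(0).lo;
        n.hi = n.children.get(n.children.size() - 1).hi;
    }

    /** Index of the first element >= key in a sorted array of distinct ints. */
    private static int lowerBound(int[] a, int key) {
        int i = Arrays.binarySearch(a, key);
        return i >= 0 ? i : -i - 1;
    }

    /** How many of the sorted indices fall inside [lo, hi]. */
    private static int countIn(int[] sorted, int lo, int hi) {
        return lowerBound(sorted, hi + 1) - lowerBound(sorted, lo);
    }

    /**
     * True if the tip set is compatible with every node of this tree.
     * Assumes every name in tips is a tip of this tree.
     */
    boolean compatible(Set<String> tips) {
        int[] s = tips.stream().mapToInt(tipIndex::get).sorted().toArray();
        if (s.length == 0) return true; // an empty set conflicts with nothing
        int min = s[0], max = s[s.length - 1];

        // Descend to the smallest node whose range covers [min, max]: the MRCA.
        Node m = root;
        boolean descended = true;
        while (descended) {
            descended = false;
            for (Node c : m.children) {
                if (c.lo <= min && max <= c.hi) {
                    m = c;
                    descended = true;
                    break;
                }
            }
        }

        // Each child of the MRCA must be disjoint from the set or wholly contained in it.
        for (Node c : m.children) {
            int k = countIn(s, c.lo, c.hi);
            if (k != 0 && k != c.hi - c.lo + 1) return false;
        }
        return true;
    }
}
```

With the tree `((a,b),(c,d),e)`, `compatible(Set.of("a", "b", "e"))` returns true, while `compatible(Set.of("b", "c"))` returns false because `{b,c}` cuts across both `(a,b)` and `(c,d)`. The range counts replace explicit set intersections, which is where this style of method gets its speed.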

Another possibility might be to switch to the bit-mask method from otcetera. I don't know if it would be faster, and I doubt that it would be clearer, in the smasher context.
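For comparison, here is a sketch of the general bit-mask idea, with each node's descendant tips stored as a `java.util.BitSet` indexed by tip number. It is not otcetera's code; `BitMaskConflict`, `conflicting`, and `bits` are hypothetical names. Two clusters are compatible when they are disjoint or one contains the other, and with bit masks both checks become word-parallel operations. This naive version still scans every node of the tree, so whether it would beat the range method in smasher remains the open question raised above.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

/** Hypothetical sketch of bit-mask conflict testing (not otcetera's actual code). */
public class BitMaskConflict {
    /** Clusters are compatible if they are disjoint or one contains the other. */
    static boolean compatible(BitSet a, BitSet b) {
        if (!a.intersects(b)) return true;
        return isSubset(a, b) || isSubset(b, a);
    }

    /** a is a subset of b exactly when (a AND NOT b) is empty. */
    static boolean isSubset(BitSet a, BitSet b) {
        BitSet diff = (BitSet) a.clone();
        diff.andNot(b);
        return diff.isEmpty();
    }

    /** Every tree cluster that conflicts with s (linear scan over all nodes). */
    static List<BitSet> conflicting(BitSet s, List<BitSet> treeClusters) {
        List<BitSet> out = new ArrayList<>();
        for (BitSet c : treeClusters) {
            if (!compatible(s, c)) out.add(c);
        }
        return out;
    }

    /** Convenience: build a mask from tip indices. */
    static BitSet bits(int... tips) {
        BitSet b = new BitSet();
        for (int t : tips) b.set(t);
        return b;
    }
}
```

Numbering the tips of `((a,b),(c,d),e)` as 0 through 4, the tree's clusters are `bits(0,1)`, `bits(2,3)`, and `bits(0,1,2,3,4)`; `conflicting(bits(1,2), clusters)` returns the first two.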