dschwilk closed 9 years ago
Ok, so although it is best practice to flush the buffer, this was not the problem. The reduced number of occurrences in my most recent extraction arose because my code no longer outputs records that were a synonym match but could not be reverse-matched back to tankname. This avoids pulling records we could never match back to the phylogeny. Certain combinations of names and synonyms in The Plant List only allow one-directional synonym matching in our sister-synonym algorithm. This is discussed elsewhere; it is ok and unavoidable. An earlier version of my extraction script spit these records out anyway. My current version avoids wasting cleaning effort on them. Closing.
Looks like a buffer flush problem. This looks like an interaction between my code assuming a flush happens on close and a change in the Python interpreter or file I/O. See the email exchange between Dan McGlinn and Dylan Schwilk. It resulted in a million missing matches when I tried to rerun the extraction. I have a fix in mind.
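
For reference, here is a minimal sketch of writing output so that it does not depend on an implicit flush when the file object goes away. It is not the extraction script itself: `write_records`, `count_lines`, and the file name `matches.txt` are hypothetical names made up for illustration. The idea is just to close the file deterministically with a `with` block and, if wanted, flush explicitly before the block exits.

```python
import os
import tempfile


def write_records(path, records):
    """Write one record per line (hypothetical helper, not the real script)."""
    # The with-block closes the file on exit, which flushes Python's buffer,
    # even if an exception is raised part way through the loop.
    with open(path, "w", encoding="utf-8") as out:
        for rec in records:
            out.write(rec + "\n")
        # Explicit flush: hand buffered data to the OS now rather than
        # relying on close() to do it.
        out.flush()
        # Optional: ask the OS to commit the data to disk as well.
        os.fsync(out.fileno())
    return path


def count_lines(path):
    """Count the lines actually present in the file after writing."""
    with open(path, encoding="utf-8") as f:
        return sum(1 for _ in f)


if __name__ == "__main__":
    tmpdir = tempfile.mkdtemp()
    target = os.path.join(tmpdir, "matches.txt")  # assumed file name
    write_records(target, ["rec1", "rec2", "rec3"])
    print(count_lines(target))
```

Comparing the count of lines on disk against the number of records handed to the writer is a cheap way to tell a lost-buffer problem apart from records being filtered out upstream.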