Fireandplants / plant_gbif

This repository is for data and scripts related to plant species distribution across the globe using the Global Biodiversity Information Facility (GBIF) dataset.

Final extracted GBIF occurrences not written when they were before #15

Closed dschwilk closed 9 years ago

dschwilk commented 9 years ago

Looks like a buffer flush problem: an interaction between my code assuming that a flush happens on close and a change in the Python interpreter or file I/O. See the email exchange between Dan McGlinn and Dylan Schwilk. It resulted in a million missing matches when I tried to rerun the extraction. I have a fix in mind.
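For readers following the flush hypothesis, here is a minimal sketch of writing output so that nothing depends on an implicit flush at close. The function name `write_occurrences` and the one-record-per-line format are assumptions for illustration, not the repository's actual extraction code.

```python
import os


def write_occurrences(path, records):
    """Write one record per line (hypothetical helper, not the repo's code)."""
    # The with-block closes the file even if an exception is raised,
    # and closing a Python file object flushes its buffer.
    with open(path, "w", encoding="utf-8") as out:
        for rec in records:
            out.write(rec + "\n")
        out.flush()              # push Python's buffer to the operating system
        os.fsync(out.fileno())   # ask the OS to commit the data to disk
    return path
```

The explicit `flush()` and `os.fsync()` calls are belt and braces; the `with` statement alone is usually enough for the data to reach the file once the block exits.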

dschwilk commented 9 years ago

OK, so although it is best practice to flush the buffer, this was not the problem. The number of occurrences in my most recent extraction was lower because my code does not bother outputting any records that were a synonym match that could not be reverse-matched back to tankname. This avoids pulling records we could never match back to the phylogeny. Certain combinations of names and synonyms in The Plant List only allow one-directional synonym matching in our sister-synonym algorithm. This is discussed elsewhere. It is OK and unavoidable, but an earlier version of my extraction script spat these records out anyway. My current version avoids wasting cleaning effort on them. Closing.
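The filtering described in this comment can be sketched roughly as follows. This is an illustration under assumptions, not the actual extraction script: the function `keep_reversible`, the `matched_name` field, the `reverse_lookup` dictionary, and the taxon names are all made up for the example.

```python
def keep_reversible(records, reverse_lookup):
    """Keep only records whose matched name maps back to a phylogeny name.

    reverse_lookup is a hypothetical dict from a matched name (accepted
    name or synonym) to the tankname it can be reverse-matched to.
    """
    kept = []
    for rec in records:
        name = rec["matched_name"]
        if name in reverse_lookup:
            # Attach the phylogeny name so later cleaning steps can use it.
            kept.append({**rec, "tankname": reverse_lookup[name]})
        # Records matched only in one direction are skipped, not written.
    return kept


# Made-up names: "Genus beta" is a synonym that reverse-matches to
# "Genus alpha"; "Genus gamma" matched forward but has no reverse match.
reverse_lookup = {"Genus alpha": "Genus alpha", "Genus beta": "Genus alpha"}
records = [
    {"matched_name": "Genus beta", "lat": 35.1},
    {"matched_name": "Genus gamma", "lat": 36.0},
]
kept = keep_reversible(records, reverse_lookup)
```

Under these assumptions, only the "Genus beta" record survives, which is why a rerun with this filter yields fewer occurrences than an earlier script that wrote every forward match.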