PalmBeachPost / postgeo

Geocode CSVs and jitter overlapping points
MIT License

@stucka needs to better implement geocache functionality #18

Closed: stucka closed this issue 8 years ago

stucka commented 8 years ago

The current functionality works: all of geocache.csv is read into memory, and new geocodes are saved in memory. At the end of processing, the entire file is written back to disk. It works, and it's fast.
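For reference, the load-everything, write-everything pattern described above looks roughly like the following. This is a sketch, not postgeo's actual code: `load_cache`, `save_cache` and the `address`/`lat`/`lng` column names are all hypothetical.

```python
import csv
import os

# Hypothetical column layout; postgeo's real geocache.csv may use different columns.
FIELDS = ["address", "lat", "lng"]


def load_cache(path):
    """Read all of geocache.csv into a dict mapping address -> (lat, lng)."""
    cache = {}
    if os.path.exists(path):
        with open(path, newline="", encoding="utf-8") as f:
            for row in csv.DictReader(f):
                cache[row["address"]] = (row["lat"], row["lng"])
    return cache


def save_cache(path, cache):
    """Rewrite the entire file from memory, as happens at the end of a run."""
    # Mode "w" truncates the file before anything is written.
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(FIELDS)
        for address, (lat, lng) in cache.items():
            writer.writerow([address, lat, lng])
```

Because `save_cache` only runs once at the end, anything geocoded before a crash is lost, and because `"w"` truncates first, a crash during the write itself can leave a partial file.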

By rewriting the entire file every time, we risk corrupting it, and we write more than we need to.

Alternative solution: open the geocache with the a+ flag, which should let us first read it and then append to it. (I think. We may have to read first, close, and reopen with just a.)
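A minimal sketch of the single-handle a+ approach, assuming the same hypothetical `address`/`lat`/`lng` columns. One Python detail matters here: a file opened with `"a+"` starts positioned at the end, so it has to be rewound with `seek(0)` before reading. The function name `open_for_read_then_append` is made up for illustration.

```python
import csv
import os

# Hypothetical columns for illustration only.
FIELDS = ["address", "lat", "lng"]


def open_for_read_then_append(path):
    """Open geocache.csv once with "a+", read it, and leave it ready to append.

    "a+" creates the file if it is missing and starts the stream at the END
    of the file, so we must seek(0) before reading anything.
    """
    f = open(path, "a+", newline="", encoding="utf-8")
    f.seek(0)  # without this, the reader would see no rows at all
    cache = {}
    for row in csv.DictReader(f):
        cache[row["address"]] = (row["lat"], row["lng"])
    f.seek(0, os.SEEK_END)  # be explicit that the next write goes at the end
    return f, cache
```

Note that this does not write a header for a brand-new file. The read, close, reopen-with-"a" fallback also works and is arguably easier to reason about, since each handle does only one job.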

Then, whenever we add a fresh geocode to the in-memory cache, we also write it to disk, and possibly flush. That way, we shouldn't lose any data if we stop partway through the geocoding process, and nothing is left up in the air until the very end.
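The write-through step could look something like this sketch. `add_geocode` and its parameters are hypothetical, and it assumes `f` is a text file already opened in append mode with `newline=""`. `flush()` hands Python's buffer to the operating system, so the row survives if the script is killed; `os.fsync()` additionally asks the OS to commit it to the physical disk, at some cost in speed.

```python
import csv
import os


def add_geocode(cache, f, address, lat, lng, durable=False):
    """Record a fresh geocode in memory and immediately append it to disk."""
    cache[address] = (lat, lng)
    csv.writer(f).writerow([address, lat, lng])
    f.flush()  # push Python's buffer to the OS right away
    if durable:
        os.fsync(f.fileno())  # optionally force the OS to write it to disk too
```

Flushing on every row is slower than one big write at the end, but for a geocoding run the network lookups will usually dominate anyway.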

We need to open the file for writing or appending before starting the main function. If the geocache file does not exist, we need to create it and write the header row before calling main.

We probably ought to read the file and build the cache before calling main anyway. So maybe we need a geocache init function, called before main, to better separate these concerns.
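One possible shape for that init function, sketched under the same hypothetical column names; `init_geocache` is an illustrative name, not something in postgeo. It uses the read, close, reopen-with-"a" route and writes the header first if the file is missing or empty.

```python
import csv
import os

# Hypothetical columns; adjust to match the real geocache.csv header.
FIELDS = ["address", "lat", "lng"]


def init_geocache(path):
    """Build the in-memory cache and return (cache, open append handle)."""
    # Create the file with a header row if it is missing or zero bytes long.
    if not os.path.exists(path) or os.path.getsize(path) == 0:
        with open(path, "w", newline="", encoding="utf-8") as f:
            csv.writer(f).writerow(FIELDS)

    # Read everything already cached.
    cache = {}
    with open(path, newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            cache[row["address"]] = (row["lat"], row["lng"])

    # Reopen with just "a" for the write-through appends during main.
    append_handle = open(path, "a", newline="", encoding="utf-8")
    return cache, append_handle
```

The caller would run `cache, handle = init_geocache("geocache.csv")`, pass both into main, and close `handle` in a `finally` block so the file is released even if geocoding fails partway.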

stucka commented 8 years ago

Closed by https://github.com/PalmBeachPost/postgeo/commit/724a66068bb5d8ec5124ecd30b1b0a7926b2e286