MapofLife / MOL

Integrating information about species distributions in an effort to support global understanding of the world's biodiversity.
http://mol.org
BSD 3-Clause "New" or "Revised" License

Make loader.py's recovery faster by doing per-file calculations #104

Closed gaurav closed 12 years ago

gaurav commented 12 years ago

At the moment, loader.py's recovery (the time taken to resume a partially completed upload) is pretty slow, because it checks each feature individually against status.sqlite3 (as per issue #48). Instead, we should download the number of features already uploaded for every file (either in one shot at the start of the upload, or file-by-file as the upload proceeds), and then work out a way to count the features in each of our local shapefiles. If the two counts match for a file, we can skip that file without converting it into JSON or otherwise processing it, which should save a lot of time.
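One possible way to do the per-file comparison, as a rough sketch rather than what loader.py actually does: a shapefile's `.shx` index has a fixed 100-byte header followed by one 8-byte entry per record, so the feature count can be read from the header without parsing any geometry. The function names `count_shapefile_features` and `files_needing_upload` are hypothetical, and the sketch assumes the remote counts have already been fetched into a plain dict keyed by file name (how they are fetched is left out here).

```python
import struct

SHX_HEADER_BYTES = 100   # fixed-size header of a .shx index file
SHX_RECORD_BYTES = 8     # each index entry is an offset plus a content length
SHAPEFILE_MAGIC = 9994   # file code stored in the first four header bytes


def count_shapefile_features(shx_path):
    """Return the number of records listed in a .shx index file.

    Only the header is read, so no feature is loaded or converted.
    """
    with open(shx_path, "rb") as f:
        header = f.read(SHX_HEADER_BYTES)
    if len(header) < SHX_HEADER_BYTES:
        raise ValueError("%s is too short to be a .shx file" % shx_path)

    (file_code,) = struct.unpack(">i", header[0:4])
    if file_code != SHAPEFILE_MAGIC:
        raise ValueError("%s does not look like a .shx file" % shx_path)

    # The file length is stored big-endian at bytes 24-27, in 16-bit words.
    (length_words,) = struct.unpack(">i", header[24:28])
    return (length_words * 2 - SHX_HEADER_BYTES) // SHX_RECORD_BYTES


def files_needing_upload(local_counts, remote_counts):
    """Return the file names whose local and remote feature counts differ.

    Both arguments are dicts mapping a file name to a feature count
    (assumption: remote_counts was downloaded before calling this).
    Files missing from remote_counts are treated as not yet uploaded.
    """
    return sorted(
        name
        for name, count in local_counts.items()
        if remote_counts.get(name) != count
    )
```

A matching count only shows that the same number of features arrived, not that their contents are identical, so a file that was edited in place without adding or removing features would still be skipped under this scheme.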

eightysteele commented 12 years ago

Closing issue since we're moving away from bulkloading over the CDB SQL API.