geotagx / geotagx-project-template

A generic template and builder for GeoTag-X projects
GNU Affero General Public License v3.0

Refactor task.csv #5

Closed spMohanty closed 9 years ago

spMohanty commented 9 years ago

`task.csv` needs to be refactored to include just `id`, `source_uri`, and `image_url`.

othieno commented 9 years ago

@spMohanty, care to explain id? I'm not quite sure where it fits in wrt. the CSV file.

spMohanty commented 9 years ago

@supranove: `id` is a unique identifier for each task. It is important if you want an automated way to sync a large and ever-changing task list.

Imagine the tasks.csv file gets updated every day, and we have a separate script whose job is to read tasks.csv and add to the server only the jobs that are not already on geotagx.
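
A minimal sketch of such a sync step, assuming Python and assuming the ids already on the server are available as a set. The function name `select_new_tasks`, the sample rows, and the `existing_ids` set are all hypothetical; how the server is actually queried or updated is not shown here.

```python
import csv
import io


def select_new_tasks(csv_file, existing_ids):
    """Return the CSV rows whose integer id is not in existing_ids."""
    reader = csv.DictReader(csv_file)
    return [row for row in reader if int(row["id"]) not in existing_ids]


# Hypothetical stand-in for tasks.csv, using the three proposed columns.
sample_csv = io.StringIO(
    "id,source_uri,image_url\n"
    "1,http://example.org/a,http://example.org/a.jpg\n"
    "2,http://example.org/b,http://example.org/b.jpg\n"
    "3,http://example.org/c,http://example.org/c.jpg\n"
)

# Assumption: ids 1 and 2 have already been uploaded.
existing_ids = {1, 2}

new_tasks = select_new_tasks(sample_csv, existing_ids)
for task in new_tasks:
    print(task["id"], task["image_url"])
```

Only the row with id 3 is selected, so rerunning the script against an unchanged file would add nothing.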

othieno commented 9 years ago

I see, but wouldn't a <source URI, image URL> pair (maybe even just an image URL) be enough to distinguish different tasks?

spMohanty commented 9 years ago

@supranove: Yeah, it would, but it's always a good design decision to have a separate field as the primary key in any data store schema instead of relying on some latent uniqueness property of the other fields. Apart from that, in our case, a separate auto-incrementing integer field called, say, `id` will make each record's "uniqueness" equally readable for both an automated script and a human editing the data store in, say, an Excel sheet.
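
To illustrate the auto-incrementing idea, here is a rough sketch in Python that fills in missing ids with the next unused integer. The helper `assign_ids` and the sample rows are hypothetical, and it assumes a row with an empty `id` cell is one that has not been numbered yet.

```python
import csv
import io


def assign_ids(rows):
    """Give every row without an id the next unused integer id (in place)."""
    used = [int(row["id"]) for row in rows if row.get("id")]
    next_id = max(used, default=0) + 1
    for row in rows:
        if not row.get("id"):
            row["id"] = str(next_id)
            next_id += 1
    return rows


# Hypothetical file in which a human appended two rows without ids.
sample_csv = io.StringIO(
    "id,source_uri,image_url\n"
    "1,http://example.org/a,http://example.org/a.jpg\n"
    ",http://example.org/b,http://example.org/b.jpg\n"
    "2,http://example.org/c,http://example.org/c.jpg\n"
    ",http://example.org/d,http://example.org/d.jpg\n"
)

rows = assign_ids(list(csv.DictReader(sample_csv)))
ids = [row["id"] for row in rows]
print(ids)
```

Existing ids are never renumbered, so a record keeps the same identifier across edits to the file.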

othieno commented 9 years ago

@spMohanty Ahh, I see. I thought your reasoning was that the id would be to prevent duplication by being something like a checksum (rather than an integer). I get it now :+1:
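
For contrast, the checksum-style identifier mentioned above could look roughly like the following Python sketch. The function `checksum_id` is hypothetical, and the choice of SHA-256 over the source URI and image URL is an assumption made only for illustration; the thread settles on an integer `id` instead.

```python
import hashlib


def checksum_id(source_uri, image_url):
    """Derive a content-based identifier from the two URL fields."""
    # A newline separator keeps ("ab", "c") distinct from ("a", "bc").
    payload = (source_uri + "\n" + image_url).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


first = checksum_id("http://example.org/a", "http://example.org/a.jpg")
again = checksum_id("http://example.org/a", "http://example.org/a.jpg")
other = checksum_id("http://example.org/b", "http://example.org/b.jpg")

print(first == again)  # identical input yields the same identifier
print(first == other)  # different input yields a different identifier
```

Such an identifier detects duplicates automatically, but it is far less readable for a human editing the file than a small integer.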