andrewxhill / MOL

The Map of Life
mol.colorado.edu/

MOL-calculated fields need to be calculated #120

Open gaurav opened 12 years ago

gaurav commented 12 years ago

There are two types of MOL-calculated fields according to the field specification [1] at present: fields like "contributor" and "format", which will be filled in by the publisher frontend (depending on the format of the file uploaded and the user who actually uploaded that file into our system), and fields such as "maxx", "maxy", and "medianregionsperspecies", which will probably be calculated at some point in the upload process.

I think the right thing to do here for now (i.e. until end-October) would be to leave all these fields out of the metadata database and worry about them later. "format" and "contributor" aren't really going to be useful until we have multiple formats and contributors anyway :). Afterwards, once we have a publisher frontend (even a simple one), we can change "format" and "contributor" to be required fields and have the frontend fill them in automatically, and change "maxx", "maxy", etc. to be calculated by loader.py while uploading the shapefiles into the system. If so, we should treat this as a low-priority issue for now, and maybe group it in with the Publisher Frontend development arc/label.

[1] http://www.google.com/fusiontables/DataSource?dsrcid=1326977
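
As a rough illustration of what calculating the second group of fields in loader.py might involve, here is a minimal Python sketch. It assumes each polygon is already available as a list of (x, y) tuples and that there is one species name per region record. The function names `bounding_box` and `median_regions_per_species` are hypothetical, "minx" and "miny" are assumed counterparts of the "maxx" and "maxy" fields named above, and reading "medianregionsperspecies" as the median count of regions per species is also an assumption.

```python
from collections import Counter
from statistics import median


def bounding_box(polygons):
    """Return the extent of a collection of polygons.

    `polygons` is an iterable of polygons, each a list of (x, y) tuples.
    """
    xs = [x for polygon in polygons for x, _ in polygon]
    ys = [y for polygon in polygons for _, y in polygon]
    if not xs:
        raise ValueError("no coordinates supplied")
    return {"minx": min(xs), "miny": min(ys), "maxx": max(xs), "maxy": max(ys)}


def median_regions_per_species(species_names):
    """Median number of regions per species, given one species name per region."""
    counts = Counter(species_names)
    if not counts:
        raise ValueError("no regions supplied")
    return median(counts.values())
```

Both functions only need data that the loader would have in hand while reading a shapefile, which is why computing them during upload seems plausible.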

tucotuco commented 12 years ago

It's not entirely clear to me what you're proposing here. "Leaving fields out of the database" suggests to me that you would change the structure. Perhaps you mean that you would not populate these fields in the metadata database for now, and therefore not enforce the constraint for those that are required?

Even so, I'm confused. The requirements exist with or without a publisher front end, right?

On Wed, Oct 5, 2011 at 6:41 PM, Gaurav Vaidya wrote:

> There are two types of MOL-calculated fields according to the field specification [...]

gaurav commented 12 years ago

On 5 October 2011 22:25, John Wieczorek wrote:

> It's not entirely clear to me what you're proposing here. "Leaving fields out of the database" suggests to me that you would change the structure. Perhaps you mean that you would not populate these fields in the metadata database for now, and therefore not enforce the constraint for those that are required?

Yup, that's exactly what I mean! I don't think those fields are needed right now, and I want to make sure we get the workflow working smoothly for TDWG, so I'm trying not to work on anything not directly related to that :).

> Even so, I'm confused. The requirements exist with or without a publisher front end, right?

Hmm. format and contributor are mainly for our own records, but since we won't have the concept of a "user" submitting shapefiles to MoL until we make the publisher frontend, and since we only support a single format so far, I don't think those fields are really necessary as yet.
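
To make the "don't populate, don't enforce" idea concrete, here is a minimal Python sketch of a validation step that leaves MOL-calculated fields out of the required-field check until a switch is flipped. It is only a sketch under assumptions: the `MOL_CALCULATED` set, the `missing_required_fields` function, and the `enforce_calculated` flag are all hypothetical names, and the existing validation code is not shown in this thread.

```python
# Hypothetical set of fields that MoL itself is expected to fill in later.
MOL_CALCULATED = {"contributor", "format", "maxx", "maxy", "medianregionsperspecies"}


def missing_required_fields(metadata, required, enforce_calculated=False):
    """Return the sorted list of required fields absent or empty in `metadata`.

    Unless `enforce_calculated` is True, MOL-calculated fields are not
    reported, so a config without them still passes validation.
    """
    missing = []
    for field in required:
        if field in MOL_CALCULATED and not enforce_calculated:
            continue
        if metadata.get(field) in (None, ""):
            missing.append(field)
    return sorted(missing)
```

With this shape the database structure and the list of required fields stay unchanged; only the enforcement is deferred, and it could be turned on once something exists to fill the fields in.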

I'm in two minds over the calculated fields. We can just add them into the config.yaml files for now, but I really like that the config.yaml file looks essentially as it will at MoL release 1.0 (whether it is produced by the publisher front end or by a desktop application that automatically packages collections for upload to MoL). That means we can start showing it to people as an example of the sort of metadata we will eventually support, and I think it's pretty and neat at the moment.

Then again, I don't know that we're actually going to start showing it to people until we develop a good format to indicate DBF-mappings (instead of my current '=FIELDNAME' hack). And if we need to show off our metadata, we can always make our Fusion Table publicly viewable and invite comments there.
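
For readers unfamiliar with the '=FIELDNAME' convention, here is a minimal Python sketch of one way such a mapping could be resolved. It assumes that a config value beginning with '=' names a DBF column whose per-row value should be used, and that any other value is a literal. The function name `resolve_metadata` and the dict-based row are hypothetical; how loader.py actually interprets these values is not shown in this thread.

```python
def resolve_metadata(config_fields, dbf_row):
    """Resolve config values against one DBF row.

    A string value starting with '=' is read as the name of a DBF column and
    replaced by that column's value; every other value is kept as a literal.
    """
    resolved = {}
    for name, value in config_fields.items():
        if isinstance(value, str) and value.startswith("="):
            column = value[1:]
            if column not in dbf_row:
                raise KeyError(f"DBF column {column!r} not found for field {name!r}")
            resolved[name] = dbf_row[column]
        else:
            resolved[name] = value
    return resolved
```

One drawback of this convention, visible even in the sketch, is that a literal value that genuinely starts with '=' cannot be expressed, which may be part of why a better format is wanted.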

Alternatively, I could try to finish the MOL-calculated fields by figuring out how to calculate them; I think @eightysteele and I will shortly be moving on to making polygon import into PostgreSQL a part of loader.py, so this will probably be related to that. But I don't know how the priority of that task compares with all the other tasks we want to get working by the TDWG demo.