MapofLife / MOL

Integrating information about species distributions in an effort to support global understanding of the world's biodiversity.
http://mol.org
BSD 3-Clause "New" or "Revised" License

Rangemap shapefiles missing "Seasonality" #89

Closed gaurav closed 12 years ago

gaurav commented 12 years ago

Rob just pointed out that of the 650,627 polygons uploaded, only 60,215 have a valid seasonality. At first I thought that this might be because of an incorrect config.yaml, but I've found several rangemap shapefiles (anas_platalea.shp and anas_undulata.shp) which don't have any seasonality information at all.

There are three ways we can proceed from here:

  1. Set all blank seasonalities to a particular value (1 or 0), or modify the rendering to colour blank seasonalities in a particular way.
  2. If @walterj and Co have a mapping from filename to seasonality, I can write a script to update the existing shapes with that seasonality.
  3. If we can regenerate the rangemaps (or if the seasonality is really in there somewhere!), I can reupload those shapefiles.

What does everybody think? The rangemaps are available on mol.colorado.edu at /home/gaurav/data/mol-data/range; Cody has access.

eightysteele commented 12 years ago

A seasonality default doesn't seem to make sense to me. I'd go with a default styling that indicates an unknown seasonality. We need to hear from @walterj because if he has a text file that maps scientific name to seasonality, we can upload that as a new temp table to CDB then do an SQL update from that to the polygons table.

robgur commented 12 years ago

I like Aaron's suggestion, but it's a bit more complicated in that we can't just match scientific name to seasonality, can we? We need to know which polygon reflects breeding range versus non-breeding, so we need to know which polygons are which, as well as which species are which, yes?

eightysteele commented 12 years ago

Totally. My point is just that if @walterj can provide enough information in a text file to assign seasonalities to polygons, then we are set.
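
The temp-table approach discussed above could be sketched roughly as follows. This is only an illustration: it uses an in-memory SQLite database as a stand-in for CartoDB, and the table name `seasonality_map`, the column names, and the `polygon_id` key are all assumptions rather than the actual MOL schema.

```python
import sqlite3


def apply_seasonality_mapping(conn, mapping_rows):
    """Copy seasonality values from a mapping onto the polygons table.

    mapping_rows is an iterable of (scientificname, polygon_id, seasonality)
    tuples, e.g. parsed from a text file. All names here are hypothetical.
    Returns the number of polygon rows that were updated.
    """
    cur = conn.cursor()
    # Stage the mapping in a temporary table.
    cur.execute(
        "CREATE TEMP TABLE seasonality_map "
        "(scientificname TEXT, polygon_id INTEGER, seasonality INTEGER)"
    )
    cur.executemany("INSERT INTO seasonality_map VALUES (?, ?, ?)", mapping_rows)
    # Update only polygons that have a matching row in the mapping.
    cur.execute(
        """
        UPDATE polygons
        SET seasonality = (
            SELECT m.seasonality FROM seasonality_map m
            WHERE m.scientificname = polygons.scientificname
              AND m.polygon_id = polygons.polygon_id)
        WHERE EXISTS (
            SELECT 1 FROM seasonality_map m
            WHERE m.scientificname = polygons.scientificname
              AND m.polygon_id = polygons.polygon_id)
        """
    )
    updated = cur.rowcount
    cur.execute("DROP TABLE seasonality_map")
    conn.commit()
    return updated
```

Keying the mapping on both species and polygon, rather than on scientific name alone, is one way to handle species with separate breeding and non-breeding polygons.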

walterj commented 12 years ago

We went over some of this when we discussed metadata and the data upload/entry forms (see those documents). Not all sources will use the exact name "Seasonality" for a field that essentially gives seasonality values. IUCN maps do have the field. In the Jetz bird maps (because we still work with the old version, these make up many polygons) this field is called "OccCode" (the categories have very similar meanings, though). In this case we should simply treat this field as "Seasonality". Other sources (e.g. checklists) may lack the field altogether. For these I agree that we should simply assign '5' (Seasonal Occurrence Uncertain – The species is/was present, but it is not known if it is present during part or all of the year.). It will be good to be able to replace some of these values later (e.g. for species that are not very mobile, we may change 5 to 1, i.e. resident). But we can keep that for later.

eightysteele commented 12 years ago

Not all sources will use the exact name "Seasonality" for a field that essentially gives seasonality values.

Totally. This is where our schema configuration file comes into play. We can do the mapping there.

Other sources (e.g. checklists) may lack the field altogether. For these I agree that we should simply assign '5' (Seasonal Occurrence Uncertain – The species is/was present, but it is not known if it is present during part or all of the year.).

OK, makes sense. We'll default to 5 in this case.

It will be good to be able to replace some of these values later

Definitely, and we'll use the CDB SQL API for this.
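
The field mapping and default agreed on above might look something like this in code. It is a minimal sketch, not the actual schema configuration: the function name, the alias tuple, and the case-insensitive matching are assumptions; only the field names "Seasonality" and "OccCode" and the default value 5 come from the thread.

```python
# Source fields that carry seasonality values, in order of preference.
SEASONALITY_FIELD_ALIASES = ("Seasonality", "OccCode")
# 5 = Seasonal Occurrence Uncertain, used when no field is present.
SEASONALITY_UNCERTAIN = 5


def resolve_seasonality(attributes, aliases=SEASONALITY_FIELD_ALIASES,
                        default=SEASONALITY_UNCERTAIN):
    """Return the seasonality code for one polygon's attribute dict.

    Field names are compared case-insensitively (an assumption). Blank or
    non-integer values are skipped, and the default is returned if no
    alias yields a usable value.
    """
    lowered = {name.lower(): value for name, value in attributes.items()}
    for alias in aliases:
        value = lowered.get(alias.lower())
        if value is None or str(value).strip() == "":
            continue
        try:
            return int(value)
        except (TypeError, ValueError):
            continue
    return default
```

For example, `resolve_seasonality({"OccCode": "3"})` yields 3, while a checklist record with neither field falls back to 5.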

robgur commented 12 years ago

My point is that we need these data populated properly into the database if we plan to show off styling in the demo. Those data are NOT in CartoDB now. So I like the thinking, but we need to understand timing and priority.

eightysteele commented 12 years ago

@robgur - Yeah man, we need a short turn solution for this. I guess we can just style based on the above rules, and anything without seasonality will get the value 5. Can we do better quickly or no?

robgur commented 12 years ago

Yes, but it won't really help much: all we have is no data (5) or 1 (year-long resident) and nothing else, so there's no real reason to style.

eightysteele commented 12 years ago

I see, so of the 650,627 polygons uploaded, only 60,215 have a valid seasonality, and that seasonality is 1. Right?

robgur commented 12 years ago

Yup

eightysteele commented 12 years ago

OK, so what we can do right now is set up the styling rules, and just understand that we'll have 2 colors (1, 5) until we update the polygons table.

gaurav commented 12 years ago

Three quick points:

  1. It'd be quite easy to update the polygons table to set all NULL Seasonality to '5' for now, if that'll make the UI code easier/more stable.
  2. I haven't had a chance to check for OccCodes in the Seasonality-less rangemaps yet (homework!), but I'll do that tonight and report back. If it does exist, I can poke around a bit and see if there's some way we can upload them separately (based on the hashes we're using to check for duplicates at this point).
  3. In the future, we could make Seasonality a compulsory variable, although I'm not sure that's a good idea (what about points data for instance).

Okay, back to Biostats for me, but I should be back online later tonight and working through the weekend.
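
The first point above (setting every NULL seasonality to 5) amounts to a single UPDATE. The sketch below runs it against SQLite purely for illustration; the table and column names are assumed, and blank values stored as empty strings rather than NULL would need an extra condition.

```python
import sqlite3


def fill_missing_seasonality(conn, default=5):
    """Set seasonality to `default` wherever it is NULL.

    Returns the number of rows changed. The `polygons` table and its
    `seasonality` column are hypothetical names.
    """
    cur = conn.execute(
        "UPDATE polygons SET seasonality = ? WHERE seasonality IS NULL",
        (default,),
    )
    conn.commit()
    return cur.rowcount
```

Going by the counts quoted earlier in the thread (650,627 polygons, 60,215 with a valid seasonality), an update like this would touch roughly 590,412 rows if all of the remainder are NULL.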

walterj commented 12 years ago

Oh, we do have 2, 3 and 4 for all IUCN and Jetz maps, as long as the Seasonality or OccCode field wasn't dropped from the dbf during ingest.

Walter


walterj commented 12 years ago

Confused by this. Our expert range maps have a populated Seasonality or OccCode field (just look at the dbf) that we need to ingest (see metadata/ingest discussions in summer 2011) and then simply use.

Walter


robgur commented 12 years ago

Unfortunately, I do not see OccCode in our current polygon dataset populated into CartoDB.

eightysteele commented 12 years ago

Guys, let's pull this convo into a call and sort this out. I'll shoot an email now.

gaurav commented 12 years ago

On 17 February 2012 11:38, Rob reply@reply.github.com wrote:

Unfortunately, I do not see OccCode in our current polygon dataset populated into CartoDB.

Yup, this is the "bug": I only used the 'Seasonality' field, and not the 'OccCode' field, while writing config.yaml. Sorry about that. I need to double check that the DBF files I have actually contain the OccCode field, and then we're good to go with either reuploading mol_rangemaps or coming up with some hacky SQL way of uploading just the missing OccCodes.
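
Checking whether a DBF file declares a seasonality field can be done by reading only its header. The following is a rough sketch using the standard dBase header layout (32-byte file header, then 32-byte field descriptors ending in a 0x0D byte); the function names are hypothetical and this is not the check that was actually run. Because DBF field names are limited to 10 characters, a name like "Seasonality" may be stored truncated, so the comparison uses the first 10 characters.

```python
import struct


def dbf_field_names(stream):
    """Return field names declared in the header of a DBF binary stream."""
    header = stream.read(32)
    if len(header) < 32:
        raise ValueError("stream too short to be a DBF file")
    # Bytes 8-9 hold the total header length as a little-endian uint16.
    (header_len,) = struct.unpack_from("<H", header, 8)
    names = []
    remaining = header_len - 32
    while remaining > 0:
        descriptor = stream.read(32)
        # A 0x0D byte marks the end of the field descriptor list.
        if len(descriptor) < 32 or descriptor[0:1] == b"\r":
            break
        # The name occupies the first 11 bytes, NUL-padded.
        raw_name = descriptor[:11].split(b"\x00", 1)[0]
        names.append(raw_name.decode("ascii", errors="replace").strip())
        remaining -= 32
    return names


def has_seasonality_field(names, candidates=("Seasonality", "OccCode")):
    """True if any field name matches a candidate (case-insensitive,
    compared on the first 10 characters to allow for DBF truncation)."""
    wanted = {c.lower()[:10] for c in candidates}
    return any(n.lower()[:10] in wanted for n in names)
```

Usage would be along the lines of `with open("anas_platalea.dbf", "rb") as f: print(dbf_field_names(f))`, assuming the `.dbf` sits next to the `.shp` named earlier in the thread.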

robgur commented 12 years ago

Let's just reupload. If we do this, can we also maybe use the opportunity to check for duplicates and see if that problem can be tracked down? Capacity?

eightysteele commented 12 years ago

@gaurav - Quick sanity check. How big is the polygons CSV file?

gaurav commented 12 years ago

@eightysteele: There is no polygons CSV file. There are 13310 shapefiles, with (allegedly) 640k polygons in them (so the extra 10k polygons in the 'polygons' table is duplication or something else).

@robgur It'll take a while to track down why the duplication is happening (especially since I'm feeling sick enough today that I'm not going to be very productive, unfortunately). My next tasks are to delete the duplicates and keep GBIF uploads running, so I'll definitely be keeping an eye open for anything pointing to the duplication bug while doing that. If I don't spot it, I can take that on as my main task once GBIF and rangemaps are sorted.

@walterj I'm finally able to confirm that yes, the files do have OccCodes, which weren't loaded in the last upload. I like Rob's idea of just rerunning the upload, maybe after we've sorted out the duplication issue.
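
For the duplicate polygons mentioned above, a hash-based check could be sketched like this. The thread says hashes are already used to detect duplicates but does not show how, so everything below (the record shape, the choice of SHA-256, the canonical JSON serialisation) is an assumption for illustration only.

```python
import hashlib
import json


def polygon_fingerprint(record):
    """Hash a polygon record (a JSON-serialisable dict of geometry and
    attributes) in a way that ignores key order."""
    canonical = json.dumps(record, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def split_duplicates(records):
    """Partition records into (unique, duplicates), keeping the first
    occurrence of each fingerprint."""
    seen = set()
    unique, duplicates = [], []
    for record in records:
        fingerprint = polygon_fingerprint(record)
        if fingerprint in seen:
            duplicates.append(record)
        else:
            seen.add(fingerprint)
            unique.append(record)
    return unique, duplicates
```

An exact-match hash like this only catches byte-identical records; polygons that differ by coordinate precision or vertex order would need normalising first.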

eightysteele commented 12 years ago

@gaurav - Once that timeout thing is sorted on CDB, we'll be able to have loader.py output the polygons as a CSV file and then just upload it to CDB. Boom. We'll be able to add to polygons by uploading new CSV files to the same table. We'll be able to update the table using SQL (or via script if we need to automate down the road).

I think we can live with the duplicates for the release, especially if you can manually remove as many of them as possible without losing too much sanity.

gaurav commented 12 years ago

CSV file or GeoJSON file? Because a GeoJSON upload mechanism with a file size limit would rock.

Agreed on the duplicates front. So for Seasonality, let's wait for a bit. With the new bird files to be uploaded soon, it probably makes sense to get the Seasonality on that right to start off with, and then we can reupload mol_rangemaps (minus bird data) with the fixed Seasonality.

eightysteele commented 12 years ago

GeoJSON for sure. Agree with getting the new simplified bird range maps in first.
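
As a rough idea of what a GeoJSON export carrying seasonality could look like, here is a small sketch built with the standard `json` module. It is not the output of loader.py; the property names and helper functions are hypothetical.

```python
import json


def to_feature(ring, scientificname, seasonality):
    """Build a GeoJSON Feature for a single-ring polygon.

    `ring` is a sequence of (x, y) positions. GeoJSON requires the ring
    to be closed, so the first position is repeated at the end if needed.
    """
    coords = [list(position) for position in ring]
    if coords[0] != coords[-1]:
        coords.append(list(coords[0]))
    return {
        "type": "Feature",
        "geometry": {"type": "Polygon", "coordinates": [coords]},
        "properties": {
            "scientificname": scientificname,
            "seasonality": seasonality,
        },
    }


def to_feature_collection(features):
    """Wrap features in a GeoJSON FeatureCollection."""
    return {"type": "FeatureCollection", "features": list(features)}
```

Calling `json.dumps(to_feature_collection([...]))` then gives a string that could be written to a file for upload.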

gaurav commented 12 years ago

As of @3860ad2238412, OccCode has been added to rangemaps at least. I'll monitor the next rangemaps upload and make sure they're getting through.

eightysteele commented 12 years ago

Definitely coordinate with @jmalczyk on range map uploads.

eightysteele commented 12 years ago

Range maps have seasonality now, closing issue.