VertNet / gulo

Shredding Darwin Core Archives with ferocity, strength, and Cascalog.

get-count should handle non-Vertnet IPTs #85

Closed robinkraft closed 10 years ago

robinkraft commented 11 years ago

The get-count function scrapes the resource page to get the record count. Unfortunately, some 17 non-VertNet IPTs put the count in slightly different places on the resource page.

This one is ok: http://ipt.vertnet.org:8080/ipt/resource.do?r=tcwc_verts

These ones aren't:

http://ipt.calacademy.org:8080/ipt/resource.do?r=herp
http://ipt.flmnh.ufl.edu:8080/ipt/resource.do?r=mammals
http://ipt.nhm.ku.edu/ipt/resource.do?r=kubi_herps

get-count should be modified to handle slight variations like these.
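For illustration only, here is a rough sketch of what "handling slight variations" could look like: try several patterns in order and take the first that matches. It is written in Python rather than the project's Clojure, and everything in it is an assumption. The function name `extract_record_count` and both patterns are hypothetical, since the actual markup of the IPT resource pages is not shown in this issue.

```python
import re

# Hypothetical patterns: the real IPT resource-page markup is not shown in
# this issue, so these only illustrate the "try several layouts" idea.
COUNT_PATTERNS = [
    re.compile(r"(\d[\d,]*)\s+records", re.IGNORECASE),      # e.g. "1,234 records"
    re.compile(r"records\s*:?\s*(\d[\d,]*)", re.IGNORECASE),  # e.g. "Records: 1234"
]


def extract_record_count(html):
    """Return the first record count found in the page text, or None."""
    # Crude tag stripping so a count split across table cells still matches.
    text = re.sub(r"<[^>]+>", " ", html)
    for pattern in COUNT_PATTERNS:
        match = pattern.search(text)
        if match:
            # Drop thousands separators before converting to an int.
            return int(match.group(1).replace(",", ""))
    return None
```

Returning `None` when nothing matches keeps "no count found" distinct from a real count of zero, so the caller can decide what to store.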

eightysteele commented 11 years ago

+1

robinkraft commented 11 years ago

Ok, so it's really only 10:

http://ipt.calacademy.org:8080/ipt/resource.do?r=herp
http://ipt.flmnh.ufl.edu:8080/ipt/resource.do?r=mammals
http://ipt.flmnh.ufl.edu:8080/ipt/resource.do?r=herpetology
http://ipt.nhm.ku.edu/ipt/resource.do?r=kubi_herps
http://ipt.nhm.ku.edu/ipt/resource.do?r=kubi_mammals
http://ipt.nhm.ku.edu/ipt/resource.do?r=kubi_ornithology
http://ipt.nhm.ku.edu/ipt/resource.do?r=kubi_ornithology_tissue
http://ipt.flmnh.ufl.edu:8080/ipt/resource.do?r=ichthyology
http://ipt.calacademy.org:8080/ipt/resource.do?r=ich
http://ipt.calacademy.org:8080/ipt/resource.do?r=mam

robinkraft commented 11 years ago

The workaround for now (per #87) is to insert a -1 during sync instead of an empty string.
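A minimal sketch of that kind of sentinel fallback, again in Python for illustration and not taken from the actual change in #87. The names `MISSING_COUNT` and `count_for_sync` are hypothetical, and it assumes the scraped value arrives as a string, an int, or `None`.

```python
MISSING_COUNT = -1  # sentinel stored when no count could be scraped


def count_for_sync(raw_count):
    """Normalize a scraped count to an int, using -1 when it is missing."""
    if raw_count is None:
        return MISSING_COUNT
    # Accept ints or strings; tolerate whitespace and thousands separators.
    text = str(raw_count).strip().replace(",", "")
    return int(text) if text.isdigit() else MISSING_COUNT
```

With a numeric sentinel, the count column stays an integer everywhere, and rows that still need a real count can be found by filtering on -1.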

eightysteele commented 11 years ago

Getting there. Are the mods needed to handle these guys easy?

tucotuco commented 10 years ago

Resource counts work for all IPTs now.