VertNet / gulo

Shredding Darwin Core Archives with ferocity, strength, and Cascalog.

non-numeric count (i.e. empty string) makes sync fail #86

Closed: robinkraft closed this issue 11 years ago

robinkraft commented 11 years ago

Example:

INSERT INTO resource (pubdate, ipt, eml, count, dwca, citation, title, icode, url, orgname, email, networks, contact, emlrights, description) VALUES ('Thu Apr 19 00:00:00 UTC 2012', true, 'http://ipt.calacademy.org:8080/ipt/eml.do?r=herp', '', 'http://ipt.calacademy.org:8080/ipt/archive.do?r=herp', '', 'CAS Herpetology (HERP)', 'CAS', 'http://ipt.calacademy.org:8080/ipt/resource.do?r=herp', 'California Academy of Sciences', 'sblum@calacademy.org', 'HerpNET', 'Stanley Blum', '', 'The electronic catalog of the herpotology collection at the California Academy of Sciences, San Francisco.')

Error:

ERROR: invalid input syntax for type double precision: ""
LINE 1: ...ttp://ipt.calacademy.org:8080/ipt/eml.do?r=herp', '', 'http:… ^
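
The error above comes from passing an empty string where Postgres expects a `double precision` value. One possible workaround, sketched here in Python rather than in gulo's own code, is to coerce missing or non-numeric counts to `None` before the statement is built, so that a parameterized insert sends `NULL` instead of `''`. The helper name `parse_count` is hypothetical, and the sketch assumes the `count` column is nullable.

```python
import math
from typing import Optional


def parse_count(raw: Optional[str]) -> Optional[float]:
    """Return the count as a float, or None when it is missing or non-numeric.

    Hypothetical helper: None is meant to be bound as SQL NULL by the
    database driver, which assumes the `count` column allows NULL.
    """
    if raw is None:
        return None
    text = raw.strip()
    if not text:
        # Empty string: the case that triggered the error in this issue.
        return None
    try:
        value = float(text)
    except ValueError:
        # Any other non-numeric text is treated the same way.
        return None
    # Reject "nan" and "inf", which float() accepts but are not real counts.
    return value if math.isfinite(value) else None
```

With this in place, `parse_count('')` yields `None` and the insert can proceed, leaving the count unknown until it can be filled in later.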
robgur commented 11 years ago

Can you just get the count straight from the total number of lines in the occurrence file (or n-1) when the count parameter is empty in the metadata?
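
As a rough illustration of the line-count suggestion, the sketch below counts records in an occurrence file by counting lines and subtracting one for the header row (the "n-1" case). It is written in Python for brevity, the function name `count_records` is hypothetical, and it only applies once the occurrence file is available locally. It also assumes one record per line; quoted fields containing embedded newlines would inflate the result.

```python
def count_records(path: str, has_header: bool = True) -> int:
    """Count data rows in a delimited text file by counting its lines.

    Hypothetical helper: assumes one record per line and, by default,
    a single header row that should not be counted.
    """
    # Read in binary mode so odd encodings in the data cannot raise errors.
    with open(path, "rb") as handle:
        lines = sum(1 for _ in handle)
    if has_header:
        # n-1: drop the header row, never going below zero for empty files.
        return max(lines - 1, 0)
    return lines
```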

eightysteele commented 11 years ago

this is all pre-harvest, so can't snoop line counts in this case. i like how you're thinking though.


robinkraft commented 11 years ago

What are the counts actually used for anyway? Anything at the moment?

eightysteele commented 11 years ago

we will surface them on publisher landing pages


robgur commented 11 years ago

Ah yes... important for us to have that count info, though, for the stats carousel, right? So we do have to find a way to do a final count, maybe post-harvest?

eightysteele commented 11 years ago

stats are generated via mapreduce post-harvest, so not a prob yep. these counts here are just for surfacing in ui.
