VertNet / gulo

Shredding Darwin Core Archives with ferocity, strength, and Cascalog.

WIP: fold sync queries into ResourceTable protocol. #43

Closed eightysteele closed 11 years ago

eightysteele commented 11 years ago

@robinkraft i folded your sweet sync queries into the ResourceTable protocol.

Now we're hitting that problem where resource maps have different keys:

gulo.ipt> (def resource-maps (vertnet-ipt-resources))
gulo.ipt> (first resource-maps)
{:dc:publisher "Jean L. Woods, Ph.D. Delaware Museum of Natural History<jwoods@delmnh.org>", :link "http://ipt.vertnet.org:8080/ipt/resource.do?r=dmnh_birds", :ipt:dwca "http://ipt.vertnet.org:8080/ipt/archive.do?r=dmnh_birds", :author "larussell@vertnet.org", :guid {:content "c21cd435-718a-4069-b503-776bf0e22b96", :isPermaLink false}, :dc:creator "Jean L. Woods, Ph.D. Delaware Museum of Natural History<jwoods@delmnh.org>", :title "DMNH Birds", :pubDate "Wed, 06 Jun 2012 13:41:11 -0500", :ipt:eml "http://ipt.vertnet.org:8080/ipt/eml.do?r=dmnh_birds", :description "The bird collection consists of approximately 67,000 study skins, 9,000 skeletons, 6,000 alcohol-preserved birds, and 36,000 clutches of eggs. The holdings represent about 4,000 bird species. About 140 taxa are in the type collection. The alcohol collection was eighth in the world in 1982 (Wood et al. 1982) and has nearly doubled in size since then, the skeleton collection was 18th in the world in 1986 (Wood & Schnell 1986), and the egg collection is second largest in North America (Kiff & Hough 1985). All skeletal and alcohol specimen data are entered into a computer database, as are all of the study skins. The egg collection data is not in a database. The collection, worldwide in scope, has especially strong collections of Philippine and Central and South American birds. Extinct species are also represented. Formation of the collection began when the Museum was founded in 1957. Among the collections that can be found here are those of George Miksch Sutton, Allan R. Phillips, Olin S. Pettingill, T.D. Burleigh, D.S. Rabor, and M. Hachisuka - D.S. Ripley. <a href=\"http://ipt.vertnet.org:8080/ipt/logo.do?r=dmnh_birds\">Resource Logo</a> <a href=\"http://ipt.vertnet.org:8080/ipt/eml.do?r=dmnh_birds\">EML</a>"}
gulo.ipt> (insert-ipt-resources "foo" resource-maps)
AssertionError Assert failed: (apply kws-match? maps)  cartodb.utils/maps->insert-sql (utils.clj:47)
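
One way to see which resources disagree, as a rough sketch rather than anything taken from gulo or cartodb: group the maps by their key sets and list what each one lacks. The names `key-set-counts`, `missing-keys`, and `sample-maps` are hypothetical, and the sample data only stands in for the output of `(vertnet-ipt-resources)`.

```clojure
(require '[clojure.set :as cset])

;; Stand-in data (assumption): the real maps come from (vertnet-ipt-resources).
(def sample-maps
  [{:title "DMNH Birds" :dc:publisher "Jean L. Woods" :ipt:dwca "http://example.org/a"}
   {:title "Other" :ipt:dwca "http://example.org/b"}])

(defn key-set-counts
  "Returns a map from each distinct key set to the number of maps that have it."
  [maps]
  (frequencies (map (comp set keys) maps)))

(defn missing-keys
  "For each map, returns the keys that appear in some map but not in this one."
  [maps]
  (let [all-keys (set (mapcat keys maps))]
    (mapv #(cset/difference all-keys (set (keys %))) maps)))
```

If `(key-set-counts resource-maps)` returns more than one entry, the maps do not all share the same keys, which is presumably what the `kws-match?` assertion is complaining about.
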
robinkraft commented 11 years ago

What are the definitive map keywords? Where does :dc:publisher come from?


eightysteele commented 11 years ago

The IPT RSS feed. When it's converted to a map, those are the keys that come out. Maybe just rename the keys?

robinkraft commented 11 years ago

Ah, I thought I'd worked with the RSS feed before and seen the keys used in the table. Anyway, easiest thing would be to change the table column names, but the :'s probably preclude that. So if those are the official keys coming out of the RSS feed, we need a function to change the keys. On it this week.
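
A minimal sketch of what such a key-changing function could look like, assuming the goal is column-safe names with the colons replaced by underscores and every map carrying the same keys. `column-key`, `normalize-resource`, and `align-keys` are hypothetical names, not part of gulo or cartodb, and the sketch leaves the nested `:guid` map untouched.

```clojure
(require '[clojure.string :as str])

(defn column-key
  "Turns a feed keyword such as :dc:publisher into :dc_publisher."
  [k]
  (keyword (str/replace (name k) ":" "_")))

(defn normalize-resource
  "Renames every key of one resource map with column-key; values are unchanged."
  [m]
  (into {} (map (fn [[k v]] [(column-key k) v]) m)))

(defn align-keys
  "Gives every map the union of all keys, filling absent ones with nil."
  [maps]
  (let [blank (zipmap (mapcat keys maps) (repeat nil))]
    (mapv #(merge blank %) maps)))
```

Usage would be along the lines of `(align-keys (map normalize-resource resource-maps))` before handing the result to the insert function. Filling gaps with nil is only one option; dropping keys that are not table columns would be another.
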


eightysteele commented 11 years ago

Sweet sweet.