Planteome / samara

extracts plant trait data from open data sources like apsnet and ars-grin
MIT License

extract plant traits from ars-grin #5

jhpoelen closed 8 years ago

jhpoelen commented 8 years ago

See http://www.ars-grin.gov/npgs/index.html

The Agricultural Research Service Germplasm Resources Information Network (ARS-GRIN) contains traits (e.g. disease, insect, fruit size) for many crops (e.g. apple, wheat).

jhpoelen commented 8 years ago

A first pass at scraping GRIN produced the following results. They were created by starting at a page listing all crops and sampling a single wheat accession:

Observation(Triticum monococcum L. subsp. monococcum,List(Taxon(genus,Triticum,12442), Taxon(family,Poaceae,897), Taxon(subfamily,Pooideae,1472), Taxon(tribe,Triticeae,1317)),Descriptor(65098),Method(402008,Descriptor(65098)),0 - RESISTANT, NO SYMPTOMS,1265054)
Observation(Triticum monococcum L. subsp. monococcum,List(Taxon(genus,Triticum,12442), Taxon(family,Poaceae,897), Taxon(subfamily,Pooideae,1472), Taxon(tribe,Triticeae,1317)),Descriptor(65099),Method(402007,Descriptor(65099)),0 - RESISTANT, NO SYMPTOMS,1265054)
Observation(Triticum monococcum L. subsp. monococcum,List(Taxon(genus,Triticum,12442), Taxon(family,Poaceae,897), Taxon(subfamily,Pooideae,1472), Taxon(tribe,Triticeae,1317)),Descriptor(65111),Method(402008,Descriptor(65111)),0,1265054)
Observation(Triticum monococcum L. subsp. monococcum,List(Taxon(genus,Triticum,12442), Taxon(family,Poaceae,897), Taxon(subfamily,Pooideae,1472), Taxon(tribe,Triticeae,1317)),Descriptor(65112),Method(402007,Descriptor(65112)),0,1265054)
Observation(Triticum monococcum L. subsp. monococcum,List(Taxon(genus,Triticum,12442), Taxon(family,Poaceae,897), Taxon(subfamily,Pooideae,1472), Taxon(tribe,Triticeae,1317)),Descriptor(65002),Method(491608,Descriptor(65002)),W - WINTER,1265054)
Observation(Triticum monococcum L. subsp. monococcum,List(Taxon(genus,Triticum,12442), Taxon(family,Poaceae,897), Taxon(subfamily,Pooideae,1472), Taxon(tribe,Triticeae,1317)),Descriptor(65082),Method(490492,Descriptor(65082)),8 - (1 = RESISTANT, 9 = SUSCEPTIBLE 71-80% S),1265054)
Observation(Triticum monococcum L. subsp. monococcum,List(Taxon(genus,Triticum,12442), Taxon(family,Poaceae,897), Taxon(subfamily,Pooideae,1472), Taxon(tribe,Triticeae,1317)),Descriptor(65083),Method(145,Descriptor(65083)),9 - PLANT DEATH OR NO RECOVERY POSSIBLE,1265054)
Observation(Triticum monococcum L. subsp. monococcum,List(Taxon(genus,Triticum,12442), Taxon(family,Poaceae,897), Taxon(subfamily,Pooideae,1472), Taxon(tribe,Triticeae,1317)),Descriptor(65085),Method(145,Descriptor(65085)),R - LEAVES ROLLED, LOOSELY TO TIGHTLY,1265054)
Observation(Triticum monococcum L. subsp. monococcum,List(Taxon(genus,Triticum,12442), Taxon(family,Poaceae,897), Taxon(subfamily,Pooideae,1472), Taxon(tribe,Triticeae,1317)),Descriptor(65016),Method(494848,Descriptor(65016)),7 - TAN,1265054)
Observation(Triticum monococcum L. subsp. monococcum,List(Taxon(genus,Triticum,12442), Taxon(family,Poaceae,897), Taxon(subfamily,Pooideae,1472), Taxon(tribe,Triticeae,1317)),Descriptor(65006),Method(494848,Descriptor(65006)),20.2,1265054)
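
For anyone who wants to work with the printed records above before a proper export exists, here is a minimal Python sketch that parses one line back into structured fields. It is not the scraper's own code. The class and field names (`scientific_name`, `descriptor_id`, `method_id`, `trailing_id`) are guesses based only on the printed output, and the meaning of the trailing number is not stated in this thread (it may be an accession identifier). The value field can itself contain commas, so the pattern treats everything between the method and the last number as the value.

```python
import re
from dataclasses import dataclass
from typing import List


# NOTE: all names below are assumptions inferred from the printed records,
# not taken from the scraper's source.
@dataclass
class Taxon:
    rank: str
    name: str
    taxon_id: int


@dataclass
class Observation:
    scientific_name: str
    taxa: List[Taxon]
    descriptor_id: int
    method_id: int
    value: str
    trailing_id: int  # meaning not stated in the thread; possibly an accession id


# The value may contain commas, so it is matched greedily up to the final number.
RECORD = re.compile(
    r"^Observation\((?P<name>.*?),List\((?P<taxa>.*?)\),"
    r"Descriptor\((?P<descriptor>\d+)\),"
    r"Method\((?P<method>\d+),Descriptor\(\d+\)\),"
    r"(?P<value>.*),(?P<trailing>\d+)\)$"
)
TAXON = re.compile(r"Taxon\((\w+),([^,()]+),(\d+)\)")


def parse_observation(line: str) -> Observation:
    """Parse one printed Observation(...) record into an Observation."""
    m = RECORD.match(line.strip())
    if m is None:
        raise ValueError(f"unrecognized record: {line!r}")
    taxa = [Taxon(rank, name, int(tid)) for rank, name, tid in TAXON.findall(m["taxa"])]
    return Observation(
        scientific_name=m["name"],
        taxa=taxa,
        descriptor_id=int(m["descriptor"]),
        method_id=int(m["method"]),
        value=m["value"],
        trailing_id=int(m["trailing"]),
    )


sample = (
    "Observation(Triticum monococcum L. subsp. monococcum,"
    "List(Taxon(genus,Triticum,12442), Taxon(family,Poaceae,897), "
    "Taxon(subfamily,Pooideae,1472), Taxon(tribe,Triticeae,1317)),"
    "Descriptor(65098),Method(402008,Descriptor(65098)),"
    "0 - RESISTANT, NO SYMPTOMS,1265054)"
)
obs = parse_observation(sample)
print(obs.value)  # 0 - RESISTANT, NO SYMPTOMS
```

The descriptor and method numbers are kept as bare integers here because the output above does not show their human-readable names.
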

jhpoelen commented 8 years ago

See https://github.com/jhpoelen/samara/releases/download/v0.1.0/grin-first-10k.tsv for the first 10k accession observations, extracted at a rate of about 80 lines/s using the scraper.
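
To put that rate in perspective, here is a back-of-the-envelope estimate in Python. It assumes one observation per line, that "10k" means exactly 10,000 lines, and that the rate stays constant at 80 lines/s. None of these assumptions is confirmed above, and `estimated_seconds` is just an illustrative helper.

```python
def estimated_seconds(lines: int, lines_per_second: float) -> float:
    """Estimate extraction time, assuming a constant throughput."""
    if lines_per_second <= 0:
        raise ValueError("lines_per_second must be positive")
    return lines / lines_per_second


seconds = estimated_seconds(10_000, 80)
print(f"{seconds:.0f} s (~{seconds / 60:.1f} min)")  # 125 s (~2.1 min)
```

Under the same assumptions, the helper can be reused to estimate larger extraction runs.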