0x48piraj / DomRadar

Lightning fast Python tool for discovering available domain names.
MIT License

Expand the species dataset #1

Open 0x48piraj opened 3 years ago

0x48piraj commented 3 years ago

Uniprot

Uniprot publishes its controlled vocabulary of common and scientific species names in speclist.txt.

An example entry:

ACAER E  111511: N=Acanthodactylus erythrurus
                 C=Spanish fringe-toed lizard
                 S=Lacerta erythrura

In the example, N is the scientific binomial name (Acanthodactylus erythrurus) and C is the common name (Spanish fringe-toed lizard).

ACAER is the id code, 111511 is the code for the taxonomic node, E means it is a eukaryote, and S is a synonym of either name.
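To make the entry layout concrete, here is a minimal parsing sketch. It assumes every entry follows the shape of the example above (a header line with the code, kingdom letter and taxonomic node, followed by indented `C=` and `S=` lines); the function name `parse_speclist` and the dictionary keys are made up for illustration and are not part of Uniprot or DomRadar.

```python
import re

# Header line of an entry: id code, kingdom letter, taxonomic node, first field.
ENTRY_RE = re.compile(r"^(\S+)\s+([A-Z])\s+(\d+):\s+([NCS])=(.*)$")
# Continuation line: indented, carrying one more N=/C=/S= field.
FIELD_RE = re.compile(r"^\s+([NCS])=(.*)$")


def parse_speclist(lines):
    """Parse speclist-style lines into a list of dicts (hypothetical helper)."""
    entries = []
    current = None
    for line in lines:
        line = line.rstrip("\n")
        match = ENTRY_RE.match(line)
        if match:
            code, kingdom, taxon, key, value = match.groups()
            current = {
                "code": code,
                "kingdom": kingdom,
                "taxon_id": int(taxon),
                "scientific": None,
                "common": None,
                "synonyms": [],
            }
            entries.append(current)
        else:
            match = FIELD_RE.match(line)
            # Skip anything that is not a field line, or that precedes the first entry.
            if not match or current is None:
                continue
            key, value = match.groups()
        value = value.strip()
        if key == "N":
            current["scientific"] = value
        elif key == "C":
            current["common"] = value
        else:
            current["synonyms"].append(value)
    return entries


sample = [
    "ACAER E  111511: N=Acanthodactylus erythrurus",
    "                 C=Spanish fringe-toed lizard",
    "                 S=Lacerta erythrura",
]
entries = parse_speclist(sample)
```

For a wordlist, the `scientific` and `common` values would be the interesting fields; lines that do not match either pattern (such as the file's header text) are simply skipped.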

The list currently contains 25,336 scientific names, far short of the ~2.5 million species in GBIF, let alone the tens or hundreds of millions estimated to exist. The Uniprot list does, however, cover every organism included in Uniprot, which is widely regarded as one of the most comprehensive protein databases available.

GBIF

The Global Biodiversity Information Facility (GBIF) has an [API](http://www.gbif.org/developer/species) from which you can extract data for species names. Their database includes common names (aka vernacular names) where available, often in several languages. Using this API, you can extract data and construct a name file for a particular taxon that you are interested in.

As an example, this is the list of the first 20 vernacular names found for Passer domesticus (House sparrow):

{
   "endOfRecords" : false,
   "results" : [
      {
         "language" : "",
         "sourceTaxonKey" : 100220560,
         "source" : "Global Invasive Species Database",
         "vernacularName" : "English sparrow"
      },
      {
         "language" : "",
         "sourceTaxonKey" : 100220560,
         "vernacularName" : "Europese huismuis",
         "source" : "Global Invasive Species Database"
      },
      {
         "vernacularName" : "Gorrion domestico",
         "source" : "Global Invasive Species Database",
         "language" : "",
         "sourceTaxonKey" : 100220560
      },
      {
         "source" : "Integrated Taxonomic Information System (ITIS)",
         "vernacularName" : "Gorrión casero",
         "language" : "spa",
         "sourceTaxonKey" : 102101640
      },
      {
         "vernacularName" : "Gorrión Común",
         "sourceTaxonKey" : 123213203,
         "language" : "spa"
      },
      {
         "language" : "spa",
         "sourceTaxonKey" : 101186844,
         "source" : "The European Nature Information System (EUNIS)",
         "vernacularName" : "Gorrión Común"
      },
      {
         "language" : "spa",
         "sourceTaxonKey" : 114130266,
         "source" : "Colaboraciones Americanas Sobre Aves",
         "vernacularName" : "Gorrión casero"
      },
      {
         "vernacularName" : "Gorrión casero",
         "source" : "Yanayacu Natural History Research Group",
         "sourceTaxonKey" : 119245200,
         "language" : "spa"
      },
      {
         "vernacularName" : "Gorrión casero",
         "source" : "Catalogue of Life",
         "sourceTaxonKey" : 119950016,
         "language" : "spa"
      },
      {
         "language" : "swe",
         "sourceTaxonKey" : 101186844,
         "vernacularName" : "Gråsparv",
         "source" : "The European Nature Information System (EUNIS)"
      },
      {
         "vernacularName" : "Gråspurv",
         "language" : "dan",
         "sourceTaxonKey" : 123213203
      },
      {
         "vernacularName" : "Gråspurv",
         "language" : "nob",
         "sourceTaxonKey" : 123213203
      },
      {
         "language" : "deu",
         "sourceTaxonKey" : 116795880,
         "vernacularName" : "Haussperling",
         "source" : "Taxon list of animals with German names (worldwide) compiled at the SMNS",
         "country" : "DE"
      },
      {
         "language" : "deu",
         "sourceTaxonKey" : 100483595,
         "source" : "Belgian Species List",
         "country" : "BE",
         "vernacularName" : "Haussperling"
      },
      {
         "language" : "deu",
         "sourceTaxonKey" : 123213203,
         "vernacularName" : "Haussperling"
      },
      {
         "sourceTaxonKey" : 101186844,
         "language" : "deu",
         "source" : "The European Nature Information System (EUNIS)",
         "vernacularName" : "Haussperling"
      },
      {
         "source" : "The Clements Checklist",
         "vernacularName" : "House Sparrow",
         "language" : "eng",
         "sourceTaxonKey" : 113987294
      },
      {
         "vernacularName" : "House Sparrow",
         "source" : "Taxonomy in Flux Checklist",
         "language" : "eng",
         "sourceTaxonKey" : 100159046
      },
      {
         "source" : "Colaboraciones Americanas Sobre Aves",
         "vernacularName" : "House Sparrow",
         "language" : "eng",
         "sourceTaxonKey" : 114130266
      },
      {
         "sourceTaxonKey" : 102101640,
         "language" : "eng",
         "vernacularName" : "House Sparrow",
         "source" : "Integrated Taxonomic Information System (ITIS)"
      }
   ],
   "limit" : 20,
   "offset" : 0
}
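The response above repeats the same name across several sources, so a name file would need de-duplication. A small sketch of that step follows, assuming only the `results`, `vernacularName` and `language` fields visible in the sample; the helper name `names_by_language` and the `"unknown"` bucket for empty language codes are illustrative choices, not GBIF conventions.

```python
from collections import defaultdict


def names_by_language(response):
    """Group unique vernacular names by language code (hypothetical helper)."""
    grouped = defaultdict(list)
    for record in response.get("results", []):
        name = record.get("vernacularName")
        if not name:
            continue
        # Some records carry an empty language string; bucket those separately.
        language = record.get("language") or "unknown"
        if name not in grouped[language]:
            grouped[language].append(name)
    return dict(grouped)


# Trimmed copy of the response shown above.
trimmed = {
    "results": [
        {"language": "", "vernacularName": "English sparrow"},
        {"language": "spa", "vernacularName": "Gorrión casero"},
        {"language": "spa", "vernacularName": "Gorrión casero"},
        {"language": "eng", "vernacularName": "House Sparrow"},
        {"language": "eng", "vernacularName": "House Sparrow"},
    ]
}
grouped = names_by_language(trimmed)
```

The comparison is exact, so variants that differ only in capitalization are kept as separate names.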

With a search such as api.gbif.org/v1/species?name=Passer%20domesticus, you can look up all the information for a particular species, starting from either a scientific name or a common name (here, Passer domesticus).
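A rough sketch of issuing that search from Python with only the standard library. The `https://` scheme is an assumption (the URL above is given without one), the `limit` and `offset` parameters are inferred from the fields in the sample response, and the function names are hypothetical.

```python
import json
from urllib.parse import quote, urlencode
from urllib.request import urlopen

# Assumption: the endpoint quoted in the text, served over HTTPS.
BASE_URL = "https://api.gbif.org/v1/species"


def species_search_url(name, limit=20, offset=0):
    """Build the search URL; quote_via=quote encodes spaces as %20."""
    query = urlencode(
        {"name": name, "limit": limit, "offset": offset}, quote_via=quote
    )
    return BASE_URL + "?" + query


def fetch_species(name, limit=20, offset=0, timeout=10):
    """Fetch one page of results and decode the JSON body (needs network access)."""
    url = species_search_url(name, limit, offset)
    with urlopen(url, timeout=timeout) as response:
        return json.load(response)


url = species_search_url("Passer domesticus")
```

Raising `offset` by `limit` on each call should walk through further pages until `endOfRecords` is true, assuming the endpoint pages the way the sample response suggests.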

GBIF includes information on 1,643,948 species (and counting), but I don't know what proportion of them have common names (or which ones do).

Marine

For marine species, the World Register of Marine Species is probably the best place to find this information.

The Ocean Biogeographic Information System also contains a tremendous amount of marine species.

Generic

The website of Observado, an initiative to collect species observations worldwide, has global species lists in CSV format that are as complete as possible. The plant list currently has 381,473 records. You can download local species names in more languages than you might have heard of, from English to Russian and from Frysk to Dzongkha.

Note that these lists are meant for observations in the field, and hence also contain multispecies, hybrids and synonyms. But these can be filtered out easily.

0x48piraj commented 3 years ago