EOL / tramea

A lightweight server for denormalized EOL data

eol_names_and_ranks_archive.tar.gz only downloadable via browser (not command-line) #10

Open · JRice opened this issue 8 years ago

JRice commented 8 years ago

Currently it ends up in /opt/downloads/eol_names_and_ranks_archive.tar.gz, and we need it to be accessible to PHP somewhere "standard".

Update the script: /opt/eol_php_code/rake_tasks/create_eol_archive_names_only.php

JRice commented 8 years ago

The URL should be static and publicly available. Apparently there is one that is already working; location TBD.

Note that you should add this file to the .gitignore list, though!

JRice commented 8 years ago

Is this already working? I.e., http://services.eol.org/downloads/eol_names_and_ranks_archive.tar.gz

jhpoelen commented 8 years ago

@jhammock @JRice thanks for making this available.

For some reason, using wget services.eol.org/downloads/eol_archive_objects.tar.gz results in:

--2015-09-15 08:52:46--  http://services.eol.org/downloads/eol_archive_objects.tar.gz
Resolving services.eol.org... 160.111.248.28
Connecting to services.eol.org|160.111.248.28|:80... connected.
HTTP request sent, awaiting response... 503 Service Unavailable
2015-09-15 08:52:47 ERROR 503: Service Unavailable.

The same happens with wget http://services.eol.org/downloads/eol_names_and_ranks_archive.tar.gz:

--2015-09-15 08:54:43--  http://services.eol.org/downloads/eol_names_and_ranks_archive.tar.gz
Resolving services.eol.org... 160.111.248.28
Connecting to services.eol.org|160.111.248.28|:80... connected.
HTTP request sent, awaiting response... 503 Service Unavailable
2015-09-15 08:54:45 ERROR 503: Service Unavailable.

Downloading in Firefox v40.0.3 seems to work OK (download still in progress...).

Can you reproduce this?
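One way to narrow this down is to request the same URL with a browser-like and a wget-like User-Agent header and compare the status codes. This is only a sketch: the idea that the server filters on User-Agent is a hypothesis, not something established in this thread, and the helper names (`fetch_status`, `compare_user_agents`) and the User-Agent strings are illustrative assumptions.

```python
"""Compare HTTP status codes for one URL under different User-Agent headers.

Sketch only: the User-Agent hypothesis is an assumption. The helper names
and the User-Agent strings below are hypothetical/illustrative.
"""
import urllib.error
import urllib.request
from typing import Callable, Dict, Mapping

# Illustrative User-Agent strings (not taken from the original report).
USER_AGENTS = {
    "wget": "Wget/1.16",
    "firefox": "Mozilla/5.0 (X11; Linux x86_64; rv:40.0) Gecko/20100101 Firefox/40.0",
}


def fetch_status(url: str, user_agent: str, timeout: float = 10.0) -> int:
    """Issue a GET (as wget does) and return the status code without reading the body."""
    req = urllib.request.Request(url, headers={"User-Agent": user_agent})
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # 4xx/5xx responses (e.g. the 503 above) are raised as HTTPError.
        return err.code


def compare_user_agents(
    url: str,
    agents: Mapping[str, str] = USER_AGENTS,
    fetch: Callable[[str, str], int] = fetch_status,
) -> Dict[str, int]:
    """Map each agent label to the status code returned for that User-Agent."""
    return {label: fetch(url, ua) for label, ua in agents.items()}
```

Usage: `compare_user_agents("http://services.eol.org/downloads/eol_names_and_ranks_archive.tar.gz")` returns a dict such as label to status code; a 503 for the wget-like agent alongside a 200 for the browser-like one would support the hypothesis, while identical codes would point elsewhere (rate limiting, proxy configuration, and so on). The `fetch` parameter exists so the comparison can be exercised without network access.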

jhpoelen commented 8 years ago

The resource at http://services.eol.org/downloads/eol_names_and_ranks_archive.tar.gz contains the following columns:

taxonID | scientificName | nameAccordingTo | taxonRank | genus | specificEpithet
34543 | Enhydra | EOL Group on Flickr; IUCN Red List ... | genus | Enhydra |

This list helps associate a specific name string with an EOL page ID. However, it is not clear how the name fits into a taxonomic tree (if at all).
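A minimal sketch of using this file as a name-to-page-ID lookup, assuming the tarball contains a tab-separated text file whose header row uses the column names listed above. The member name inside the archive is not documented in this thread, so it is passed in as a parameter; `index_names` and `index_archive` are hypothetical helper names.

```python
"""Build a scientificName -> EOL page ID index from the names-and-ranks dump.

Assumptions: a tab-separated member file with a header row containing
taxonID and scientificName. The member name is supplied by the caller.
"""
import csv
import io
import tarfile
from collections import defaultdict
from typing import Dict, Iterable, Set


def index_names(lines: Iterable[str]) -> Dict[str, Set[str]]:
    """Map each scientificName to the set of taxonIDs (EOL page IDs) that use it."""
    index: Dict[str, Set[str]] = defaultdict(set)
    reader = csv.DictReader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        name = (row.get("scientificName") or "").strip()
        taxon_id = (row.get("taxonID") or "").strip()
        if name and taxon_id:
            index[name].add(taxon_id)
    return dict(index)


def index_archive(tar_path: str, member: str) -> Dict[str, Set[str]]:
    """Read one tab-separated member of a .tar.gz archive and index it."""
    with tarfile.open(tar_path, mode="r:gz") as tar:
        raw = tar.extractfile(member)  # raises KeyError if the member is missing
        if raw is None:
            raise ValueError(f"{member!r} is not a regular file in {tar_path}")
        with io.TextIOWrapper(raw, encoding="utf-8", newline="") as text:
            return index_names(text)
```

A set per name is used because the same name string can map to more than one page; as noted above, this gives a lookup but says nothing about where the name sits in a tree.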

I had a similar issue with GloBI, where Sergey (see https://github.com/jhpoelen/eol-globi-data/issues/70) asked for a full download of the taxa included in GloBI. To help with this, I created a dump with the full taxonomic path, including TSNs or taxon IDs from external taxonomies. This way, Sergey (and others) can easily link a specific taxon to a multitude of taxonomies without having to retrieve them one by one.

I've included an example produced by GloBI after taxonomic name matching against various taxonomic services (including EOL's). Is there any way EOL could provide a similar taxon dump periodically? This would cut GloBI's name resolution time from about a week to hours or less, and would make use of the careful taxon links established by EOL curators.

id | name | rank | commonNames | path | pathIds | pathNames
EOL:328583 | Enhydra lutris | Species | Seeotter @de ; sea otter @en ; Nutria marina @es ; Merisaukko @fi ; Loutre de mer @fr ; Zeeotter @nl ; Loira de mar @oc ; | Animalia ; Chordata ; Mammalia ; Carnivora ; Mustelidae ; Enhydra ; Enhydra lutris | EOL:1 ; EOL:694 ; EOL:1642 ; EOL:7662 ; EOL:7670 ; EOL:34543 ; EOL:328583 | kingdom ; phylum ; class ; order ; family ; genus ; species
OTT:949676 | Enhydra lutris | species | | Animalia ; Chordata ; Mammalia ; Carnivora ; Mustelidae ; Enhydra ; Enhydra lutris | IRMNG:11 ; IRMNG:148 ; IRMNG:1310 ; IRMNG:12116 ; IRMNG:104767 ; IRMNG:1297077 ; IRMNG:10198728 | kingdom ; phylum ; class ; order ; family ; genus ; species
GBIF:2433670 | Enhydra lutris | species | | Animalia ; Chordata ; Mammalia ; Carnivora ; Mustelidae ; Enhydra ; Enhydra lutris | GBIF:1 ; GBIF:44 ; GBIF:359 ; GBIF:732 ; GBIF:5307 ; GBIF:2433669 ;

...
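To illustrate how a consumer might use a dump like this, here is a sketch that turns each row's path, pathIds and pathNames into aligned (rank, name, id) steps and groups the external IDs by name string. It assumes tab-separated columns in the order shown above and " ; " as the separator within multi-valued cells; `parse_row` and `crosswalk` are hypothetical helper names, not part of GloBI or EOL.

```python
"""Parse rows of a GloBI-style taxon dump and build a cross-taxonomy index.

Assumptions: tab-separated columns in the order
id, name, rank, commonNames, path, pathIds, pathNames, with ' ; '
separating values inside a cell. Helper names are hypothetical.
"""
from collections import defaultdict
from itertools import zip_longest
from typing import Dict, Iterable, List, NamedTuple, Optional, Tuple

COLUMNS = ["id", "name", "rank", "commonNames", "path", "pathIds", "pathNames"]


class PathStep(NamedTuple):
    rank: Optional[str]
    name: Optional[str]
    taxon_id: Optional[str]


def split_multi(cell: str) -> List[str]:
    """Split a ';'-separated cell, dropping empty entries such as a trailing ';'."""
    return [part.strip() for part in cell.split(";") if part.strip()]


def parse_row(line: str) -> Tuple[Dict[str, str], List[PathStep]]:
    """Return the row as a dict plus its taxonomic path as (rank, name, id) steps."""
    cells = line.rstrip("\n").split("\t")
    cells += [""] * (len(COLUMNS) - len(cells))  # pad short or truncated rows
    row = dict(zip(COLUMNS, cells))
    # zip_longest keeps misaligned paths visible (as None) instead of truncating them.
    steps = [
        PathStep(rank, name, taxon_id)
        for rank, name, taxon_id in zip_longest(
            split_multi(row["pathNames"]),
            split_multi(row["path"]),
            split_multi(row["pathIds"]),
        )
    ]
    return row, steps


def crosswalk(lines: Iterable[str]) -> Dict[str, List[str]]:
    """Group taxon IDs from different sources (EOL:, OTT:, GBIF:, ...) by name."""
    ids_by_name: Dict[str, List[str]] = defaultdict(list)
    for line in lines:
        row, _ = parse_row(line)
        if row["name"]:
            ids_by_name[row["name"]].append(row["id"])
    return dict(ids_by_name)
```

For the example rows, `crosswalk` would collect EOL:328583, OTT:949676 and GBIF:2433670 under "Enhydra lutris", which is the kind of one-pass linking across taxonomies described above. Using `zip_longest` rather than `zip` is a deliberate choice: a row whose ID list is shorter than its name list (like the truncated GBIF row) shows up with `None` placeholders instead of silently losing ranks.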

jhpoelen commented 8 years ago

See https://github.com/jhpoelen/eol-globi-data/issues/145 and https://github.com/jhpoelen/eol-globi-data/issues/70 for GloBI specific examples.