Closed gbif-portal closed 9 months ago
I think this is a perfect example for the type of detail needed for #531.
It falls between informatics and comms. I'll make a quick start here; who should continue?
gbifid — a number for the occurrence record in the GBIF portal, this is usually kept constant even when the record details change
datasetkey — identifies the dataset, as registered in GBIF.org. https://www.gbif.org/dataset/
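A small sketch of how these keys could be turned into page links, assuming the key is simply appended to the base URLs mentioned in this thread. The helper names and the all-zero example key are hypothetical placeholders, not real GBIF values.

```python
# Base URLs as quoted in this thread; the key is assumed to be appended directly.
DATASET_PAGE = "https://www.gbif.org/dataset/"
PUBLISHER_PAGE = "https://www.gbif.org/publisher/"

# Hypothetical placeholder key, not a real registered dataset.
EXAMPLE_KEY = "00000000-0000-0000-0000-000000000000"


def dataset_url(datasetkey):
    """Build the dataset page link for a datasetkey value."""
    return DATASET_PAGE + datasetkey.strip()


def publisher_url(publishingorgkey):
    """Build the publisher page link for a publishingorgkey value."""
    return PUBLISHER_PAGE + publishingorgkey.strip()


print(dataset_url(EXAMPLE_KEY))
print(publisher_url(EXAMPLE_KEY))
```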
Some explanation is in the API documentation: https://www.gbif.org/developer/occurrence#parameters , but I don't know of any other resource. @timrobertson100, did we have a page for this type of information?
Missing: Infraspecificepithet, coordinateprecision (how calculated?), coordinateuncertaintyinmeters, elevationaccuracy, depthaccuracy
Nr. | Colname | Description
--- | --- | ---
1 | gbifid | a number for the occurrence record in the GBIF portal, this is usually kept constant even when the record details change
2 | datasetkey | identifies the dataset, as registered in GBIF.org. https://www.gbif.org/dataset/ shows the dataset's page
3 | occurrenceid | identifier provided by the data provider for this occurrence
4 | Kingdom |
5 | Phylum |
6 | Class |
7 | Order |
8 | Family |
9 | Genus |
10 | Species | General species name
11 | Infraspecificepithet |
12 | Taxonrank | Species or subspecies
13 | Scientificname | Detailed name, species and subspecies
14 | Countrycode | ISO 3166-1 two-letter country code
15 | Locality | Specific area: location name
16 | publishingorgkey | GBIF publishing organization key, https://www.gbif.org/publisher/
17 | decimallatitude |
18 | decimallongitude |
19 | coordinateuncertaintyinmeters |
20 | coordinateprecision | 0-10 (?)
21 | elevation |
22 | elevationaccuracy | Found between 0 and 550 (?)
23 | Depth |
24 | Depthaccuracy |
25 | Eventdate | yyyy-mm-dd
26 | day | dd
27 | Month | mm
28 | year | yyyy
29 | taxonkey | taxon/name key in the GBIF backbone
30 | specieskey |
31 | basisofrecord | "PreservedSpecimen", "FossilSpecimen", "LivingSpecimen", "HumanObservation", "MachineObservation", "Literature", "MaterialSample", "Observation", "Unknown" https://gbif.github.io/gbif-api/apidocs/org/gbif/api/vocabulary/BasisOfRecord.html
32 | institutioncode | https://terms.tdwg.org/wiki/dwc:institutionCode
33 | collectioncode | https://terms.tdwg.org/wiki/dwc:collectionCode
34 | catalognumber | https://terms.tdwg.org/wiki/dwc:catalogNumber
35 | Recordnumber | https://terms.tdwg.org/wiki/dwc:recordNumber
36 | identifiedby | Names of persons
37 | License | Numbers
38 | rightsholder | For example museum or university
39 | recordedby | Also names of persons
40 | Typestatus | Nomenclatural type (type status, typified scientific name, publication) applied to the subject.
41 | establishmentmeans | INTRODUCED, INVASIVE, MANAGED, NATIVE, NATURALISED, UNCERTAIN https://gbif.github.io/gbif-api/apidocs/org/gbif/api/vocabulary/EstablishmentMeans.html
42 | lastinterpreted | the date the record was last interpreted by GBIF. Sometimes records are interpreted because they have been updated by the provider, other times because GBIF improves our processing and reinterprets all records
43 | mediatype | "StillImage", "MovingImage", "Sound" https://gbif.github.io/gbif-api/apidocs/org/gbif/api/vocabulary/MediaType.html
44 | issue | Possible problems detected by GBIF's interpretation processes: RECORDED_DATE_INVALID, COORDINATE_ROUNDED, RECORDED_DATE_MISMATCH, GEODETIC_DATUM_ASSUMED, COUNTRY_DERIVED_FROM_COORDINATES, BASIS_OF_RECORD_INVALID, GEODETIC_DATUM_INVALID, COORDINATE_PRECISION_INVALID, INDIVIDUAL_COUNT_INVALID, ELEVATION_MIN_MAX_SWAPPED, COUNTRY_INVALID, IDENTIFIED_DATE_UNLIKELY, COORDINATE_REPROJECTED, COORDINATE_UNCERTAINTY_METERS_INVALID
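To make the issue column a bit more concrete, here is a minimal sketch that counts how often each flag appears in a download. It assumes the file is tab-delimited and that several flags in one cell are separated by semicolons; both are assumptions to check against your own file. The sample rows and the `count_issues` helper are hypothetical.

```python
import csv
import io
from collections import Counter

# Hypothetical three-row sample standing in for a real download file.
SAMPLE = (
    "gbifid\tcountrycode\tissue\n"
    "1\tNO\tCOORDINATE_ROUNDED;GEODETIC_DATUM_ASSUMED\n"
    "2\tNL\t\n"
    "3\tNO\tCOORDINATE_ROUNDED\n"
)


def count_issues(handle, separator=";"):
    """Count how often each issue flag occurs in a tab-delimited file."""
    # QUOTE_NONE: treat quote characters as ordinary text (assumed file style).
    reader = csv.DictReader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    counts = Counter()
    for row in reader:
        cell = row.get("issue") or ""
        flags = [flag.strip() for flag in cell.split(separator)]
        counts.update(flag for flag in flags if flag)  # skip empty cells
    return counts


issue_counts = count_issues(io.StringIO(SAMPLE))
for flag, n in issue_counts.most_common():
    print(flag, n)
```

For a real file, the same function could be called with `open(path, newline="", encoding="utf-8")` instead of the in-memory sample.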
If someone in Informatics provides the documentation, i.e. a description of every field contained in the occurrences.txt file, I'll be happy to edit, review, format and publish somewhere useful.
This would represent the comprehensive detail lying behind the brief explanation @andersfi suggests including in email...
Suggestion: rather than reproducing everything, aren't we better off highlighting/curating key terms while pointing to the TDWG index for the comprehensive list?
@kcopas: Yes, this is more or less exactly what I think there should be easily findable links to. Explaining the key terms and then linking to the TDWG index for a comprehensive overview should be both easy to do and very useful. Instead of filling up the download mail with a lot of links, maybe it would be better to link to a page giving an overview, possibly in the form of a cheat sheet (i.e. more or less like the FAQ but with a graphical sorting of potential issues).
One note: be aware that GBIF does not always follow or support the definitions in the TDWG index, e.g. eventDate, where GBIF does not handle intervals at the moment. GBIF needs to communicate clearly where its interpretation deviates from the DwC definitions, and where non-DwC terms are used. I guess this requires joint efforts from communication and informatics.
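To illustrate the interval case, a rough sketch of how a user might split an eventDate value themselves. It assumes intervals are written as two full ISO dates joined by a slash (`yyyy-mm-dd/yyyy-mm-dd`) and handles nothing else (no times, no abbreviated end dates). The `split_event_date` helper is hypothetical and is not how GBIF interprets the field.

```python
from datetime import date


def split_event_date(value):
    """Return (start, end) for 'yyyy-mm-dd' or 'yyyy-mm-dd/yyyy-mm-dd'."""
    parts = value.strip().split("/")
    if len(parts) > 2:
        raise ValueError(f"unexpected eventDate: {value!r}")
    start = date.fromisoformat(parts[0])
    end = date.fromisoformat(parts[-1])  # same as start for a single date
    if end < start:
        raise ValueError(f"interval ends before it starts: {value!r}")
    return start, end


print(split_event_date("2007-03-01/2007-03-31"))
print(split_event_date("2007-03-01"))
```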
@MattBlissett closing this? https://techdocs.gbif.org/en/data-use/download-formats
Hello. I have downloaded data from several plant species. Where can I find what all the columns are exactly? Such as: how do I know which country belongs to which country code?
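On the country code question: the countrycode column holds ISO 3166-1 two-letter codes, so a lookup table is enough to translate them. A minimal sketch, assuming a small hand-written excerpt of the standard; the `COUNTRY_NAMES` table and `country_name` helper are hypothetical, and a real lookup would need the full ISO 3166-1 list.

```python
# Deliberately tiny, hand-written excerpt of ISO 3166-1 alpha-2; not a complete list.
COUNTRY_NAMES = {
    "DK": "Denmark",
    "NL": "Netherlands",
    "NO": "Norway",
    "SE": "Sweden",
}


def country_name(code):
    """Return the country name for a two-letter code, or None if not in the table."""
    key = (code or "").strip().upper()
    return COUNTRY_NAMES.get(key)


for code in ["NL", "no", "XX"]:
    print(code, "->", country_name(code))
```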