Open MortenHofft opened 5 months ago
Can we learn something from web analytics?
Frequency of /search?
parameters?
(point being that far from all searches lead to downloads)
@MortenHofft I haven't looked at which filters are used most frequently in a while.
I did this way back in 2018. We could re-cook something similar if it is important. https://gbif.blogspot.com/
Thanks @jhnwllr. It is probably the same as back then, and a quick look at recent downloads suggests it is.
Based on simply looking at the last few hundred downloads, it looks to be roughly:
taxonKey
hasCoordinates, hasGeoSpatialIssue
country, continent, gadm (very rarely a geom filter)
year, month
basisOfRecord
We could then add license, occurrenceStatus and issues simply because we believe they are important. Dataset and publisher are very rarely used; I didn't see one case in the first few hundred downloads. Secretariat use might be very different, though.
The current list of simple filters is long, and most of them aren't popular: https://www.gbif.org/occurrence/search?occurrence_status=present&q=
But they are there because someone at some point decided that they were very important. Sometimes that was prompted by "real" user feedback (like type), other times it came from the secretariat (e.g. license, iucn, occurrenceStatus).
Personally, my gut feeling (somewhat backed by data) is to reduce simple to:
occurrenceStatus (educational)
license (educational)
taxonKey
year
month
Location: country, continent, gadm
dataset, publisher (mainly for publishers and secretariat I assume, but also teach users about where data comes from)
basisOfRecord (educational)
issues (educational)
See https://techdocs.gbif.org/en/informatics/web-logs to query the logs.
Very quickly:
buckets 536371
stateProvince 547530
eventKey 632725
publishing_country 641444
TYPE_STATUS 679099
TAXON_KEY 679154
publishingCountry 681141
year.facetLimit 698770
verbose 701853
gadmGid 721490
coordinate_uncertainty_in_meters 734653
coordinateUncertaintyInMeters 826350
license 854881
event_date 903064
/occurrence/search 918362
establishmentMeans 936325
publishingOrg 942410
hosting_organization_key 957755
SpeciesKey 967576
name_type.facetLimit 1044787
geom 1205445
species_key 1279380
recordedBy 1661021
country.facetLimit 1711170
orderKey 1904599
gadm_gid 1965072
publishing_org 2299395
month 2700218
occurrenceID 2953486
collectionCode 3420397
mediatype 4561405
continent 4702938
secondDimension 4724621
isGeoreferenced 4811250
lastInterpreted 4942254
type 5759946
depth 6497426
basis_of_record 6537020
type_status.facetLimit 7625427
coordinatestatus 8235651
month.facetLimit 8524763
occurrenceId 8912346
scientificname 9490116
geoDistance 9652000
issue.facetLimit 9849302
institutionCode 10668968
basisofrecord 10804202
facetMultiselect 11608639
cachebust 11917492
kingdomKey 12566814
basisOfRecord 15630483
mediaType 15840944
typeStatus 17233716
issue 18447789
dwca_extension.facetLimit 20673033
facetOffset 21904364
catalogNumber 22396800
q 23072408
locale 27204390
decimalLatitude 27623668
decimalLongitude 28074966
eventId 29015063
advanced 30323195
event_id 32839495
speciesKey 35983120
modified 40400665
occurrence_status 45942714
year 58286363
geometry 59074028
eventDate 59617662
scientificName 64556531
hasCoordinate 72938097
hasGeospatialIssue 81299124
facetLimit 90144576
facet 93546298
occurrenceStatus 100666456
country 139312832
datasetKey 152112174
has_geospatial_issue 171145584
dataset_key 176426102
has_coordinate 184099638
offset 235623329
media_type 433701395
taxonKey 444973495
taxon_key 448807411
limit 1167973788
I don't know how useful this is. I'm querying the API; maybe querying the portal would be better, but does that have a hit in Varnish for every search, or is there JavaScript magic?
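The counts above list the same parameter under several spellings (for example taxonKey, taxon_key and TAXON_KEY), which makes the ranking harder to read. A minimal sketch of merging them, assuming that lower-casing and dropping underscores is an acceptable way to treat spellings as equivalent; the function names are hypothetical and the sample numbers are copied from the list above:

```python
from collections import Counter


def normalise(name: str) -> str:
    """Lower-case a parameter name and drop underscores (assumed equivalence rule)."""
    return name.replace("_", "").lower()


def merge_counts(raw_counts: dict) -> Counter:
    """Sum the counts of all spellings that normalise to the same name."""
    merged = Counter()
    for name, count in raw_counts.items():
        merged[normalise(name)] += count
    return merged


# A few rows taken from the list above.
raw_counts = {
    "taxonKey": 444973495,
    "taxon_key": 448807411,
    "TAXON_KEY": 679154,
    "hasCoordinate": 72938097,
    "has_coordinate": 184099638,
}

merged = merge_counts(raw_counts)
for name, count in merged.most_common():
    print(name, count)
```

Facet-qualified names such as year.facetLimit stay separate from year under this rule, since only case and underscores are changed.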
Thanks Matt. That includes all the requests the portal makes to generate the pages, charts, etc., which I imagine will skew the results. Ideally, I suppose, we would only look at parameters for https://www.gbif.org/occurrence/[search/map/gallery]
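One way to restrict the count to the three portal pages could look like the sketch below. It assumes the logged requests are available as a plain list of URLs, which may not match the actual log format; the sample URLs and the function name are made up for illustration.

```python
from collections import Counter
from urllib.parse import parse_qsl, urlsplit

# The three portal pages named above.
PORTAL_PATHS = {"/occurrence/search", "/occurrence/map", "/occurrence/gallery"}


def count_portal_parameters(urls):
    """Count in how many portal page requests each query parameter appears."""
    counts = Counter()
    for url in urls:
        parts = urlsplit(url)
        if parts.path.rstrip("/") not in PORTAL_PATHS:
            continue  # skip requests for other pages
        # Use a set so a repeated parameter counts once per request.
        names = {name for name, _ in parse_qsl(parts.query, keep_blank_values=True)}
        counts.update(names)
    return counts


# Hypothetical sample requests, not real log lines.
sample_urls = [
    "https://www.gbif.org/occurrence/search?taxon_key=212&country=DK",
    "https://www.gbif.org/occurrence/map?taxon_key=212",
    "https://www.gbif.org/occurrence/gallery?year=2020&year=2021",
    "https://www.gbif.org/dataset/search?q=birds",
]

portal_counts = count_portal_parameters(sample_urls)
for name, count in portal_counts.most_common():
    print(name, count)
```

Counting each parameter once per request, rather than once per occurrence, keeps a multi-value filter from outweighing a single-value one.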
I think some parameters are truncated as there's a limit to the length of the query string that's logged.
depth 108
face 109
eventdate 112
taxo 113
protocol 116
display 116
origin 122
tax 122
networkkey 123
rank 126
ampoccurrencestatus 127
elevation 129
lifestage 135
collectionkey 139
taxonk 141
all 145
ampadvanced 152
occurrenc 157
hostingorganizationkey 162
ampq 164
organismid 165
occurrencestatu 166
occurrence 172
coun 180
occurrencesta 181
basisof 183
programmeid 184
type 185
hascoordinat 199
locality 201
hascoordina 217
seconddimension 221
spatialissues 226
cachebust 237
gbifid 241
yea 251
basi 273
typestatus.facetlimit 276
mdrv 284
recordedbyid 319
hasgeospatialissu 335
month.facetlimit 338
taxon 343
hascoordi 366
utmcampaign 392
utmmedium 397
amp 402
scientificname 433
license 463
fbclid 503
issue.facetlimit 524
ref 556
occurrenceid 557
dwcaextension.facetlimit 562
projectid 576
586
status 589
highertaxonkey 597
dimension 601
stateprovince 606
amptaxonkey 621
boundingbox 629
h 647
lang 820
coordinateuncertaintyinmeters 833
institutionkey 880
repatriated 911
utmsource 958
iucnredlistcategory 997
nonse 1012
verbatimscientificname 1018
facetmultiselect 1033
taxonke 1058
isincluster 1114
recordnumber 1124
hasgeospatialis 1227
facet 1238
hasgeospatialiss 1244
hasgeospatiali 1391
dwcaextension 1450
gbifdatasetkey 1550
typestatus 1573
hasge 1698
month 1784
eventid 2278
gbiftaxonkey 2608
institutioncode 2967
t 3099
path 3248
hasgeosp 3303
hasgeospat 3373
recordedby 3530
catalognumber 4468
collectioncode 4470
contenttype 4763
publishingcountry 5290
mediatype 5972
ha 6368
continent 6504
has 7215
hasgeospati 7271
hasgeospatia 7389
issue 7841
occ 8779
v 9760
hasgeospatial 11159
hasgeos 11427
hasgeo 12148
limit 13207
gadmgid 14095
hasg 23598
basisofrecord 25141
hasgeospa 26193
year 30686
offset 52660
advanced 62530
country 75962
datasetkey 120749
publishingorg 129824
hash 132148
locale 153199
q 257436
hasgeospatialissue 299560
geometry 360734
occurrencestatus 442444
hascoordinate 446916
taxonkey 584007
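Several rows above look like truncated spellings of a longer name (hasg, hasgeospa, taxonke, yea). A rough sketch of folding them back into the full name, assuming a hand-maintained set of known parameter names and assuming that a key matching exactly one known name as a prefix is a truncation of it; the function and the known-name set are hypothetical, and keys with zero or several matches are left unresolved:

```python
from collections import Counter


def reassign_truncated(counts, known_names):
    """Add counts of unambiguous truncated names to the full name they prefix."""
    merged = Counter()
    unresolved = {}
    for name, count in counts.items():
        if name in known_names:
            merged[name] += count
            continue
        candidates = [full for full in known_names if full.startswith(name)]
        if len(candidates) == 1:
            merged[candidates[0]] += count
        else:
            unresolved[name] = count  # no match, or more than one possible match
    return merged, unresolved


# Assumed set of full names; a real run would need the complete list.
known_names = {"hasgeospatialissue", "hascoordinate", "taxonkey", "year"}

# A few rows taken from the list above.
counts = {
    "hasgeospatialissue": 299560,
    "hasgeospa": 26193,
    "hasg": 23598,
    "has": 7215,
    "taxonkey": 584007,
    "taxonke": 1058,
    "yea": 251,
    "year": 30686,
}

merged, unresolved = reassign_truncated(counts, known_names)
print(merged.most_common())
print(unresolved)
```

A key such as has stays unresolved because it could belong to either hascoordinate or hasgeospatialissue, and a short key that happens to be a genuine parameter would need to be added to the known names first.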
Thanks! That looks pretty much as expected from the guessing above, I would say. The biggest surprise is that repatriated is being used at all.
Removing the simple/advanced toggle proved very unpopular within the secretariat. So I will add that back. It also prompted a discussion about what filters to include in the simple type.
Which simple filters should we show on occurrence search?
@jhnwllr you know what filters are used most frequently on downloads. It is reasonable to assume that reflects which are most frequently used on the UI as well.
Do others have ideas about which filters should be the simple ones?