BiologicalRecordsCentre / iRecord

Repository to store and track enhancements, issues and tasks regarding the iRecord website.
http://irecord.org.uk

Species pages and mapping functionality #1286

Closed Sam-Amy closed 1 year ago

Sam-Amy commented 2 years ago

Following our discussion (20.04.22) about iRecord data visualisation tools produced by Rich Burkmar, re: linking the new species mapping functionality with the existing species page containing info/photo.

Issues with current species page layout (e.g. https://irecord.org.uk/species-details?taxon_meaning_id=71446):

In terms of implementing the mapping functionality, would a BSBI-type format work, with a page for species info with a static inset Atlas-style image, and a link above this to interactive mapping (e.g. https://www.brc.ac.uk/plantatlas/plant/orchis-mascula)?

Raised here as an issue for discussion in next iRecord prioritisation meeting.

burkmarr commented 2 years ago

@Sam-Amy - it's a different style of map. The old one displays the actual record locations from a geoserver WMS - the new one (which is the same as the grid map on the explore records page) displays records aggregated to atlas squares (e.g. hectads, tetrads, monads) at lower zoom levels but at high zoom switches to the record points. In any case it's a moot point because the consensus seems to be not to include the interactive map because there is a link to take you to the full explore records page anyway.
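
A minimal sketch of the aggregation behaviour described above, for illustration only. The zoom thresholds, the function names and the use of plain easting/northing pairs in metres are assumptions of this sketch, not details of the actual map control.

```python
from collections import Counter

# Atlas square sizes in metres.
SQUARE_SIZE_M = {"hectad": 10_000, "tetrad": 2_000, "monad": 1_000}


def square_type_for_zoom(zoom):
    """Pick an aggregation level for a zoom level (thresholds are illustrative)."""
    if zoom <= 7:
        return "hectad"
    if zoom <= 9:
        return "tetrad"
    if zoom <= 11:
        return "monad"
    return None  # high zoom: show individual record points


def aggregate(records, zoom):
    """records: iterable of (easting, northing) pairs in metres."""
    square = square_type_for_zoom(zoom)
    if square is None:
        return Counter(records)  # each point stands for itself
    size = SQUARE_SIZE_M[square]
    # Snap each point to the south-west corner of its containing square.
    return Counter(((e // size) * size, (n // size) * size) for e, n in records)
```

At a low zoom, two records falling in the same 10 km square collapse into one symbol with a count of 2; at the highest zoom every record keeps its own position.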

Sam-Amy commented 2 years ago

Thanks @burkmarr

When I was looking earlier I could see just a few points dotted around the UK, which didn’t make sense according to this. I can now see far more, so I’m not sure what was going on there?!

burkmarr commented 2 years ago

To address the points above, I have:

  1. Removed duplicate custom attribute names.
  2. Added a phenology chart.
  3. Made use of the proxyCacheTimeout option on ES queries.
  4. Replaced the dual explore map with the hectad map only.

https://burkmar-brc-irecord.pantheonsite.io/species-details2?taxon_meaning_id=71446

The phenology chart ([recsthroughyear]) has a '@period' option that can be set either to 'week' or 'month' - this doesn't change the x axis (which always shows months) but it affects the way the line is generated (52 points of weekly totals vs 12 months of monthly totals) and changes the y axis and label accordingly.
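
To illustrate the difference between the two '@period' settings, here is a rough sketch of how weekly and monthly totals could be derived from record dates. How the [recsthroughyear] control actually assigns days to weeks is not stated here, so the week calculation (day of year divided by 7, with the final day or two folded into week 52) is an assumption.

```python
from datetime import date


def records_through_year(dates, period="week"):
    """Return record totals per week (52 values) or per month (12 values)."""
    if period == "month":
        totals = [0] * 12
        for d in dates:
            totals[d.month - 1] += 1
    elif period == "week":
        totals = [0] * 52
        for d in dates:
            # Day of year 1..366 -> week index 0..51; the last day or two
            # of the year are folded into the final week.
            week = min((d.timetuple().tm_yday - 1) // 7, 51)
            totals[week] += 1
    else:
        raise ValueError("period must be 'week' or 'month'")
    return totals
```

Either list can be plotted against the same month-labelled x axis; only the number of points and the scale of the y axis change.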

@johnvanbreda - I set the proxyCacheTimeout option to 300 for the ES queries. Is that reasonable?

The dual map is still available on the form (control [dualmap]) but I have replaced it with the [hectadmap] control in the form configuration.

@DavidRoy - I also thought that the authority on 'Orchis mascula (L.) L.' looked strange and thought that it was being mangled somehow. But when I looked at the database, that is how the authority is specified. I think it might signify that Linnaeus originally named the taxon differently and then renamed it himself to Orchis mascula.

johnvanbreda commented 2 years ago

> I set the proxyCacheTimeout option to 300 for the ES queries. Is that reasonable?

Yes, that seems fair to me.

kitenetter commented 2 years ago

@burkmarr this looks great.

Can we add a line break (or an actual line) between the map and the subsequent "records by year" subhead, just to clarify which subhead goes with which visualisation?

And I realise that this is taking us into mission-creep territory, but as and when possible I would like to see an option to add date-banding to the map, e.g. to distinguish pre-2000 records from 2000 onwards. For schemes that now have most or all of their data in Indicia, maps without any date-banding can be quite misleading.

Also probably for the future, it would be good to be able to control which life-stages are included in the "records by week" chart, e.g. to separate out adults and larvae, or to exclude all non-adult stages etc.

burkmarr commented 2 years ago

@kitenetter I will do something to separate the map from the stuff below it.

I think that adding a facility to change the displayed hectads based on date of the records would be useful in the context of the species summary page, so happy to look at this. There are two obvious ways to do it:

  1. Apply a user-controlled year range filter to the records that are used to construct the hectad atlas map. That would result in a map that only shows hectads that have at least one record in the specified period. Since colour would not be required to indicate a date band, it could still be used to give an indication of the relative number of records in each hectad.

  2. As you've described, apply user-controlled year threshold(s) to change the colour of any particular hectad, indicating which year band it is in. (Presumably the year band assigned to a hectad would be determined by the most recent record in it.)

I'm happy to look into whichever of these you think is the best approach. If the second approach, then we need to decide on the number of year bands. Two year bands would only require a single control for the user to specify the threshold year. If more than two bands then more controls need to be added - complicating the user interface.

I agree that the life-stage feature would be useful, but because of the amount of flexibility in the way this field is used across taxonomic groups (and even within groups), it is not straightforward to apply on a general species details form. The list below shows the top 50 terms used for the iRecord site for data over the last 5 years. (The count for the term 'not recorded' is for those records where this is explicitly entered. Most records have null recorded against this field which isn't included in the table.)
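
The two approaches can be compared with a small sketch. It assumes records are available as (hectad, year) pairs; the function names, the band labels and the treatment of the threshold year as belonging to the later band are illustrative choices only.

```python
from collections import Counter


def hectads_in_year_range(records, start_year, end_year):
    """Approach 1: keep only records dated within [start_year, end_year]
    and count them per hectad, so colour can still show record density."""
    return Counter(h for h, y in records if start_year <= y <= end_year)


def year_band_per_hectad(records, threshold_year):
    """Approach 2 with two bands: classify each hectad by its most recent record."""
    latest = {}
    for hectad, year in records:
        latest[hectad] = max(year, latest.get(hectad, year))
    return {
        hectad: ("earlier" if year < threshold_year else "later")
        for hectad, year in latest.items()
    }
```

With approach 1 a hectad disappears from the map when it has no records in the chosen range; with approach 2 every hectad stays on the map and only its colour changes.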

So even picking out something as simple as adult counts is problematic. From just the terms listed below, you would probably want to include 'adult', 'Adult', 'Adults' and 'adult bug' as well as deciding whether or not 'not recorded' and null values should be assumed to be adults. I guess the way to handle this would be in the page configuration - much like the custom attributes discussed above (the problems are similar in many ways). But we have seen that this isn't ideal. Then there is the problem of what options to offer the user on the page itself - options relevant for invertebrates would not be relevant to many other taxonomic groups for example.

occurrence.life_stage.keyword: Descending Count
adult 1,795,383
Adult 844,166
not recorded 617,843
pre-adult 47,842
flowering 37,048
other 16,405
vegetative 9,603
larva 8,393
native 6,083
nymph seen in spittle foam 5,353
gall 4,228
nymph 4,162
gametophyte (not fruiting) 4,044
Nymph 3,339
Vegetative 3,196
Larva 2,970
fruiting 2,933
spittle foam only 2,418
mine (empty) 2,405
Not recorded 2,284
mine (tenanted) 2,024
mine 1,973
sporophyte present (fruiting) 1,781
immature 1,752
Uncertain 1,587
Adults 1,308
Egg 879
Plant bare 683
pupa 640
egg 534
Immature 524
In leaf 523
young 500
dead 468
Flowering 464
introduced - accidental - regenerating 451
In flower 446
Pupa 417
adult bug 404
sporophyte 386
Other 380
Young 375
introduced - intentional 372
Juvenile 370
Native 345
larval web 340
Leaf-mine 333
larval case 288
With ripe fruit 259
spawn/egg 204
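
As a rough illustration of the configuration-driven approach mentioned above, the sketch below groups the raw terms that might count as 'adult'. The set of terms, the function name and the option for handling unrecorded values are all hypothetical; a real configuration would need to be agreed per taxonomic group.

```python
# Hypothetical page-level configuration: raw life_stage terms (lower-cased)
# that should be treated as adults.
ADULT_TERMS = {"adult", "adults", "adult bug"}


def is_adult(term, treat_unrecorded_as_adult=False):
    """Classify a raw life_stage value; None stands for a null field."""
    if term is None or term.strip().lower() == "not recorded":
        return treat_unrecorded_as_adult
    return term.strip().lower() in ADULT_TERMS
```

Lower-casing handles the 'adult'/'Adult' split in the table, but terms such as 'adult bug' still have to be listed explicitly, which is why this remains a configuration problem rather than a simple filter.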

kitenetter commented 2 years ago

I agree that the life-stage issue is complex, and taxon-group specific, so let's leave that aside for now.

For date banding my preference would be your option 2, with user-defined date bands. Ideally I'd go for three time periods, with defaults set to:

But if three time periods is going to add too much complexity then having two time periods would still be good, and I'd suggest defaulting to before 2000, and 2000 onwards (based on most recent record in the hectad - although enabling users to switch to oldest record in the tetrad would also be interesting, e.g. for mapping spread of invasive species ...).

If it's possible to allow users to change the defaults that would be great.

As always I'm conscious that this is just my opinion, and other users may want different things.

DavidRoy commented 2 years ago

As ever, your opinion is good @kitenetter. I would support these three time periods options as a start. A more flexible date-range picker can be a future requirement?

burkmarr commented 2 years ago

Okay great. I will proceed with three coloured time periods.

burkmarr commented 2 years ago

I have implemented the coloured time bands. The three colours can be changed via the page configuration if required. (For the defaults I've used three 'colourblind safe' colours from https://colorbrewer2.org/.)

The user sets the thresholds for the time bands using a couple of numeric text controls. I considered trying to do something more flashy with sliders, but that would take more effort and is perhaps one for the future? In any case I think this works pretty intuitively - the user can immediately see the effects of the settings they use by referring to the map legend which is displayed immediately below the controls. I've put some logic in there to prevent nonsense values being set (e.g. threshold 1 greater than threshold 2).

It was also easy for me to add the select control which allows the user to toggle between 'most recent on top' (the default) and 'oldest on top'. I thought this could be useful, but if you think the complication to the UI is too much I can remove it (or make its inclusion configurable).

Also it occurred to me that a useful feature would be mouse-over info for the map dots (the map control already supports it and it is a useful feature on the BSBI Atlas site). So if you look below the map you can see how you can get the hectad name, number of records, min record year and max record year for any dot when the mouse is over it.
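
The behaviour described in this comment (two thresholds giving three bands, a check against nonsense thresholds, the most recent/oldest toggle and the per-hectad mouse-over summary) can be sketched as follows. This is not the page's actual code: the function names are made up for illustration, and treating each threshold as the inclusive upper bound of its band is an assumption.

```python
def summarise_hectads(records):
    """records: iterable of (hectad, year) pairs.
    Returns {hectad: (record_count, min_year, max_year)}."""
    summary = {}
    for hectad, year in records:
        count, lo, hi = summary.get(hectad, (0, year, year))
        summary[hectad] = (count + 1, min(lo, year), max(hi, year))
    return summary


def band_for_year(year, threshold1, threshold2):
    """Band 0: up to threshold1; band 1: up to threshold2; band 2: later.
    Inclusive upper bounds are an assumption of this sketch."""
    if threshold1 > threshold2:
        raise ValueError("threshold 1 must not exceed threshold 2")
    if year <= threshold1:
        return 0
    if year <= threshold2:
        return 1
    return 2


def hectad_bands(records, threshold1, threshold2, on_top="most recent"):
    """Colour each hectad by its most recent or its oldest record."""
    bands = {}
    for hectad, (_count, lo, hi) in summarise_hectads(records).items():
        year = hi if on_top == "most recent" else lo
        bands[hectad] = band_for_year(year, threshold1, threshold2)
    return bands
```

The summary tuple holds the same values shown on mouse-over (number of records, minimum year, maximum year), so one pass over the records can serve both the colouring and the hover text.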

https://burkmar-brc-irecord.pantheonsite.io/species-details2?taxon_meaning_id=71446

burkmarr commented 2 years ago

I forgot to say - the initial values for the thresholds are configurable, so easy to change.

DavidRoy commented 2 years ago

Looks good to me. One small thing: can the label 'BR Habitats' be changed to 'Broad Habitats'?

kitenetter commented 2 years ago

@burkmarr that looks brilliant and functions brilliantly! For the default year thresholds I'd suggest 1999 and 2009:

image

Can we re-word the key:

I think you are including unverified records, and excluding rejected records - it would be good to say so somewhere on the page.

One more mission-creep feature request: is it possible to enable the three date bands to be switched on or off individually, so that I could generate three separate maps for three separate time periods, e.g. to use in a presentation?

Just for fun, here's an example of an expanding species (Downland Villa bee-fly), using the "oldest" display:

image

burkmarr commented 2 years ago

@kitenetter - great, I'm glad that it fits the bill. I've added some text re the rejected and unverified records under the map (let me know if you'd prefer a different set of words) and changed the legend titles as requested. I've had a think about enabling/disabling bands and I saw the similarities with the highlighting feature on other charts (e.g. donut chart) and I thought it could be a useful general feature to add to the map library. It should be quite easy to do I think so I will have a look at it today and/or tomorrow.

burkmarr commented 2 years ago

@DavidRoy

> Can the label 'BR Habitats' be changed to 'Broad Habitats'

The label is the caption for that taxon attribute as defined in the Warehouse - https://warehouse1.indicia.org.uk/index.php/taxa_taxon_list_attribute/edit/76. It could be changed in the Warehouse, but that would affect it wherever it is used. Alternatively I could add configuration to the species_details_2 prebuilt form which allows custom attribute names to be 'translated' for the purposes of this form only.

DavidRoy commented 2 years ago

@burkmarr I think over-riding the warehouse attribute name is the best option. I could imagine we might want to do this for other information we display in these sorts of pages

burkmarr commented 2 years ago

@DavidRoy - just to clarify, do you mean leave as is in the Warehouse but add some configuration capability in the page to override it?

DavidRoy commented 2 years ago

configuration to the page

burkmarr commented 2 years ago

> is it possible to enable the three date bands to be switched on or off individually, so that I could generate three separate maps for three separate time periods, e.g. to use in a presentation?

@kitenetter I have implemented a feature whereby a user can click on the legend on the map to highlight/lowlight the sets of dots that each element on the legend relates to - effectively meeting the requirement above. The highlight and lowlight styles are configurable. That means, for example, that if you would rather that the non-relevant dots are completely hidden (rather than faded as at present), this can be achieved through the @lowstyle control config parameter for the page. Likewise the highlight options currently change the colour to black but this could be changed through the @highstyle control config parameter (e.g. set the parameter to nothing to leave the dots with their original colour).

https://burkmar-brc-irecord.pantheonsite.io/species-details2?taxon_meaning_id=71446

A user will probably not find this feature by accident, so it could probably use a brief explanation somewhere on the page (e.g. 'Click on legend to highlight ranges.').

The feature will be useful in other contexts where the atlas map is used (e.g. potentially the BSBI atlas) - so thanks for the idea!
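
The legend highlighting described above can be modelled with a small sketch. The style strings and the function name are hypothetical; only the idea that a highlight style and a lowlight style are applied depending on the clicked legend item, and that an empty highlight style leaves the original colour, comes from the description of @highstyle and @lowstyle.

```python
def dot_styles(dot_bands, selected_band, highstyle="fill: black", lowstyle="opacity: 0.2"):
    """dot_bands: {dot_id: band}. Returns {dot_id: style override or None}.
    None means the dot keeps its original style."""
    if selected_band is None:
        # Nothing clicked in the legend: no overrides at all.
        return {dot: None for dot in dot_bands}
    return {
        # An empty highstyle leaves highlighted dots in their original colour.
        dot: ((highstyle or None) if band == selected_band else lowstyle)
        for dot, band in dot_bands.items()
    }
```

Passing a lowlight style that hides the dot altogether would give the effect of three separate maps, one per date band.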

@DavidRoy after spending some time looking into the code, I eventually resorted to a workaround to achieve the custom attribute label override. The configuration is easy enough to handle, but actioning it properly would require modification of reporting code deep in Indicia which would require more investigation and learning on my part (and testing since modifying it would potentially affect many parts of Indicia websites). So for now I've done the modification using JavaScript after the Indicia code has generated the table. (Actually that's also the method I used to do the duplicate label hiding.) It works well enough, but it's not technically the best solution for these two requirements.
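
The logic of that workaround can be sketched as below. The real change was made in JavaScript against the generated table; this Python version only models the idea, and it assumes that duplicate labels appear on consecutive rows. The function name and the attribute values in the usage note are made up for illustration.

```python
def tidy_attribute_rows(rows, label_overrides):
    """rows: list of (label, value) pairs as rendered in the table.
    Applies per-form label overrides, then blanks a label that repeats
    the label of the previous row."""
    tidied = []
    previous = None
    for label, value in rows:
        label = label_overrides.get(label, label)
        tidied.append(("" if label == previous else label, value))
        previous = label
    return tidied
```

For example, with an override of {'BR Habitats': 'Broad Habitats'}, two consecutive 'BR Habitats' rows would be shown as one 'Broad Habitats' label followed by a row with a blank label.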

DavidRoy commented 2 years ago

Thanks @burkmarr. I'd be very happy for this to now go live. We can then move onto other visualisation pages? @kitenetter what do you think?

kitenetter commented 2 years ago

Agree it should go live, it's a great enhancement for the site.

I think the configuration for the single element in the key is okay - we could debate different ways of doing this but what we have is a perfectly good starting point. As you say, it will need some text to explain how to use it. Suggest:

One more suggestion (but this shouldn't delay going live): I've already started using these maps for soldierflies on social media, and to do so I'm just screen-shotting the map area. Maybe we could repeat the taxon name above the key, and add an iRecord logo under the key, just so that if the map is picked up as a stand-alone graphic it gives it some context?

burkmarr commented 2 years ago

@kitenetter - I've added an info text block that appears above the map whose text can be set with a @info page config parameter for [hectadmap] and I've set that to text you suggested above on the demo page.

Dealing with variable text in a map legend is problematic because of layout and space considerations. Some taxa have very long names and would not be easily accommodated. Instead I've taken advantage of an existing feature of the atlas map library which is a download feature that enables some text to be added to the foot of the downloaded image. I've added a @download page config param for [hectadmap] which, if set to true, causes a download button to be displayed, enabling the map image to be downloaded without doing a screen-shot. Text is added to the foot of the image which specifies the taxon name as well as where and when the image came from (and repeating the text about rejected records excluded and unverified included).

image

I know that the taxon name is not prominent as it would be if it was in the legend title, but it does at least ensure that the provenance of the image is always attached to it.

Adding an iRecord image logo is also problematic for several reasons and would require some work. Adding one underneath the text added to the foot of the downloaded page is already possible with the atlas map library but it would have to be made configurable because, in theory, this form could be used on other Indicia sites, not just iRecord. If it's okay with you I think that we should leave this for now so that I can press on with getting the code reviewed by @johnvanbreda in some pull requests so that we can proceed to publishing on the live site.

burkmarr commented 2 years ago

@kitenetter - I should also mention another limitation of the image download - it doesn't reflect the interactive highlighting. So if you wanted an image showing highlighted dots, you'd still have to screen grab.

kitenetter commented 2 years ago

Thanks @burkmarr - the text block that you've added provides clear metadata so I'm happy with that.

burkmarr commented 2 years ago

@johnvanbreda - what is the best way to filter ES to all records of one taxon? I had thought that taxon.taxon_meaning_id would do it, but when I looked at records for Polygonia c-album I saw several values for this field. So I looked at taxon.accepted_taxon_id and saw that even that could have more than one value (NHMSYS0000503893 and eBMS446).

I settled on filtering on taxon.accepted_name. I think that this should normally be fine, but I found a problem with Aglais io - when this is indicated in the URL for the new taxon page (https://irecord.org.uk/species-details?taxon_meaning_id=127683) the XML report returning all the taxon names from cache_taxa_taxon_lists indicates that Inachis io is the preferred taxon name rather than Aglais io so when I then filter ES on that name, I get no records.

johnvanbreda commented 2 years ago

The UK based data should all map to UKSI via the search_code (the name TVK) and the external_key (the accepted name TVK) so that's normally the best way (or even the recently added Organism Key which is more stable if the accepted name changes). The problem here is that you are trying to view all records from both the UK and the European sections of the system and the keys don't match - they have to this point really been 2 completely separate parts of the system. That's why you are finding 2 different accepted_taxon_id values.

Taxon meaning ID provides a theoretical way to join taxa across lists/taxonomies, but it is not likely to give the correct result in practice.

Accepted name will only work if the accepted name doesn't change, as you've discovered. I remember Charles Hussey (Chris Raper's predecessor on UKSI) speaking at a conference about the horrendously low number of taxon names that have never been mis-published or disputed in some way so can be guaranteed to be accurate - this was about scientific names, not just vernaculars.

We don't actually have a good solution to this issue. We could perhaps add a field for capturing the GBIF taxon ID, or the Catalogue of Life ID, to map to some global single taxonomic standard. But this would need to be done manually as UKSI doesn't contain these values and I doubt that all the names will just match up.

Is there a requirement to include EBMS data on the iRecord species pages and vice-versa?

kitenetter commented 2 years ago

@johnvanbreda do we have organism keys for all UKSI taxa now?

If so, either that or external_key should work for UKSI names I think.

I'm not aware of any requirement to include EBMS data, unless @DavidRoy knows otherwise.

johnvanbreda commented 2 years ago

> do we have organism keys for all UKSI taxa now?

We do (unless any manually added ones skipped the field), but it's not yet used in the Elasticsearch data.

burkmarr commented 2 years ago

Okay so for this issue - which only needs to map/chart UK data - it looks like I should use the taxon.accepted_taxon_id field to filter Indicia data for now. I will leave it for a day in case anyone else chips in here. But if that still looks like the consensus, I'll aim to apply a hotfix to update the species details page tomorrow.

johnvanbreda commented 2 years ago

I'd definitely use taxon.accepted_taxon_id at this point if you don't need to include the EBMS data.
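
For reference, a filter on that field could look something like the sketch below, written as the Python dictionary that would be serialised to the Elasticsearch request body. It uses the standard bool/term query form; whether the field is mapped so that an exact term match works as shown is an assumption, and the helper name is made up.

```python
def taxon_filter_query(accepted_taxon_id):
    """Build an Elasticsearch query body restricted to one accepted taxon."""
    return {
        "query": {
            "bool": {
                "filter": [
                    # Exact match on the accepted taxon identifier.
                    {"term": {"taxon.accepted_taxon_id": accepted_taxon_id}}
                ]
            }
        }
    }
```

Calling taxon_filter_query("NHMSYS0000503893") gives a body that matches only records whose accepted taxon ID is the UKSI value seen earlier for Polygonia c-album, leaving out those carrying the eBMS value.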

DavidRoy commented 2 years ago

Agreed. Only websites that share with iRecord for reporting are in scope for this, i.e. not EBMS currently

kitenetter commented 2 years ago

@burkmarr I see that the new maps are now available on the live site, which is excellent. We ought to make an announcement about this. Is there still any work left to do, or is it ready to be announced?

Two loose ends:

  1. Asian Hornet: The page for Asian Hornet shows no records, which I think is correct, as the records should be confidential. However, it does show a lot of photos of misidentified Asian Hornets. If the photos are from records that are currently identified as Asian Hornet then I think they should be excluded on grounds of confidentiality; but if the records have been redetermined or rejected then they should be excluded on those grounds.

  2. Navigation to the new maps: There is currently no obvious way for people to find the new maps - you have to go to an individual species record page and then click on the button for the details page. I think we need to review this, and perhaps link to the new maps from the "Species maps" menu item, which currently goes here, or else enable a way of going straight from a species name on the Explore pages to the details page.

I suggest we set up a new issue for this latter point - I will do that shortly.

kitenetter commented 2 years ago

Actually I am getting more confused about the Asian Hornet records. We seem to be using two taxon_meaning_id values: 129053 (the NNSS version) and 134903 (the UKSI version).

If I go to the record details page for 129053, records for 134903 are included: https://irecord.org.uk/all-records?filter-date_age=&filter-taxon_meaning_list=129053

If I try to go to a record details page for 134903 I get an Elastic Search error and the page doesn't load.

On the new species summary pages, there are separate maps for 129053 and 134903: https://irecord.org.uk/species-details?taxon_meaning_id=129053 - shows no records of the NNSS version of Asian Hornet, but does have photos

https://irecord.org.uk/species-details?taxon_meaning_id=134903 - shows records and photos of the UKSI version of Asian Hornet

Is it correct that the new summary maps are treating the different taxon_meaning_id values as separate entities?

I'm uncertain as to how the record details map is handling the two different taxon_meaning_id values, and whether the summary maps should also combine the different values in the same way.

[Steph is looking into whether the English and Northern Irish records (for 134903) should be flagged as confidential. We don't think that the photos should be being displayed.]

burkmarr commented 2 years ago

@kitenetter - re https://github.com/BiologicalRecordsCentre/iRecord/issues/1286#issuecomment-1246589947 - I should fix the problem illustrated with the Peacock butterfly (see https://github.com/BiologicalRecordsCentre/iRecord/issues/1286#issuecomment-1243679077) before we announce it. I hope to do that today but am currently working on a potentially urgent problem with another project. I agree with setting up a new issue for the navigation. I don't know what will be involved in the image problem, but hopefully only an adjustment to the report which gets the images. If you want that to be fixed before we announce the updated page, then I suggest you leave that as part of this issue, but if it can wait, that can also go in a separate issue.

burkmarr commented 2 years ago

@kitenetter - re https://github.com/BiologicalRecordsCentre/iRecord/issues/1286#issuecomment-1246614281 - this could be related to the problem discussed here - https://github.com/BiologicalRecordsCentre/iRecord/issues/1286#issuecomment-1243679077. I'll check it out in more detail when I address that.

kitenetter commented 2 years ago

I won't announce anything just yet! We do need to clarify the handling of Asian Hornet before we announce anything, but that can wait until you have time to look into it.

johnvanbreda commented 2 years ago

For me, the Explore pages filtered for the 2 different taxon meaning IDs for Asian Hornet both work OK and both produce 411 records, so the filter is picking up both as I'd hope: https://irecord.org.uk/all-records?filter-date_age=&filter-taxon_meaning_list=129053 https://irecord.org.uk/all-records?filter-date_age=&filter-taxon_meaning_list=134903

The reason this works is both the NNSS and UKSI versions of Asian Hornet both have the same external key, which will get looked up using the taxon meaning ID and used for the filter. In fact there is a bit of extra functionality in the proxy code which converts a filter into Elasticsearch syntax. This makes it tolerate species in out of date lists which have the external key (i.e. the accepted name) now pointing to a synonym - see https://github.com/Indicia-Team/client_helpers/blob/master/ElasticsearchProxyHelper.php#L1249.
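
The lookup step can be pictured with a toy example. The table below is a stand-in for whatever the proxy code really consults, and the key value is a placeholder rather than the real external key for Asian Hornet; only the two taxon meaning IDs come from this thread.

```python
# Hypothetical lookup: taxon_meaning_id -> external key (accepted name TVK).
# "PLACEHOLDER_KEY" is NOT the real key; it only shows that both IDs share one.
EXTERNAL_KEY_BY_MEANING_ID = {
    129053: "PLACEHOLDER_KEY",  # NNSS list entry
    134903: "PLACEHOLDER_KEY",  # UKSI list entry
}


def resolve_external_key(taxon_meaning_id, lookup=EXTERNAL_KEY_BY_MEANING_ID):
    """Return the external key that a taxon meaning ID maps to."""
    key = lookup.get(taxon_meaning_id)
    if key is None:
        raise KeyError(f"no external key known for taxon_meaning_id {taxon_meaning_id}")
    return key
```

Because both IDs resolve to the same key, a filter built from either one selects the same set of records.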

The species summary pages now seem to be filtering on accepted name for the Elasticsearch requests which drive the maps - this doesn't work for the NNSS species as they are just lists of common names.

burkmarr commented 2 years ago

@kitenetter - I will switch the ES filter, as outlined above, to external key. That means that the same map will be returned whichever taxon_meaning is used - so long as they link to the same external key (as they do for Asian Hornet).

I've just looked at the Asian Hornet records in ES using Kibana and I'm a little confused about confidentiality. All the Asian Hornet records (regardless of their taxon_meaning_id) are marked as confidential. Currently I have ignored this flag when constructing the hectad map on this species details page. However the hectad map is not constructed from all the records. This is because many of the records (from both taxon_meaning_ids) appear to have their location blurred (to a 100 km square) and cannot therefore be used to build the hectad map.

kitenetter commented 2 years ago

@burkmarr my understanding is that all incoming Asian Hornet records are being flagged as confidential, but that there are some older ones that have not yet had a retrospective flag applied. I believe Steph is working with John to get them all flagged as confidential.

Confidential records should be excluded from any mapping.

burkmarr commented 2 years ago

@kitenetter - so the map for Asian Hornet will be blank. I guess I should add some text to the species details page that says that all confidential records are excluded from the map. I presume it's okay to use confidential records for the construction of the two temporal charts?

johnvanbreda commented 2 years ago

@burkmarr what Elasticsearch alias are you using to access the records? Normally the confidential filter should be enforced so you simply won't see them.

kitenetter commented 2 years ago

Asian Hornet is a special case due to its sensitivity, and to the fact that nearly all records are misidentifications. So for that one, and more generally, I'd be inclined to exclude all confidential records from any of the visualisations. AFAIK there aren't many confidential records in the system other than Asian Hornet.

kitenetter commented 2 years ago

I had a feeling that confidential records were being excluded from ES, sorry, I should have checked that first. In which case the only problem we have with the Asian Hornet records is that some of the older ones are not yet flagged as confidential (which I think Steph has already asked John about).

So there is probably no further action needed for the visualisations.

burkmarr commented 2 years ago

@johnvanbreda - thanks I hadn't realised that. The page configuration points to the es-occurrences endpoint.

johnvanbreda commented 2 years ago

That should automatically apply the confidential filter for the reporting sharing mode then.

burkmarr commented 2 years ago

@kitenetter - records for the maps, charts and images are now retrieved based on the taxon external_key rather than the taxon_meaning_id. Only images associated with verified records that are not marked as confidential are retrieved.
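
The image selection rule stated above amounts to something like the following sketch. The record structure (dictionaries with 'external_key', 'verified', 'confidential' and 'images' entries) is invented for illustration and does not reflect the actual report or index fields.

```python
def images_for_taxon(records, external_key):
    """Collect image paths from records of one taxon that are verified
    and not marked as confidential."""
    images = []
    for rec in records:
        if (
            rec["external_key"] == external_key
            and rec["verified"]
            and not rec["confidential"]
        ):
            images.extend(rec["images"])
    return images
```

Under this rule a taxon whose records are all confidential, or all unverified, would show no photos at all.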

kitenetter commented 1 year ago

Just need to make the maps easier to find, see #1378 - closing this issue.