NERC-CEH / irecord-butterflies-app

Repository for the code for the iRecord Butterflies App

Discrepancies with species reports #87

Open RichardFoxBC opened 3 years ago

RichardFoxBC commented 3 years ago

Looking at my "All years" species list in the app shows several things that don't correspond to what I see when I explore my records on the iRecord website. For example:

Silver-spotted Skipper - one record listed in app report, but none on iRecord website

Chequered Skipper - four records listed in app report, but none on iRecord website

Dingy Skipper - appears in the app list twice. This will be because of iRecord logging some Dingy Skipper records at subspecies level even though they were not identified to that level by me.

Swallowtail - six records listed in app report, but only four on iRecord website

Given that these oddities are appearing in the All years list then they must also appear somewhere in the individual year lists. Would be good to get to the bottom of it before the update is launched.

kazlauskis commented 3 years ago

I can't test this myself as I don't have the necessary access to your data. @johnvanbreda could you have a look at this?

johnvanbreda commented 3 years ago

Pretty sure this was down to training data being included in the new reports. @RichardFoxBC I've filtered training data out now so let me know if this is still a problem.

RichardFoxBC commented 3 years ago

Many thanks for this @johnvanbreda - yes, that has resolved most of those discrepancies. The only remaining issue is for Dingy Skipper, which seems to be a problem because some of my records are stored as Erynnis tages and others as Erynnis tages tages. If possible, the report should merge any subspecies records together with any species-level records, so that there is only a single entry for Dingy Skipper at species level. I don't know whether there are any other issues with subspecies, but the iRecord Butterflies app should be working at species level, so the report should do the same please.

johnvanbreda commented 3 years ago

@DavidRoy this issue requires us to aggregate to the species taxon, even for records of child taxa. In order to generate the output we require here, we need to add at least the species vernacular name to the Elasticsearch index (plus probably the accepted species name author). We already have the accepted species name in the index for records of sub-species but we need the vernacular as well (as the species vernacular and sub-species vernacular are not always the same).

Is it OK to spend the few hours required to add this to the index?

DavidRoy commented 3 years ago

Yes please, go ahead with this

AnthonyMcCluskey commented 3 years ago

Hi all, not sure if this was being extended to other species, but I'd definitely recommend it for Large Heath, as many records for this species will be of the three sub-species (davus, polydama, scotica), and the same issues would result.

RichardFoxBC commented 3 years ago

Guessing that this hasn't been resolved yet? My All Years list in the current beta is still showing two entries for Dingy Skipper.

DavidRoy commented 3 years ago

@RichardFoxBC are you on version 2.1.0 (77) now?

DavidRoy commented 3 years ago

@johnvanbreda can you check the report for user = https://www.brc.ac.uk/irecord/user/7373/edit

RichardFoxBC commented 3 years ago

Yes @DavidRoy I'm on (77). My all years species report shows Dingy Skipper (17 records, 53 count) and Dingy Skipper (4 records, 23 count).

DavidRoy commented 3 years ago

One for @johnvanbreda to check the report is handling the synonymy correctly

johnvanbreda commented 3 years ago

I'm waiting for an update to the Elasticsearch taxonomy data - updating 20m+ records takes some time. 12m done so far.

Once this is done I will be able to modify the report to return species, with sub-specific records being treated as if they were a species record.
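To illustrate what treating sub-specific records as species records would mean for the two Dingy Skipper rows reported earlier in this thread (17 records / 53 count and 4 records / 23 count), here is a minimal sketch. It is not the actual report code: the row field names are hypothetical, which row belongs to which taxon is assumed purely for illustration, and truncating the scientific name to its first two words is a stand-in for the species name that the report would read from the Elasticsearch index.

```python
from collections import defaultdict

# Hypothetical report rows. Field names are assumptions, and the split of
# the two counts between species and sub-species is assumed for illustration.
rows = [
    {"accepted_name": "Erynnis tages", "records": 17, "count": 53},
    {"accepted_name": "Erynnis tages tages", "records": 4, "count": 23},
]


def species_name(accepted_name):
    """Reduce a sub-species trinomial to its species binomial (first two words).

    This is a simplification for the sketch; the real report is expected to
    use the species name stored in the index rather than string truncation.
    """
    return " ".join(accepted_name.split()[:2])


def merge_to_species(report_rows):
    """Sum records and counts for all rows that share a species name."""
    merged = defaultdict(lambda: {"records": 0, "count": 0})
    for row in report_rows:
        key = species_name(row["accepted_name"])
        merged[key]["records"] += row["records"]
        merged[key]["count"] += row["count"]
    return dict(merged)


merged = merge_to_species(rows)
print(merged)
```

With the figures above, the two rows collapse into a single Erynnis tages entry whose totals are the sums of the two original rows.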

RichardFoxBC commented 3 years ago

Many thanks @johnvanbreda I think we should wait for this to be implemented before we release the app update, assuming it won't take too long. No point us issuing an update with the new statistics functionality if it isn't reporting correctly.

DavidRoy commented 3 years ago

@RichardFoxBC it will only affect a proportion of recorders who have submitted Dingy Skipper records? I think the benefits of the new updates to the likelihoods will outweigh the statistics problem. I suspect many won't find the new stats functionality until we advertise it?

johnvanbreda commented 3 years ago

@karolis I've now updated the Elasticsearch index and the report API so there is an option to retrieve only species level taxa. In the documentation I sent you for the recorded-taxa-list end-point, there was a parameter:

exclude_higher_taxa - set to 't' to exclude ranks above species.

This parameter is now deprecated, but can be replaced by:

species_only - set to 't' to exclude ranks above species and report taxa below species using their species rank name (i.e. a sub-species and species will be reported as the same thing). When this option is set, the response includes a field "species" instead of "accepted_name" and "species_taxon_id" instead of "accepted_taxon_id".
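A rough sketch of how a client might switch from the deprecated parameter to species_only and cope with the renamed response fields is shown below. Only the parameter name, its 't' value and the four field names come from the description above; the helper names, the extra "year" parameter and the shape of a response row (a flat dictionary) are assumptions made for illustration, and no real endpoint URL is shown.

```python
from urllib.parse import urlencode


def build_query(extra=None):
    """Build a query string that requests species-level taxa only.

    'species_only=t' replaces the deprecated 'exclude_higher_taxa=t'.
    Any extra parameters passed in are hypothetical.
    """
    params = {"species_only": "t"}
    params.update(extra or {})
    return urlencode(params)


def normalise_row(row):
    """Read the taxon name and id from a response row (assumed to be a dict).

    With species_only set, the fields are 'species' and 'species_taxon_id';
    otherwise they are 'accepted_name' and 'accepted_taxon_id'.
    """
    return {
        "name": row.get("species", row.get("accepted_name")),
        "taxon_id": row.get("species_taxon_id", row.get("accepted_taxon_id")),
    }


query = build_query({"year": "2021"})  # 'year' is a made-up parameter
print(query)
```

Reading both field names in one place means the rest of the client does not need to know which option the report was called with.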

DavidRoy commented 3 years ago

@RichardFoxBC is this issue resolved for you? Close if ok. If not, please label for Milestone 3