NERC-CEH / npms-app

📱NPMS mobile application
http://www.npms.org.uk/
GNU General Public License v3.0
0 stars 0 forks source link

Notes on the logic and data structures for NPMS Website, Plant Portal NPMS+, and Plant Portal Standard Mode #78

Open andrewvanbreda opened 4 months ago

andrewvanbreda commented 4 months ago

Squares:Plots:Plot Groups Structure Notes.txt

andrewvanbreda commented 4 months ago

Data Entry Pages - Extra Logic Notes.txt

andrewvanbreda commented 4 months ago

@kazlauskis @sacrevert I will send the notes on the report we spoke about once I have reviewed what I will do with that.

Note I have included all the custom logic in the above notes. @sacrevert it might be worth you also reviewing these, as I noted a few unexpected things during the review, such as Plant Portal NPMS+ forms not including all the functionality the NPMS website has. I can't remember if that was deliberate.

andrewvanbreda commented 4 months ago

Hi @kazlauskis ,

Just looking into what I need to do for that report for you.

I think it will be multiple reports, as otherwise it will be very difficult to test, but I will confirm.

I have some questions though...

  1. For NPMS, am I correct in thinking are wanting the list of plots that can actually be used by a person on the data entry screen only? (The is different to the list of ones they can view on the squares/plots screen where they create plots).

  2. For Plant Portal, a plot can have multiple plot groups, therefore a plot row in the report would need to allow for several plot groups. I could put them into a comma separated list in a cell if that is ok?

kazlauskis commented 4 months ago

@andrewvanbreda

  1. Yes, the app user will only care for plots they can survey.
  2. An array of plot groups sounds good.
andrewvanbreda commented 4 months ago

Hi @kazlauskis ,

Here are the notes on the reports I will write for you. Let me know if you have any questions or suggestion, particularly the Plot Groups bit is a little trickier than the rest.

3 reports. The report columns are listed below. The name inside the brackets are the SQL column names.

NPMS Website

Plot ID (plot_id) - ID from Indicia Locations table, NOT NULL Plot Name (plot_name) - Plain text (composed of several elements such as spatial reference, plot type) NOT NULL Square ID (square_id) - ID from Indicia Locations table, NOT NULL Square Name (square_name) - Plain text (square's spatial reference), NOT NULL Private plot (private_plot) - Indicates if the private flag is set on a plot and therefore a 10km privacy precision is to be used on the sample. Displays 1 as true, 0 as false. NOT NULL Warehouse User ID (warehouse_user_id) - ID from Indicia Users table, this will appear the same on each report row as only one user ID is passed in as parameter NOT NULL

Plant Portal - NPMS+

Plot ID (plot_id) - ID from Indicia Locations table, NOT NULL Plot Name (plot_name) - Plain text (composed of several elements such as spatial reference, plot type) NOT NULL Square ID (square_id) - ID from Indicia Locations table, NOT NULL Square Name (square_name) - Plain text (square's spatial reference), NOT NULL Project ID (project_id) - Also known as activity or recording group on other projects, ID from Indicia Groups database table, NOT NULL Project Name (project_name) - Title from Indicia database Groups table, NOT NULL Warehouse User ID (warehouse_user_id) - ID from Indicia Users table, this will appear the same on each report row NOT NULL

Plant Portal - Standard Mode

Plot ID (plot_id) - ID from Indicia Locations table, NOT NULL Plot Name (plot_name) - Plain text (name field from Indicia locations table), NOT NULL Plot Group IDs for Plot (plot_group_ids_for_plot) - Comma separated list of Indicia termlist_term IDs of Plot Group IDs associated with the plot, CAN BE NULL Plot Group Names for Plot (plot_group_names_for_plot) - Comma separated list of term names of Plot Groups associated with the plot, returned in the same order as the plot_group_ids_for_plot field above, CAN BE NULL Project ID (project_id) - ID from Indicia Groups database table, NOT NULL Project Name (project_name) - Title from Indicia database Groups table, NOT NULL Warehouse User ID (warehouse_user_id) - ID from Indicia Users table, this will appear the same on each report row, NOT NULL Plot Group IDs for User (plot_group_ids_for_user) - Comma separated list of termlist_term IDs of Plot Group IDs associated with the user for that project, this will appear the same on each report row, CAN BE NULL Plot Group Names for User (plot_group_names_for_user) - Comma separated term names of Plot Groups associated with the user/project user for that project, returned in the same order as the plot_group_ids_for_user field above, this will appear the same on each report row, CAN BE NULL

All reports take an Indicia User ID as parameter. The 2 Plant Portal reports will also take a project ID as a parameter which must be supplied.

The 2 NPMS reports assume you will be filtering based on the square column as the user selects different squares.

The filtering on the PP Standard Mode data entry is different. In that case, all plots are displayed for the project/user combination, then the user optionally filters the plot list using a selected plot group. What you can do is display the plot groups in the plot_group_names_for_user cell described above. That will give you all plot groups for the user for that project. You can then filter the plots dynamically as a user selects a plot group from the list. If the selected plot group id appears in the plot_group_ids_for_plot field in the report, then we know that the plot group is associated with that plot, and the plot can continue to be displayed, otherwise hide the plot until the user makes a different plot group selection. (same can done using the names report fields if you prefer to use that instead of IDs)

I guess it is possible some details here may change if I notice something during report development (such as if I made mistakes above), but I will try to keep it as close to this as possible. Feel free to suggest changes as I can't remember everything we said in meeting. If you want, I can make the columns exactly the same between the reports and just leave unused columns blank if you prefer.

@sacrevert: Note I am make the assumption that Tracking The Impact is not part of this new app.

kazlauskis commented 4 months ago

The 2 Plant Portal reports will also take a project ID as a parameter which must be supplied.

Can we make the project ID optional or not required in the report? I would like to get all the users' plots without fetching the projects first.

andrewvanbreda commented 4 months ago

@kazlauskis Yes can do that instead.

kazlauskis commented 4 months ago

Excellent, thanks!

andrewvanbreda commented 4 months ago

Hi @kazlauskis @sacrevert,

I have coded 3 reports.

Two of them currently still have bugs in them though, so are still in testing.

However the NPMS Wesbite one is ready.

It currently resides on a page on a clone of the live site (using a Pantheon multidev site).

https://avb-report-brc-npms-d10.pantheonsite.io/content/karolis-npms-website-report

The report it is pointing to is held in Github.

indicia-reports/reports/projects/npms/app_get_npms_website_plots_for_user

Be aware that IDs (such as attribute IDs, website IDs) in the report are hard-coded to match the IDs on the live warehouse (so that you don't need to supply these as parameters). If you prefer to supply them as parameters so the report can be used on other Warehouses, then let me know.

Will let you know when other 2 are ready

Andy

kazlauskis commented 3 months ago

@andrewvanbreda can I fetch the reports using the warehouse directly? like we do it for the FIT Count app:

https://warehouse1.indicia.org.uk/index.php/services/rest/reports/projects/PoMS/get_projects_for_country.xml?country_for_project_tt_attribute_id=8&url_for_project_tt_attribute_id=10&countries_termlist_id=974&projects_termlist_id=1050&country_code=UK

andrewvanbreda commented 3 months ago

Hi @kazlauskis,

The report is on the Warehouse, so I can only assume the report can be called in a similar way as the example you have just given. I think the equivalent path would be like this

https://warehouse1.indicia.org.uk/index.php/services/rest/reports/projects/npms/app_get_npms_website_plots_for_user.xml?warehouse_user_id=5553

The call above should return the 3 plots that I am linked to. The Warehouse User Id can obviously be adjusted as needed.

kazlauskis commented 3 months ago

Thanks, I will give it a try next week.

andrewvanbreda commented 3 months ago

Hi @kazlauskis The NPMS Plus report is ready now. Should be able to be called like this example

https://warehouse1.indicia.org.uk/index.php/services/rest/reports/projects/plant_portal/app_get_npms_plus_plots_for_user.xml?warehouse_user_id=84948&project_id=2121

That example should return 4 plots.

Either parameter can technically be left blank, however it is really designed for you to either use both parameters, or to leave the project_id blank (leaving the user ID blank would lead to results including all combinations of user assignments, which probably isn't very useful)

kazlauskis commented 3 months ago

@andrewvanbreda I have tried the plant portal report without project_id, but this gives me 400.

andrewvanbreda commented 3 months ago

Hi @kazlauskis If you supply user 84948 and no project ID, it should return 33 rows. I am seeing that, which is all the plots I have access to across all NPMS+ projects. The report page is here if that helps understand what is going on https://avb-report-brc-plantportal-d10.pantheonsite.io/karolis-npms-mode-report

kazlauskis commented 3 months ago

I am doing a GET request to https://warehouse1.indicia.org.uk/index.php/services/rest/reports/projects/plant_portal/app_get_npms_plus_plots_for_user.xml?warehouse_user_id=5553

which returns:

{
    "code": 400,
    "status": "Bad request (parameters missing)",
    "message": "Missing parameters: project_id"
}
andrewvanbreda commented 3 months ago

Hi @kazlauskis Oh I thought you meant 400 rows :D

What happens if you include the project_id parameter, but leave it with an empty value, so something like https://warehouse1.indicia.org.uk/index.php/services/rest/reports/projects/plant_portal/app_get_npms_plus_plots_for_user.xml?warehouse_user_id=5553&project_id=

on plant portal my main user is ID 84948 by the way.

kazlauskis commented 3 months ago

Haha, yeah, I forget people can't read my mind.

What happens if you include the project_id parameter, but leave it with an empty value, so something like

The request works fine now, thanks. I will use the report like this then, as I would like to fetch all the users' plots and store them in the app for offline use.

andrewvanbreda commented 3 months ago

Hi @kazlauskis I am just waiting for the Standard Mode report to go live before I give it a final test. Just to pre-warn you, I have had to change the output of the report a bit as some things are a pain to implement that I hadn't previously realised.

I previously suggested that there would be 4 columns related to the plot groups. However these have been merged down to just 2, as I will need to put the ID and terms together. I hope this is ok.

It is still comma separated but each plot group is output as a termlist_term ID followed by the plot group name pair. I used a triple vertical bar pipe symbol ||| to separate the ID/term pair (under the assumption no user would actually use that character combination in an actual plot group name in practice)

e.g. an example output would be like this 123|||Plot group name one,124|||Plot group name two,125|||Plot group name three

In this case the plot group (termlist_term) ids are 123 124 125

and the plot group names are Plot group name one Plot group name two Plot group name three

The columns no longer present are plot_group_ids_for_plot plot_group_names_for_plot plot_group_ids_for_user plot_group_names_for_user

The new columns are plot_group_ids_and_names_for_plot plot_group_ids_and_names_for_user

These both work as described above.

Let me know if you need me to alter this in any way or have any questions.

kazlauskis commented 3 months ago

@andrewvanbreda, let me know when this is ready, and I will give it a test.

andrewvanbreda commented 3 months ago

Hi @kazlauskis I think this is working now

https://warehouse1.indicia.org.uk/index.php/services/rest/reports/projects/plant_portal/app_get_npms_plus_plots_for_user.xml?warehouse_user_id=&project_id=

Fill in the project_id and warehouse_user_id as you require.

I obviously not had much time to test it, but looks like it is working on initial testing (AVB Edit: I have now tested this further and it does indeed seem to be working as I intended) I have not added the attributes yet as I just wanted to get this out, so that would be released separately.

It is also available on page here https://avb-report-brc-plantportal-d10.pantheonsite.io/karolis-standard-mode-report

andrewvanbreda commented 2 months ago

Hi @BirenRathod,

Whenever you get a sec, would you be able to pull the Plant Portal reports folder please.

Cheers

andrewvanbreda commented 2 months ago

Hi Karolis (and @sacrevert for reference)

The NPMS website report with the column renaming and extra attributes is now ready to use.

A few notes

  1. I did not remove the Warehouse User ID field as this may be useful for debugging in future if there are any problems with the report as NPMS can return plots the user didn't create sometimes, and this column would be good for debugging.

  2. The following fields I left in the report, but don't think are used Code External Key Public (I think this will always be 'f', as it would only be true if the plot location was available for use on any website)

  3. The two geom fields are provided as raw from the database. I think you probably need to always show the shape from the both of these if they are available. This is because some plots types are a drawn shape, but also a square point somewhere inside this. I am afraid my warehouse is causing me trouble again at the second, so I will try to confirm this for you.

  4. I included an extra column with the name "npms_formatted_name". This is the plot name label the NPMS website attempts to display on its grids, it is made of several fields and text. You can see the logic in the report definition and this column can be used for reference if you want to display similar. I can remove column if you don't want it.

  5. One of the columns "expert_mode" is not raw from the database and displays TRUE or FALSE, I added it to the report before your request to sync the columns up with the Warehouse names last week. Let me know if you prefer this to be a 1 and 0 (the private plot column is a 1 and 0)

Report is here in Github. https://github.com/BiologicalRecordsCentre/indicia-reports/blob/master/reports/projects/npms/app_get_npms_website_plots_for_user_2.xml

It works in the same way as the report that didn't have the extra attributes. As you can see I called it app_get_npms_website_plots_for_user_2 instead. I did not want to change the old report in case it interfered with your work. If you let me know when you are using the new report ok, I will delete the old report (or alternatively if you prefer we can overwrite the old report with the new version, and get rid of the new file).

I will let you know about the Plant Portal reports, they are done, but not tested, as thet have only just been pulled onto the live Warehouse today.

Andy

BirenRathod commented 2 months ago

Hi @BirenRathod,

Whenever you get a sec, would you be able to pull the Plant Portal reports folder please.

Cheers

@andrewvanbreda This has been updated.

andrewvanbreda commented 2 months ago

Hi @kazlauskis The NPMS+ enhanced report is now ready. app_get_npms_plus_plots_for_user_2.xml Similar notes as above for the website report. The only extra thing to note in this case is that the Plant Portal plot names are a bit different to NPMS website, we simply just use what is in the location.name database field ("name" in the report). The report does currently return an extra my_plot_label field (like it does for the NPMS website report), but I have removed it for the Plant Portal report now . This will no longer be present next time someone pulls the report to the Warehouse.

Report definition here: https://github.com/BiologicalRecordsCentre/indicia-reports/blob/master/reports/projects/plant_portal/app_get_npms_plus_plots_for_user_2.xml

andrewvanbreda commented 2 months ago

Hi @BirenRathod Would you be able to pull the Plant Portal reports folder again when you get a sec please.... thanks.

@kazlauskis Once Biren has done that, this report will be available for you to use.

https://github.com/BiologicalRecordsCentre/indicia-reports/blob/master/reports/projects/plant_portal/app_get_standard_mode_plots_for_user_2.xml

That is all reports done now.

Note I did remove the Warehouse User ID column, I made a mistake thinking we needed that and has been removed from all reports.

BirenRathod commented 2 months ago

@andrewvanbreda Those reports have gone live now.

andrewvanbreda commented 2 months ago

Hi @karolis,

I said I would let you know about what logic is needed for centroid_geom, boundary_geom. The answer is simple. Show both as shapes on the map where available. Centroid_geom will always be present as a spatial reference marker. In some plot draw modes this could be the entire plot, such as OSGB has square spatial references, so can represent a whole plot in some cases (in which case boundary_geom may be empty). boundary_geom is also filled in......also show centroid_geom in this case because it can show a particular point in the plot, such as its centre. In summary, show both unless null

kazlauskis commented 1 month ago

@andrewvanbreda I am looking at the geometries attached. Is this a raw binary? Can we convert it to WKT?

 "centroid_geom": "010300002031BF0D0001000000050000006E26AE2EEE2025C19A0645174E9C5B416E26AE2EEE2025C1D2A755CE4F9C5B4143EE3B7DE02025C1D2A755CE4F9C5B4143EE3B7DE02025C19A0645174E9C5B416E26AE2EEE2025C19A0645174E9C5B41",
andrewvanbreda commented 1 month ago

@kazlauskis Have used a function I saw on another report to do that conversion. I don't think this will go live today because BRC is shut (I think) because of bank holiday here in UK.

@BirenRathod Whenever you get a chance could you pull the Plant Portal/NPMS report folders.

BirenRathod commented 1 month ago

@andrewvanbreda All files of NPMS & Plant Portal have been updated.

andrewvanbreda commented 1 month ago

Hi @kazlauskis @sacrevert As requested in our meeting, both plot group columns in the Standard Mode report now use JSON

andrewvanbreda commented 1 month ago

Hi @kazlauskis @sacrevert,

I have a version of the report that you can use to download species lists for development.

Note, this report will NOT give you the final species list, but it is very close, good enough for development.

The reason I say this is there are a few inconsistencies between my report and the website data entry list. This is to be expected, it was very unlikely that my report would return identical entries first time.

I will contact you further about this. This affects about 5 entries from what I can tell. There are some other improvements to the report I have coded, but are not live yet I will message you about when they go live (e.g. such has having a parameter which allows you to choose which websites the frequency count comes from).

The report can be called as usual from this path

projects/plant_portal/app_get_species_lists_2

provide a species_list parameter which currently support 2 options species_list =wildflower species_list=indicator

The report will then show you a close match for what is on the website in those modes, apart from my report includes frequency ordering.

For wildflower I think the call you need is this

https://warehouse1.indicia.org.uk/index.php/services/rest/reports/projects/plant_portal/app_get_species_lists_2.xml?species_list=wildflower

for indicator

https://warehouse1.indicia.org.uk/index.php/services/rest/reports/projects/plant_portal/app_get_species_lists_2.xml?species_list=indicator

Note that the only difference in the report between wildflower and indicator mode is that the indicator includes the synonyms in a comma separated column, the Wildflower data entry page is missing entries these on the website as it is just one long list, on Indicator the user is free to use a synonym. The app may not have this limitation, so in that case you may wish to include the synonyms on Wildflower too, depending on what Oli says I guess

sacrevert commented 1 month ago

@andrewvanbreda is it possible for my warehouse account to be given permission to view these? Currently I get a 401 error even when logged in (warehouse username Oliver_Pescott)

sacrevert commented 1 month ago

I don't have an issue with the Wildflower list including synonyms in the app; if it's a dropdown, rather than a fixed grid as on the website, then we might as well.

andrewvanbreda commented 1 month ago

Hi @sacrevert,

Reports are available on pages on some of my Pantheon Multidev sites

For Plant Portal, the 3 reports I wrote for Karolis are here,

https://avb-report-brc-plantportal-d10.pantheonsite.io/karolis-species-list-report

https://avb-report-brc-plantportal-d10.pantheonsite.io/karolis-standard-mode-report

https://avb-report-brc-plantportal-d10.pantheonsite.io/karolis-npms-mode-report

You will need to be an admin to see these, I have made the following 2 accounts admin accounts

OliverTest2 OliverNormal

For NPMS the report is here,

https://avb-report-brc-npms-d10.pantheonsite.io/content/karolis-report

Use the "olipescott" admin account to see it.

The first report in the list above returns the species list depending on the parameter "species_list" given as wildflower or indicator

The other 3 reports show the plots for a user in Standard Mode, NPMS+ and NPMS website.

Those last 3 reports take parameters as IDs. For Plant Portal, to find out a Project ID go to the All Projects page, then "Edit Project Details", the project id will be in the "group_id" parameter in the URL.

A user ID can be found at the bottom of the My Account page Edit Tab in the "Indicia Warehouse User ID" field (which is not visible on the real websites)

andrewvanbreda commented 1 month ago

@sacrevert I just realised a download might be useful to you, I will put that link on the reports now in the bottom right. This will be available in a few minutes time.

andrewvanbreda commented 1 month ago

@sacrevert For reference, the differences I am noting with my report are as follows (comparison made with Plant Portal Wildflower form).

Chenopodium album -> Fat-hen common name is missing on the data entry form Conifer seedlings/saplings->My report is showing name twice, in both preferred and common name columns Dioscorea communis->Black Bryony common name missing on data entry form Dipsacus fullonum (Wild Teasel)->shows as "Dipsacus fullonum sensu lato" (no common name) on data entry form. Jacobaea vulgaris (Common Ragwort) - missing from data entry form Ophioglossum vulgatum (Adder's-tongue) - shown as Ophioglossum vulgatum sens.lat. on data entry form Potentilla anserina -> missing Silverweed common name on data entry form Sanguisorba minor (Salad Burnet) -> missing from my report Senecio jacobaea (Common Ragwort) -> missing from my report

sacrevert commented 1 month ago

@andrewvanbreda I guess you are looking into these, although they look like the kinds of errors that I would have expected to be solved with the move to using organism keys rather than TVKs?

andrewvanbreda commented 1 month ago

@sacrevert Yes am about to look into them. Some of them are a bit odd, almost as if the report is doing better than the data entry page. I will let you know if I need input, I will also let you know the result of my findings. Again, this is a another time where not having access to the warehouse db and only the Warehouse UI is a bit of a pain.

andrewvanbreda commented 1 month ago

Hi @sacrevert @kazlauskis I just realised after your comment Oli, that actually the report is fine. I realised I was doing the comparison wrong because it was on my multidev site, but that site is actually older than the organism_key deploy, so it was using old version of form. Redoing the comparison, the report looks exactly right apart from it currently shows "Conifer seedlings/saplings" in both preferred and common name columns, I will remove that duplication. There are a couple of other minor enhancements to come such as being about to choose the which sites the frequency counts come from, but the main list looks about right minus that one problem. The next version also has modes from the Inventory selection, and also the Plant Portal Additional species grid.

andrewvanbreda commented 1 month ago

@sacrevert @kazlauskis The report now supports an inventory mode using the parameter species_list =inventory The same way you would use the other modes as described above. Due to the number of taxa this allows, this is not as well tested as the other modes, but this should be OK apart from the 2 issues am noting below.

1. @sacrevert: I notice on the website the Wildflower form just returns the default common name (English). My report does this for all 3 modes. However I am now noticing on the website if you type in a Welsh name directly (possibly other languages too) it does support these names too. Is this intentional? i.e. Should the website be changed to remove these other common names? (noting that Widlfower does not have these other langauges) or Should my report be changed to include the names in other languages? or does it not matter if my report and the website are different in this way?

2. A smaller problem I am noting is the current list of synonyms can include a duplicate of the preferred taxon name omn my report. I am not sure why this happens, I don't think actually it is report logic issue, and there IS actually likely a duplicate name. I will add a fix to avoid this, but that fix is not present at the time of writing.

One last note, the report can now take a parameter to determine the websites you want the frequency count to come from. If the param is not present, then it defaults to Plant Portal only (website 106) The reason is the count can be slow, particularly in Inventory. To use this param, supply a comma separated list of website IDS.

Like this count_website_list=106,32

In this example we count from both Plant Portal (106) and NPMS (32)

In my experience that is ok to do this for Wildflower and Indicator but gets very slow on Inventory which is probably best to leave as 106 onyl for the count. iRecord is website 23, but I found that too slow to do any counts with.

andrewvanbreda commented 1 month ago

@sacrevert @kazlauskis Actually am just realising there may be another limitation of this report in its current form. Currently the common names and synonyms do not have their taxa_taxon_list IDs included. I think this need including otherwise the recorded name is not saved to the database, the preferred name is going to always be the recorded name. I do not know what the current app does?

I should probably change the common name and synonym columns to be ID and name JSON encoded.

sacrevert commented 1 month ago

@sacrevert: I notice on the website the Wildflower form just returns the default common name (English). My report does this for all 3 modes. However I am now noticing on the website if you type in a Welsh name directly (possibly other languages too) it does support these names too. Is this intentional? i.e. Should the website be changed to remove these other common names? (noting that Widlfower does not have these other langauges) or Should my report be changed to include the names in other languages? or does it not matter if my report and the website are different in this way?

A smaller problem I am noting is the current list of synonyms can include a duplicate of the preferred taxon name omn my report. I am not sure why this happens, I don't think actually it is report logic issue, and there IS actually likely a duplicate name. I will add a fix to avoid this, but that fix is not present at the time of writing.

I assume that you are talking about NPMS+ Wildflower form in Plant Portal here? If so, I don't see why we would remove this functionality. If the form uses a dropdown (as opposed to a fixed grid like the NPMS website Wildflower form), then we may as well support synonyms.

A smaller problem I am noting is the current list of synonyms can include a duplicate of the preferred taxon name omn my report. I am not sure why this happens, I don't think actually it is report logic issue, and there IS actually likely a duplicate name.

Can you provide an example, so that we can check the UKSI?

One last note, the report can now take a parameter to determine the websites you want the frequency count to come from. If the param is not present, then it defaults to Plant Portal only (website 106) The reason is the count can be slow, particularly in Inventory.

I'm not sure I understand this bit -- we only want to run this frequency count once, so that Karolis has the numbers in the report and can use them in the app. Whilst we might want to update the counts occasionally (e.g. every few years or so), for Karolis we want a pre-generated static column, not one that is generated when the report is called by the app each time.

andrewvanbreda commented 1 month ago

Hi @sacrevert ,

  1. I am referring about the inconsistencies in the name display in the 3 data entry forms in both cases (which are the same for NPMS+/npms website).

On the website, because we have different types of entry (static vs drop-down|), there are differences

Wildflower = Shows preferred names, with English Common from an organism key defined list

Indicator = Shows preferred names, common names from different languages + different synonyms from the organism key list

Inventory = Shows preferred names, common names from different languages + different synonyms the UK Master list limited by taxon group.

Going by your answer, I think the answer is for me to change my reportfor Karolis across all 3 modes to show all synonyms, and all commons names in appropriate languages. I will do this (currently it mimics the website which it sounds like the app does not need to do)

  1. Yes, here are just some examples of 3 that do this. Looking at the Warehouse UI, we can see the synonym box as the same name as the main taxon box. I do not know why it would be like this, perhaps there is a valid reason. Rubus fruticosus agg. https://warehouse1.indicia.org.uk/index.php/taxa_taxon_list/edit/324710

Crataegus monogyna https://warehouse1.indicia.org.uk/index.php/taxa_taxon_list/edit/323194

Ilex aquifolium https://warehouse1.indicia.org.uk/index.php/taxa_taxon_list/edit/323681

3. In terms of the website limiting. You are correct, the frequencies are intended to be a one off, however they do need to actually run. Running an unlimited count will likely never finish and would timeout. (e.g. I left an iRecord record count to to run for Wildflower for quite a while on my report, and that did not finish). Also remember when developing and I am doing multiple test runs, a count that takes ages is no good. Obviously doing a count on too few records may cause a less useful result.

This is the reason for the option. Putting in the website list parameter was a simple addition. As note the param does not even need to be included and in that case the Plant Portal website results only will be counted.

I am not quite sure why counting is so slow, I have always found this.

I will let you know once I have finish making the changes noted.

sacrevert commented 1 month ago
  1. Ok, yes, that makes sense.
  2. These are just standard synonyms as per the UKSI. Confusingly however, there are subtle differences between which form is currently "preferred". For example, for the Rubus aggregate, the form without the authority ("L.") is preferred and the form with is considered ill-formed. For the other examples it is the reverse. I can see why this might be the case, as it was/is probably considered that a standard binomial should have the authority, whereas a recording aggregate like Rubus fruticosus agg. strictly perhaps makes less sense with the authority. Anyway, it is not an error.
  3. Can you provide the SQL for the count? I know that there are hardcoded timeout limits in the Postgres server. Irrespective of this, if we need a website default for the count parameter then NPMS would be better as PlantPortal has very little data (and it is not likely representative in the way required for the frequency attribute)
andrewvanbreda commented 1 month ago

Hi @sacrevert,

OK, I changed the latest version of report to default to NPMS (32). Thanks for the explanation about point 2, I think we just need to go ahead with my previous plan to remove the duplicate from the synonyms column in the SQL.

The old version of the report is this https://github.com/BiologicalRecordsCentre/indicia-reports/blob/master/reports/projects/plant_portal/app_get_species_lists_2.xml

I have attached the log from the warehouse of what it is trying to run when I set to return Widflower list using an NPMS occurrence count

Warehouse log.txt

The next version is this (note this is complete, but not yet tested for bugs at time of writing). It has not been pulled onto the Warehouse yet either. Do not run it, but you can observe the code.

https://github.com/BiologicalRecordsCentre/indicia-reports/blob/master/reports/projects/plant_portal/app_get_species_lists_3.xml

sacrevert commented 1 month ago

@andrewvanbreda I had a problem with some SQL running slowly a few weeks back, to the point of the server timeout kicking in, and ChatGPT helped speed it up immensely. I did the same thing for your Warehouse log.txt file above, and, for a query with a limit of 10,000 rows, it reduced the speed from 34s to 8s. I also ran two checks, one a row count and one a cell-by-cell comparison, and it seems to return exactly the same rows/cells as your query, although no doubt you would want to check it too. I have no idea why this "with" structure is so much faster, but I have found the same for other SQL queries of mine.

WITH cttl_preferred AS (
    SELECT
        id,
        taxon_group_id,
        taxon,
        default_common_name,
        organism_key,
        taxon_meaning_id
    FROM cache_taxa_taxon_lists
    WHERE taxon_list_id = 15
      AND preferred = TRUE
),
cttl_synonym AS (
    select
        id,
        taxon_meaning_id,
        taxon
    FROM cache_taxa_taxon_lists
    WHERE taxon_list_id = 15
      AND preferred = FALSE
      AND language_iso = 'lat'
),
cof_filtered AS (
    SELECT
        id,
        taxa_taxon_list_id
    FROM cache_occurrences_functional
    WHERE website_id IN (32)
),
wildflower_organism_keys AS (
    SELECT unnest(array[
        'NBNORG0000008568', 'NBNORG0000008427', 'NBNORG0000007889', 'NBNORG0000007115',
        'NBNORG0000007180', 'NBNORG0000053047', 'NBNORG0000008215', 'NBNORG0000008214',
        'NBNORG0000007431', 'NBNORG0000008065', 'NBNORG0000053122', 'NBNORG0000008018',
        'NBNORG0000007805', 'NBNORG0000008037', 'NBNORG0000053162', 'NBNORG0000053186',
        'NBNORG0000007254', 'NBNORG0000007087', 'NBNORG0000008547', 'NBNORG0000053255',
        'NBNORG0000008562', 'NBNORG0000007698', 'NBNORG0000116335', 'NBNORG0000053355',
        'NBNORG0000007079', 'NBNORG0000007361', 'NBNORG0000053434', 'NBNORG0000008185',
        'NBNORG0000007424', 'NBNORG0000053279', 'NBNORG0000008441', 'NBNORG0000008442',
        'NBNORG0000053464', 'NBNORG0000007547', 'NBNORG0000007316', 'NBNORG0000053501',
        'NBNORG0000053516', 'NBNORG0000008605', 'NBNORG0000008489', 'NBNORG0000007646',
        'NBNORG0000053590', 'NBNORG0000107933', 'NBNORG0000053643', 'NBNORG0000007959',
        'NBNORG0000007991', 'NBNORG0000008599', 'NBNORG0000008597', 'NBNORG0000008596',
        'NBNORG0000008595', 'NBNORG0000007434', 'NBNORG0000053719', 'NBNORG0000007170',
        'NBNORG0000116336', 'NBNORG0000008144', 'NBNORG0000007511', 'NBNORG0000007919',
        'NBNORG0000008054', 'NBNORG0000008457', 'NBNORG0000053904', 'NBNORG0000008263',
        'NBNORG0000007359', 'NBNORG0000054034', 'NBNORG0000008307', 'NBNORG0000105189',
        'NBNORG0000107960', 'NBNORG0000007967', 'NBNORG0000116337', 'NBNORG0000008665',
        'NBNORG0000007976', 'NBNORG0000008189', 'NBNORG0000008186', 'NBNORG0000008096',
        'NBNORG0000008089', 'NBNORG0000007855', 'NBNORG0000007854', 'NBNORG0000007881',
        'NBNORG0000054424', 'NBNORG0000008468', 'NBNORG0000008460', 'NBNORG0000008458',
        'NBNORG0000008462', 'NBNORG0000008461', 'NBNORG0000007740', 'NBNORG0000007882',
        'NBNORG0000007484', 'NBNORG0000008418', 'NBNORG0000008008', 'NBNORG0000054578',
        'NBNORG0000008070', 'NBNORG0000007824', 'NBNORG0000007996', 'NBNORG0000007168',
        'NBNORG0000007122', 'NBNORG0000008011', 'NBNORG0000007758', 'NBNORG0000007754',
        'NBNORG0000007225', 'NBNORG0000116338', 'NBNORG0000008676', 'NBNORG0000008292',
        'NBNORG0000055006', 'NBNORG0000007844', 'NBNORG0000007255', 'NBNORG0000008577',
        'NBNORG0000055121', 'NBNORG0000007725', 'NBNORG0000008436', 'NBNORG0000008480',
        'NBNORG0000007807', 'NBNORG0000007808', 'NBNORG0000008379', 'NBNORG0000055222',
        'NBNORG0000008207', 'NBNORG0000008209', 'NBNORG0000007970', 'NBNORG0000007364',
        'NBNORG0000008376', 'NBNORG0000008231', 'NBNORG0000008079', 'NBNORG0000007666',
        'NBNORG0000055395', 'NBNORG0000055473', 'NBNORG0000007473', 'NBNORG0000007111',
        'NBNORG0000007250', 'NBNORG0000008380', 'NBNORG0000007072', 'NBNORG0000007741',
        'NBNORG0000008107', 'NBNORG0000055702', 'NBNORG0000008100', 'NBNORG0000007080',
        'NBNORG0000055584', 'NBNORG0000008368', 'NBNORG0000008434', 'NBNORG0000008432',
        'NBNORG0000008433', 'NBNORG0000008431', 'NBNORG0000116339', 'NBNORG0000007142',
        'NBNORG0000055851', 'NBNORG0000007871', 'NBNORG0000007877', 'NBNORG0000007868',
        'NBNORG0000007878', 'NBNORG0000007869', 'NBNORG0000008763', 'NBNORG0000008203',
        'NBNORG0000055871', 'NBNORG0000008395', 'NBNORG0000007909', 'NBNORG0000007075',
        'NBNORG0000007438', 'NBNORG0000007440', 'NBNORG0000007439', 'NBNORG0000007449',
        'NBNORG0000107166', 'NBNORG0000007583', 'NBNORG0000007582', 'NBNORG0000008333',
        'NBNORG0000008176', 'NBNORG0000056025', 'NBNORG0000007865', 'NBNORG0000008109',
        'NBNORG0000008108', 'NBNORG0000008118', 'NBNORG0000116340', 'NBNORG0000007118',
        'NBNORG0000008166', 'NBNORG0000008763', 'NBNORG0000007900', 'NBNORG0000008012',
        'NBNORG0000007951', 'NBNORG0000007958', 'NBNORG0000007950', 'NBNORG0000056411',
        'NBNORG0000007931', 'NBNORG0000008503', 'NBNORG0000008454', 'NBNORG0000007628',
        'NBNORG0000007631', 'NBNORG0000056485', 'NBNORG0000008268', 'NBNORG0000007923',
        'NBNORG0000056615', 'NBNORG0000007657', 'NBNORG0000007656', 'NBNORG0000056621',
        'NBNORG0000008493', 'NBNORG0000056650', 'NBNORG0000008426', 'NBNORG0000116341',
        'NBNORG0000007784', 'NBNORG0000007798', 'NBNORG0000007425', 'NBNORG0000007261',
        'NBNORG0000007770', 'NBNORG0000116342', 'NBNORG0000008131', 'NBNORG0000008194',
        'NBNORG0000008196', 'NBNORG0000008193', 'NBNORG0000008488', 'NBNORG0000008486',
        'NBNORG0000008323', 'NBNORG0000008313', 'NBNORG0000007828', 'NBNORG0000007826',
        'NBNORG0000007593', 'NBNORG0000007590', 'NBNORG0000091471'
    ]) AS organism_key
)
SELECT
    cttl_preferred.id AS "id",
    cttl_preferred.taxon_group_id AS "taxon_group",
    cttl_preferred.taxon AS "taxon",
    CASE WHEN cttl_preferred.default_common_name != cttl_preferred.taxon THEN cttl_preferred.default_common_name ELSE '' END AS "common_name",
    CASE WHEN (lower('wildflower') = 'indicator' OR lower('wildflower') = 'inventory') THEN string_agg(distinct cttl_synonym.taxon, ',') ELSE '' END AS "synonym",
    cttl_preferred.organism_key AS "organism_key",
    cts.identification_difficulty AS "difficulty",
    count(distinct cof.id) AS "frequency"
FROM cttl_preferred
LEFT JOIN cttl_synonym ON cttl_synonym.taxon_meaning_id = cttl_preferred.taxon_meaning_id
JOIN cache_taxon_searchterms cts ON cts.taxa_taxon_list_id = cttl_preferred.id AND cts.simplified = FALSE
LEFT JOIN cof_filtered cof ON (cof.taxa_taxon_list_id = cttl_preferred.id OR cof.taxa_taxon_list_id = cttl_synonym.id)
WHERE
    (
        (lower('wildflower') = 'wildflower' OR lower('wildflower') = 'indicator')
        AND cttl_preferred.organism_key IN (SELECT organism_key FROM wildflower_organism_keys)
    )
    OR
    (
        (lower('wildflower') = 'inventory' OR lower('wildflower') = 'additional')
        AND cttl_preferred.taxon_group_id IN (78, 81, 87, 99, 89, 94, 137, 148)
    )
GROUP BY
    cttl_preferred.id,
    cttl_preferred.taxon_group_id,
    cttl_preferred.taxon,
    common_name,
    cttl_preferred.organism_key,
    cts.identification_difficulty
ORDER BY frequency desc;
andrewvanbreda commented 1 month ago

Hi @sacrevert OK, thanks for the info. I will create a new version of the report with this re-arrangement. I do not know why it would be faster, I guess there is some technical reason for it. If anything I would have thought it might be slower as the joins don't have the conditions on them. It is slightly less readable in my opinion, but I guess that doesn't matter much.

sacrevert commented 1 month ago

@andrewvanbreda well, feel free to test it locally. I'm not wedded to this formulation, just an experience I had elsewhere that I thought might benefit us here potentially