BiologicalRecordsCentre / ABLE

Assessing ButterfLies in Europe project repository

Entering species complexes (via the website) #464

Closed chrisvanswaay closed 10 months ago

chrisvanswaay commented 1 year ago

Martin Musche (UFZ) wrote me: A problem appeared when I tried to enter species complexes via the website that are not in the species list. Although I was able to write the name of the complex into the species field this procedure prevented the selection of species from the list in subsequent fields. An example workflow:

(1) Noctua janthe/janthina - not in the species list, therefore written in the species field
(2) Eilema complana - selected from the species list without problems
(3) Peribatodes rhomboidaria - not possible to select from the species list (although the species is in the list)

It seems that (1) causes an internal error that prevents a proper species selection in (3). Do you have an alternative approach for entering species complexes?

chrisvanswaay commented 1 year ago

@JimBacon @DavidRoy

DavidRoy commented 1 year ago

@chrisvanswaay for (1) you can only enter names (via the app or website) that are already known by the system. We'll need to extend the species dictionary for these complexes. (3) can Martin provide a screenshot as I'm not sure why this name is not able to be selected. Are you able to replicate this problem?

chrisvanswaay commented 1 year ago

Well, I can't reproduce it. But I also can't enter a complex. How shall we tackle this? As a start I could list our species complexes, but of course these don't work in Hungary or Spain. Maybe better to ask the coordinators there?

DavidRoy commented 1 year ago

yes, we'll need the regional co-ordinators to provide a list of complexes needed. We've had the same requirement with the butterfly lists. Ideally we would do an update to the lists in one go as it will require an update to both the website and the app.

chrisvanswaay commented 1 year ago

@DavidRoy Would it be an idea to ask our SPRING partner observation.org which moth-species-complexes they use? Would be a good start, I guess, if they provide a list to start with. I have also asked the fieldwork-partners for species complexes.

johnvanbreda commented 1 year ago

@chrisvanswaay I'd like to know more about this comment:

It seems that (1) causes an internal error that prevents a proper species selection in (3)

Can you let me know how to reproduce this internal error please?

chrisvanswaay commented 1 year ago

@johnvanbreda I couldn't reproduce it myself. Martin Musche wrote: "(1) Noctua janthe/janthina - not in the species list, therefore written in the species field". I simply couldn't enter a species not in the species list. He says 'written in the species field', but what does he mean? Of course you could ask him directly, he is a member of our SPRING team (martin.musche@ufz.de).

DavidRoy commented 1 year ago

@chrisvanswaay I'll contact Martin and suggest he comments directly to any issues he has

I also support your idea to ask fieldwork partners to identify any species complexes required

MartinMusche commented 1 year ago

I attached a screenshot. This was the workflow:

(1) Selection of Peribatodes rhomboidaria from the list
(2) Selection of Noctua janthe from the list
(3) Change of species name using the edit button from Noctua janthe to Noctua janthe/janthina
(4) Selection of the next species (Eilema complana) from the list
(5) Try to select the next species from the list, which is not possible

screenshot_complexes.pdf

chrisvanswaay commented 1 year ago

@DavidRoy @MartinMusche asked me about the developments. Would be easiest to add the species complexes from observation.org, I guess.

chrisvanswaay commented 1 year ago

@DavidRoy asked me if I could provide the observation.org complexes to incorporate. Attached is the list with complexes in the moths. Furthermore I attach a list with the pseudospecies as we use them in our Dutch moth monitoring, and the list that Andras provided for Hungary. Would be great if these could be added, as they are either impossible to identify in the field (the pseudospecies for NL and H) or are the names that the AI of observation.org can send back.

pseudospecies_NL.xlsx pseudospecies_H.xlsx pseudospecies_observation.org.csv

larspett commented 1 year ago

I would like to add Timandra comae/griseata (Geometridae) to the list

larspett commented 1 year ago

@chrisvanswaay can you pull out the full set of Swedish moth names from Observado too? It looks like they have already imported a correct list (if you click onward to the website from the ObsIdentify app) so pulling the names from Observado might be the quickest way to get the same list as what you have in Dutch for the Netherlands, Chris. Then we could import that one quickly just as has been done for the Dutch names

chrisvanswaay commented 1 year ago

@larspett I don't have access to the taxonomic data, I would have to ask. @DavidRoy Would it not be more efficient if we ask Mark van Nieuwstadt of Naturalis to provide all the vernacular names they have? I mean also for most other European languages.

DavidRoy commented 1 year ago

I'll ask Naturalis colleagues working on the NIA

DavidRoy commented 1 year ago

@andrewvanbreda can you add the additional names to the moth list, based on the files linked to: https://github.com/BiologicalRecordsCentre/ABLE/issues/464#issuecomment-1477544950

Plus this addition: Timandra comae/griseata

larspett commented 1 year ago

There are another 4 additions that are needed for our region, they are mentioned in #542 but also supplied here:

Euxoa obelisca/tritici/vitta
Euxoa obelisca/vitta
Diarsia florida/rubi
Archiearis parthenias/Boudinotiana notha

Would be excellent if the Swedish names in #542 could be deployed too

andrewvanbreda commented 1 year ago

Hi @BirenRathod

Whenever you get a chance could you send me the results of these two SQL statements please.

Cheers.

SQL (genus check).txt SQL (species check).txt

BirenRathod commented 1 year ago

@andrewvanbreda Here are the results.

ABLE.zip

andrewvanbreda commented 1 year ago

@BirenRathod Thanks Biren, that is really useful.

andrewvanbreda commented 1 year ago

@BirenRathod Hi again, would you also be able to send me these two results, please? Thanks. Hopefully that will give me all the info I need.

SQL (species check) 2.txt SQL (species check) 3.txt

BirenRathod commented 1 year ago

@andrewvanbreda The first query result is attached here. The second query returned no result.

data-1680168721465_check2.zip

andrewvanbreda commented 1 year ago

@BirenRathod That is great, thanks Biren. It isn't a problem that there are no results, as the query was looking for existing entries before the import is made.

andrewvanbreda commented 1 year ago

Hi @DavidRoy @johnvanbreda I have prepared some import files, but am noticing an issue on live that I wasn't getting during test checks on my own machine. The live EBMS Moths species list has a mandatory "Day-active" attribute, but I don't have that information on the import rows. Actually, having just looked at the attribute, it isn't set as "Required" at either the global or survey level, but the taxon importer seems to be insisting there is a column for it. Am I missing something?

andrewvanbreda commented 1 year ago

Screen Shot 2023-03-31 at 15 48 06

DavidRoy commented 1 year ago

could you add a 'day-active' column to the file and set it to False for all the new taxa. We can set those that need it to True once imported, via the warehouse interface. It won't be many taxa
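The suggestion above (add a constant column to each import file before upload) could be sketched roughly as follows. This is only an illustration: the header name `day-active`, the value `False` as text, and the single `taxon` column in the sample are assumptions, not the actual layout the warehouse importer expects.

```python
import csv
import io


def add_constant_column(src, dst, column, value):
    """Copy CSV rows from src to dst, appending `column` set to `value` on every row."""
    reader = csv.DictReader(src)
    existing = list(reader.fieldnames or [])
    if column in existing:
        raise ValueError(f"column {column!r} already present")
    writer = csv.DictWriter(dst, fieldnames=existing + [column], lineterminator="\n")
    writer.writeheader()
    for row in reader:
        row[column] = value
        writer.writerow(row)


# Hypothetical two-row import file; the real files have other columns.
sample = io.StringIO("taxon\nTimandra comae/griseata\nDiarsia florida/rubi\n")
out = io.StringIO()
add_constant_column(sample, out, "day-active", "False")
print(out.getvalue())
```

For real files, `src` and `dst` would be file objects opened with `newline=""`, as the `csv` module documentation recommends.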

andrewvanbreda commented 1 year ago

Hi @DavidRoy Thanks for the feedback. Yes I can do it that way.

andrewvanbreda commented 1 year ago

@DavidRoy I have imported the spreadsheets. Everything looks to be as I intended, and I can see the species on the Moth form too.

There was 1 row that didn't import at all because of a mistake I made when creating the import files, but I will enter that manually. Also I think some of the species requests made directly into this issue thread aren't done yet, so leave issue open until I can deal with these.

andrewvanbreda commented 1 year ago

The following have now been added:

Timandra comae/griseata
Euxoa obelisca/tritici/vitta
Euxoa obelisca/vitta
Diarsia florida/rubi
Archiearis parthenias/Boudinotiana notha
Depressaria spec. (broken import row mentioned from yesterday)

I think before this issue can be closed, the "Day active" attribute will need setting on the records that need it.

DavidRoy commented 1 year ago

@chrisvanswaay @MartinMusche @larspett

larspett commented 1 year ago

This covers our needs. Many thanks.

Archiearis parthenias/Boudinotiana notha are day active

chrisvanswaay commented 1 year ago

@DavidRoy Is there an overview of the species complexes, so I can send it around to the SPRING moth coordinators? And for the day-active moths: I think this very much depends on what we want to do with them, and what would be the best way to monitor them. The extremes are clear and simple (Zygaenas are almost all day-active only, most moths never), but some species are a mix. I guess each country draws its own borders so far. European trends are only possible if we all use at least a minimum list. Or do we have such a list already?

DavidRoy commented 1 year ago

@chrisvanswaay I agree that day-flying is a vague concept. I'll send you the latest list by email

andrewvanbreda commented 1 year ago

If needed for future reference, the import files that were used are as follows:

Extra genus to import.csv pseudospecies_H.csv pseudospecies_NL (small initial test).csv pseudospeciesNL (the rest).csv pseudospecies_observation (no parent).csv pseudospecies_observation.org (with-parent).csv

JimBacon commented 1 year ago

Hi

I think this import has created a problem of duplicates, for example

Acleris laterana / comariana             | Acleris laterana/comariana
Acronicta psi/tridens                    | Acronicta tridens / psi
Agonopterix ciliella/heracliana          | Agonopterix heracliana / ciliella
Amphipyra berbera / pyramidea            | Amphipyra berbera/pyramidea
...
Xestia ditrapezium/triangulum            | Xestia triangulum/ditrapezium
Zimmermannia atrifrontella/longicaudella | Zimmermannia longicaudella / atrifrontella

I wouldn't mind having the names both ways round, so they are easier to search, provided they were synonyms but these are all preferred with different taxon_meaning_ids.

larspett commented 1 year ago

Two ways round sounds good, but there seem to be inconsistencies in whether there should be space-slash-space between species names or not. Would be good to trim (or pad) that to something consistent. I would be in favour of trimming to atrifrontella/longicaudella etc. rather than padding everything into the atrifrontella / longicaudella format
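A minimal sketch of the trimming option, assuming the names are available as plain strings (for example from an export of the species list). The function name `trim_slashes` is made up for illustration and is not part of the warehouse code.

```python
import re


def trim_slashes(name):
    """Remove whitespace around every slash and collapse repeated spaces."""
    name = re.sub(r"\s*/\s*", "/", name.strip())
    return re.sub(r"\s+", " ", name)


for raw in [
    "Zimmermannia longicaudella / atrifrontella",
    "Amphipyra berbera / pyramidea",
    "Noctua janthe/ janthina",
]:
    print(trim_slashes(raw))
```

The function is idempotent, so it is safe to run over names that are already in the trimmed format.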

andrewvanbreda commented 1 year ago

Hi @JimBacon Yes, the import would have created duplicates if there were things like spelling mistakes, or different orderings etc. I am a bit confused by the spacing duplicates, as SQL was run to detect those and nothing was returned. Actually, the same SQL is detecting the duplicate on my machine if I use an example from above (Amphipyra berbera / pyramidea). If there are duplicates presented in a different order that can be identified, it should be possible to change these to be synonyms.

andrewvanbreda commented 1 year ago

Hi @JimBacon Oh, I see the problem. The duplicates are not there because the import process failed; the files themselves contain duplicates. Yes, this would happen, as the import was done on the assumption that this is the data we wanted imported. Well spotted. I will try to work on a solution when I can, but am overrun at the moment, so if you think you can fix this quicker at your end, let me know (I also don't have direct access to the database, so would need someone else to run the SQL for me)

DavidRoy commented 1 year ago

@andrewvanbreda please resolve when you can as @JimBacon is working on other priorities at the moment. Thanks

andrewvanbreda commented 1 year ago

Hi @DavidRoy ,

I have removed the simple duplicate species that were appearing more than once in the spreadsheet (or were duplicates with extra spaces).

The Warehouse UI reported that these have no occurrences recorded against them, so they were safe to remove. (As the Dutch names had only been imported onto one of the duplicate items, it was usually just a case of removing the one which had no Dutch attached to it)

I haven't dealt with most of the more complex cases yet, where species names appear in a different order, as that is a more complex issue. Actually, sometimes it isn't just order, for instance "Horisme tersata / radicaria" vs "Horisme tersata/Horisme radicaria", which is an obvious duplicate by eye, but less obvious for a computer.
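One way such cases could be compared by a program is to expand every part of a complex name to a full binomial and compare the resulting sets. The sketch below is not what was run for this issue. It assumes that a part with two or more words starts with the genus, and that a single-word part is an epithet belonging to the most recently seen genus. The function name `complex_members` is hypothetical.

```python
def complex_members(name):
    """Return the set of full binomials named in a slash-separated complex."""
    members = set()
    genus = None
    for part in name.split("/"):
        words = part.split()
        if not words:
            continue  # ignore empty parts such as a trailing slash
        if len(words) >= 2:
            # Assumed: the first word of a multi-word part is the genus.
            genus = words[0]
            members.add(" ".join(words))
        elif genus is not None:
            # Assumed: a lone word is an epithet in the current genus.
            members.add(f"{genus} {words[0]}")
        else:
            members.add(words[0])
    return frozenset(members)


a = complex_members("Horisme tersata / radicaria")
b = complex_members("Horisme tersata/Horisme radicaria")
print(a == b)
```

Two names describe the same complex under these assumptions when their sets are equal, regardless of order, spacing, or whether the genus is repeated.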

Removed duplicates.txt

andrewvanbreda commented 1 year ago

Further removals:

Reordering Duplicates.txt

andrewvanbreda commented 1 year ago

Hi @BirenRathod,

Would you be able to run this whenever you have a chance please (Not urgent before Easter break). Cheers.

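-- List preferred names in taxon list 260 that collide once spacing and ordering are ignored.
-- Spaces and " / " both become "/", then the parts are de-duplicated, sorted and rejoined with ",".
SELECT (
    SELECT string_agg(DISTINCT species_name_element, ',' ORDER BY species_name_element) AS species_reordered
    FROM unnest(string_to_array(replace(replace(taxon, ' / ', '/'), ' ', '/'), '/')) AS species_name_element
)
FROM indicia.cache_taxa_taxon_lists
WHERE taxon_list_id = 260
  AND preferred = true
GROUP BY species_reordered
HAVING count(*) > 1;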
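For anyone without database access, the grouping key used by this query can be approximated offline. The Python sketch below mirrors the same steps on a list of names. It is an illustration only: the helper names are made up, and the sort order of the parts can differ from PostgreSQL's, which depends on the database collation (grouping is unaffected by that).

```python
from collections import defaultdict


def reordered_key(taxon):
    """Mirror the query: spaces and slashes both separate parts, which are de-duplicated and sorted."""
    parts = taxon.replace(" / ", "/").replace(" ", "/").split("/")
    return ",".join(sorted(set(parts)))


def find_duplicates(taxa):
    """Group names by their reordered key and keep only keys shared by more than one name."""
    groups = defaultdict(list)
    for taxon in taxa:
        groups[reordered_key(taxon)].append(taxon)
    return {key: names for key, names in groups.items() if len(names) > 1}


names = [
    "Acronicta psi/tridens",
    "Acronicta tridens / psi",
    "Horisme tersata / radicaria",
    "Horisme tersata/Horisme radicaria",
    "Timandra comae/griseata",
]
for key, group in find_duplicates(names).items():
    print(key, "->", group)
```

Because the genus is split off as just another part, a repeated genus collapses to one entry, which is why "Horisme tersata/Horisme radicaria" ends up with the same key as "Horisme tersata / radicaria".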

BirenRathod commented 1 year ago

@andrewvanbreda Here it is. data-1680796355502_SNE.zip

andrewvanbreda commented 1 year ago

Thanks @BirenRathod, that is excellent, really useful

andrewvanbreda commented 1 year ago

@DavidRoy I have removed the remainder of what I think are duplicates, see file "Duplicates (name order mismatched)" for details.

The SQL also picked up some possible problems with the species list. I have not taken any action on these as they didn't relate to the import, but details can be found in the file "Possible problems (no action taken)".

I have not taken any action regarding the padding around the slashes. This could be fixed with some SQL if you wanted.

Duplicates (name order mismatched).txt

Possible problems (no action taken).txt