BiologicalRecordsCentre / ABLE

Assessing ButterfLies in Europe project repository

Swedish moth names #542

Open larspett opened 1 year ago

larspett commented 1 year ago

Last spring we provided a list of Swedish day-active moths (https://github.com/BiologicalRecordsCentre/ABLE/issues/417), which I believe was deployed, but none of those species seem to be available.

Also, did we ever provide a full list of the Swedish macro moths? If not, we can easily do that; currently we only see German moth names in the moth part of the app. In the butterfly part of the app, only butterflies are available, even though the day-active moth option is enabled.

If you want us to provide a moth dictionary, what format and which taxonomic groups should I provide?

larspett commented 1 year ago

@chrisvanswaay maybe you have the format and taxonomic group info that I would need to supply Swedish common names for the full set of macromoths?

chrisvanswaay commented 1 year ago

No, I haven't, but I think @JimBacon might be able to help you get started. He also arranged all the Dutch butterfly names (actually better than I did).

larspett commented 1 year ago

@DavidRoy @kazlauskis Here are the Swedish common names for micro and macro moths in a tab-separated text file, with the scientific name first on each row. There are a few new aggregates (5), plus some mismatches (5) between our taxonomic backbone and the one used in the taxonomic aggregate dataset I got for NL. All of those are listed in a separate text file. Is this the way you want the data, @JimBacon? It would be great if the names could be added to the website (where the common names are, for some reason, a mixture of German and English) and to the app (where the common names seem to be German only). Scientific_and_Swedish_common_names.txt Scientific_and_Swedish_common_names_additions_and_mismatches.txt
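A minimal sketch of reading a file in the layout described above, assuming column 1 is the scientific name, column 2 the Swedish common name, and that there is no header row. `read_name_pairs` is a hypothetical helper, not part of the ABLE or Indicia tooling.

```python
import csv
import io


def read_name_pairs(lines):
    """Return (scientific_name, swedish_name) pairs from tab-separated lines.

    Assumption: column 1 = scientific name, column 2 = Swedish common name,
    no header row. Blank or incomplete rows are skipped.
    """
    pairs = []
    # QUOTE_NONE: treat quote characters as ordinary text in a plain TSV file
    for row in csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE):
        if len(row) < 2 or not row[0].strip():
            continue
        pairs.append((row[0].strip(), row[1].strip()))
    return pairs


# In-memory sample; for a real file pass open(path, encoding="utf-8", newline="").
sample = io.StringIO("Noctua pronuba\tstörre bandfly\n\nLemonia dumi\tmjölkörtsspinnare\n")
print(read_name_pairs(sample))
```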

JimBacon commented 1 year ago

Hi, I'm sure the file format will be fine but I do not expect to look at this for several weeks as I have a list of other jobs I am working on. Jim.

larspett commented 1 year ago

It seems the Swedish taxonomic backbone hadn't been updated against GBIF etc., so only one of the taxonomic mismatches was actually a mismatch. I have updated the aggregates and mismatch list here: Scientific_and_Swedish_common_names_additions_and_mismatches_v2.txt

DavidRoy commented 1 year ago

@kazlauskis can you review how the day-flying moths are being included in the app? @andrewvanbreda could you add in the Swedish common names when working on #464?

andrewvanbreda commented 1 year ago

@DavidRoy Will do. I'm just checking some things with John about the other issue first.

larspett commented 1 year ago

@kazlauskis is this one nearly done? I'm planning some moth-monitoring publicity but need Swedish common names in the app and on the website before I can do that.

kazlauskis commented 1 year ago

Not yet, but we are actively working on the app, and we'll look into this next week.

larspett commented 1 year ago

Much appreciated

andrewvanbreda commented 1 year ago

Hi @BirenRathod, would you be able to get me these results, please, whenever you get a chance? Thanks

SQL 1.txt SQL 2.csv

BirenRathod commented 1 year ago

@andrewvanbreda I attached the results. ABLE.zip

larspett commented 1 year ago

Is any feedback needed from my end here, @andrewvanbreda? Will the names end up in the Butterfly Count app at the same time as they are deployed on the website? I'd love to promote the app more widely, but all the names are in German at the moment.

andrewvanbreda commented 1 year ago

Hi @larspett

I have prepared some of the import files, but have not done the actual import yet.

The reason for the delay is that the original file contains synonyms which are not held in the species list, so I don't think the importer will recognise them.

I am in the process of changing the import files, but I wanted to double-check this approach with my brother (John van Breda) before doing the actual import.

For example, "Yponomeuta irrorellus" in the import file is "Yponomeuta irrorella" in the EBMS Moths database list, and many names are like this.
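A minimal pre-import check along these lines could list the names with no exact match and propose near spellings. This is a sketch only: it uses Python's standard `difflib.get_close_matches` as a stand-in, is not how the Indicia importer matches names, and the 0.85 cutoff is an arbitrary assumption. A spelling-based suggestion such as -ellus to -ella still needs someone who knows the taxonomy to confirm it.

```python
import difflib


def find_mismatches(import_names, species_list, cutoff=0.85):
    """Map each import name with no exact match to its closest species-list names.

    Exact matches are omitted. An empty list means nothing scored above
    `cutoff`. Suggestions are purely spelling-based and must be checked.
    """
    known = set(species_list)
    result = {}
    for name in import_names:
        if name in known:
            continue  # exact match: importable as-is
        result[name] = difflib.get_close_matches(name, species_list, n=3, cutoff=cutoff)
    return result


species_list = ["Yponomeuta irrorella", "Yponomeuta evonymella", "Noctua pronuba"]
import_names = ["Noctua pronuba", "Yponomeuta irrorellus"]
print(find_mismatches(import_names, species_list))
```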

I have not developed or used the Butterfly Count app myself; however, if it uses the same species list, I expect the names would appear in the app. @kazlauskis might be able to confirm that one.

Andy

larspett commented 1 year ago

OK, I saw those and thought I had removed them, but that was probably only in the aggregates list (where this particular example is also found). Sorry about the non-standard taxonomy used by the Swedish taxonomic backbone.

andrewvanbreda commented 1 year ago

@larspett No problem. Will let you know when done.

kazlauskis commented 1 year ago

@andrewvanbreda is the species list update complete now? I will then fetch the new taxa to the app.

andrewvanbreda commented 1 year ago

@kazlauskis No, it isn't, as I have had to do a lot of preparation. However, I should be in a position to import most of it in the next day or so. About 350 items won't be imported initially, for various reasons.

andrewvanbreda commented 1 year ago

Hi @BirenRathod, would you be able to get me this result whenever you get a second, please? I hope it will return nothing. Cheers.

Test if not exists SQL.txt

BirenRathod commented 1 year ago

@andrewvanbreda have you seen my other email about using pgAdmin from your own machine, so you can run this query yourself?

andrewvanbreda commented 1 year ago

@BirenRathod Thanks Biren, I hadn't seen the email, but I have now. I will look into fixing NPMS first, then check out pgAdmin after that. Cheers

andrewvanbreda commented 1 year ago

@kazlauskis @larspett I have just done the import, which seems to have gone as planned. Just under 3,000 items were imported; some were deliberately left out until I can deal with them further.

For reference, the imported files are: File 1.csv File 2.csv File 3.csv File 4.csv

larspett commented 1 year ago

@andrewvanbreda is the list in production or still in dev? German names still appear in the website's observation lists.

kazlauskis commented 1 year ago

@Vilius-Stankaitis can you update the species list in the app?

JimBacon commented 1 year ago

@Vilius-Stankaitis please hold off for a moment while we sort something out.

@andrewvanbreda your import has deleted the common names that previously existed. Are you able to restore them?

For example, this query

SELECT ttl.id, taxon, language, ttl.updated_on, ttl.deleted
FROM taxa_taxon_lists ttl
JOIN taxa t
    ON t.id = ttl.taxon_id
JOIN languages l
    ON l.id = t.language_id
WHERE taxon_meaning_id = 221417

returns

taxa_taxon_list_id | taxon          | language | updated_on                 | deleted
-------------------+----------------+----------+----------------------------+--------
461092             | Noctua pronuba | Latin    | 2020-02-19 10:42:10.244947 | false
612966             | större bandfly | Swedish  | 2023-04-18 20:17:33        | false
541306             | huismoeder     | Dutch    | 2023-04-18 20:17:33        | true
461093             | Hausmutter     | German   | 2023-04-18 20:17:33        | true

andrewvanbreda commented 1 year ago

@JimBacon I only have read access to the database, but I can query it and then ask Biren to make the update.

However, it is more concerning that this has happened at all, as the importer shouldn't have behaved like that. I will raise it on GitHub once I have confirmed it.

johnvanbreda commented 1 year ago

@andrewvanbreda I'm not so sure. CommonNames is an import field that groups the names together, so if you supply a commonNames value, it replaces the whole set. That's how the UI works, and the importer has always been essentially a front end to the warehouse data-entry UI.
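To make the replace-the-whole-set behaviour concrete, here is a toy model in Python. It is not Indicia code: a taxon's common names are held in a plain dict keyed by language, and `apply_common_names` and its `merge` flag are hypothetical. Replacing keeps only the supplied names, which matches the Dutch and German rows flagged as deleted in Jim's query result; merging would keep them.

```python
def apply_common_names(existing, supplied, merge=False):
    """Toy model of updating one taxon's common names (language -> name).

    merge=False: the supplied names replace the whole set, as described for
    the commonNames import field. merge=True: existing names are kept and the
    supplied ones added or overwritten per language.
    Returns (kept_names, dropped_languages).
    """
    kept = {**existing, **supplied} if merge else dict(supplied)
    dropped = sorted(lang for lang in existing if lang not in kept)
    return kept, dropped


existing = {"Dutch": "huismoeder", "German": "Hausmutter"}
supplied = {"Swedish": "större bandfly"}

print(apply_common_names(existing, supplied))              # replace semantics
print(apply_common_names(existing, supplied, merge=True))  # merge semantics
```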

JimBacon commented 1 year ago

It was reading your comment here recently that made me alive to the problem, John. I think it's worth raising an issue to warn users of this behaviour when importing common names (or any other similar field).

andrewvanbreda commented 1 year ago

@JimBacon @johnvanbreda I will raise it shortly, as I have a few thoughts on the matter and was involved in the project for which the existing data-update functionality was added. There are a few tests I want to do first.

andrewvanbreda commented 1 year ago

Hi @BirenRathod, are you able to run the statement below that I have marked "TEST"? If it affects 3498 rows, can you then run the other two statements? If that first statement doesn't return 3498, let me know (it should, as I have already run a select statement to test it; I'm just double-checking).

TEST

update indicia.taxa_taxon_lists
set deleted = deleted
where updated_by_id = 5553
  and deleted = true
  and taxon_list_id = 260
  and updated_on > '2023-04-18 19:00'::timestamp;

ACTUAL STATEMENTS

update indicia.taxa_taxon_lists
set deleted = false, updated_by_id = 5553, updated_on = now()
where updated_by_id = 5553
  and deleted = true
  and taxon_list_id = 260
  and updated_on > '2023-04-18 19:00'::timestamp;

insert into indicia.work_queue (task, entity, record_id, cost_estimate, priority, created_on)
select 'task_cache_builder_update', 'taxa_taxon_list', id, 100, 2, now()
from indicia.taxa_taxon_lists
where id in (
  select id
  from indicia.taxa_taxon_lists
  where updated_by_id = 5553
    and deleted = false
    and taxon_list_id = 260
    and updated_on > '2023-04-18 19:00'::timestamp
);
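The count-then-update check used with the TEST statement above can also be enforced inside one transaction, so the update rolls back automatically if the affected row count isn't the expected one. A minimal sketch using Python's standard `sqlite3` against a toy table: the real warehouse is PostgreSQL, the schema here is cut down to the columns these statements use, and `restore_deleted` is a hypothetical helper.

```python
import sqlite3


def restore_deleted(conn, user_id, list_id, since, expected):
    """Un-delete rows flagged by one import run.

    Commits only if exactly `expected` rows are affected; otherwise the
    transaction is rolled back and RuntimeError is raised.
    """
    with conn:  # commits on normal exit, rolls back if an exception escapes
        cur = conn.execute(
            "UPDATE taxa_taxon_lists SET deleted = 0, updated_on = CURRENT_TIMESTAMP "
            "WHERE updated_by_id = ? AND deleted = 1 "
            "AND taxon_list_id = ? AND updated_on > ?",
            (user_id, list_id, since),
        )
        if cur.rowcount != expected:
            raise RuntimeError(f"expected {expected} rows, got {cur.rowcount}")
    return cur.rowcount


# Toy table and data (hypothetical, for illustration only).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE taxa_taxon_lists (id INTEGER PRIMARY KEY, taxon_list_id INTEGER, "
             "updated_by_id INTEGER, updated_on TEXT, deleted INTEGER)")
conn.executemany("INSERT INTO taxa_taxon_lists VALUES (?, ?, ?, ?, ?)", [
    (1, 260, 5553, "2023-04-18 20:17:33", 1),
    (2, 260, 5553, "2023-04-18 20:17:33", 1),
    (3, 260, 1, "2020-02-19 10:42:10", 0),
])
conn.commit()
print(restore_deleted(conn, 5553, 260, "2023-04-18 19:00", expected=2))
```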

BirenRathod commented 1 year ago

@andrewvanbreda It has updated exactly 3498 rows.

andrewvanbreda commented 1 year ago

@BirenRathod Brilliant thanks :)

@JimBacon I can see those names restored in the Warehouse UI now, and also on the website with the correct presence ID. Your query above also looks correct to me. You may want to confirm you are happy with things before they do an app import.

By the way, I have raised an issue about the importer here: https://github.com/Indicia-Team/warehouse/issues/477

Most of that issue is arguably a matter of opinion, although point 2 is probably quite important.

andrewvanbreda commented 1 year ago

Hi @larspett, I have noticed something about one name in the import. mjölkörtsspinnare is given as the translation for both the family name Brahmaeidae and the species Lemonia dumi. So if this is typed into the website as the occurrence, two names appear with no visual difference between them, even though their intended meanings are different.
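A clash like this could be caught before import by grouping the name pairs by common name. A short sketch, assuming the import rows are available as (scientific name, common name) pairs; `find_shared_common_names` is a hypothetical helper, and comparing common names case-insensitively is an assumption.

```python
from collections import defaultdict


def find_shared_common_names(pairs):
    """Return {common name: [scientific names]} for common names used by more than one taxon."""
    by_common = defaultdict(set)
    for scientific, common in pairs:
        # casefold() so that e.g. "Mjölkörtsspinnare" and "mjölkörtsspinnare" collide
        by_common[common.strip().casefold()].add(scientific)
    return {c: sorted(s) for c, s in by_common.items() if len(s) > 1}


pairs = [
    ("Brahmaeidae", "mjölkörtsspinnare"),
    ("Lemonia dumi", "mjölkörtsspinnare"),
    ("Noctua pronuba", "större bandfly"),
]
print(find_shared_common_names(pairs))
```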

Screen Shot 2023-04-19 at 16 47 51

larspett commented 1 year ago

Good point, and something the Swedish taxonomists have probably missed. I have raised the issue with them and hope to hear back soon. How about an ugly hack: call the family mjölkörtsspinnare_ for now and fix it once I hear back from Uppsala, so it doesn't stall the upload?

andrewvanbreda commented 1 year ago

Hi @larspett, I have done that and can see it working on the website.

There are also the 300 I didn't manage to upload, which I will need to examine, so once you hear back from Uppsala, we can probably upload the change along with those.

larspett commented 1 year ago

Did you post those 300? There was a bunch of CSVs and I wasn't sure whether I should check them. It looked like Uppsala used some obsolete synonyms, like -ella vs. -ellus etc.

andrewvanbreda commented 1 year ago

@larspett Yes, I will post them here once I have a chance to look at the data again and write up explanations of the issues I have seen. I will try to do this as soon as I can. But yes, there will be some things for you to check, and what you describe is indeed one of the problems.

larspett commented 1 year ago

By the way, we have discovered some species complexes that ObsIdentify resolves but that end up as Unknown in ButterflyCount. Could that be because those complexes are not listed in the uploaded taxonomy? Also, there are still German names on the production web page.

andrewvanbreda commented 1 year ago

Hi @larspett

I am not sure. Although I have worked on many projects, I am fairly new to this particular one, so I am not aware of some of the reasons why things are set up the way they are.

I can tell you the "EBMS Moths" species list does have an "Unknown" taxon item, and people can select that item during data entry as they wish. Any species complexes would need to be present in the species list as taxon items in the same way as standard non-complex items.

If the above information does not answer your question, perhaps I could ask David to comment.

I'm not sure about the German; could you send me the URL of the page showing German names that you don't think should be there?

larspett commented 1 year ago

It is the image analysis that assigns Unknown in our app but, correctly, an aggregate in ObsIdentify.

larspett commented 1 year ago

https://butterfly-monitoring.net/elastic/all-records with the language set to Swedish

andrewvanbreda commented 1 year ago

@larspett OK, yes, I can see the German problem. This page uses the Elasticsearch index. I will raise this on GitHub so David can assign it to the correct person.

The other issue is not an area I am familiar with. I will note this, and email David about it. I think we are having a meeting about outstanding issues next week.

larspett commented 1 year ago

This one becomes Unknown in ButterflyCount but a species aggregate in ObsIdentify. The confidence isn't very high here, but I've seen it go up to about 90% with the same outcome.

DavidRoy commented 1 year ago

@larspett can you raise this as a separate issue? If issues keep expanding, we never close them ;-)

larspett commented 1 year ago

@DavidRoy The image ID issue is now issue #549

andrewvanbreda commented 1 year ago

@larspett Likewise I have raised the language problem for you here https://github.com/BiologicalRecordsCentre/ABLE/issues/550

andrewvanbreda commented 1 year ago

Hi @larspett,

So here is a summary of what wasn't imported.

  1. Rows where "NA" was provided in the Swedish column were left out.

  2. Rows where the Swedish name was already present in the database, so they didn't need importing.

  3. Rows listed in the attached files (rows where an exact Latin match wasn't found in the species list). There are two files. "Items with simple suggestions" gives a suggested alternative name that is present in the species list and may be usable.

"Notes in detail.csv" is similar, except that the problems aren't as simple, so the notes column contains more detail. In some cases I have put an alternative synonym. I am not a moth expert, so these suggestions come from examining the UK Master List for an alternative, and they should be checked carefully.

If you want this data to be imported, can you confirm the alternatives in both files (perhaps by adding another column with your notes to the sheets)?

In both files, if the extra column contains nothing, it means I couldn't find an alternative.

If there are any new species you need that aren't currently in the species list, I will need to talk to David.

Items with simple suggestions.csv

Notes in detail.csv
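The outcomes in the summary above (NA skipped, already present, imported, no exact Latin match) amount to a simple triage of each row. A sketch under assumed inputs: rows as (scientific, Swedish) pairs, the species list as a set of Latin names, and already-present names as a set of (scientific, Swedish) pairs. `triage_rows` and its category labels are hypothetical, not the importer's own, and the checks run in the order listed.

```python
def triage_rows(rows, species_list, existing_names):
    """Sort (scientific, swedish) rows into the import outcomes summarised above."""
    outcome = {"skipped_na": [], "unmatched": [], "already_present": [], "to_import": []}
    for scientific, swedish in rows:
        if swedish.strip().upper() == "NA":
            outcome["skipped_na"].append(scientific)       # no Swedish name supplied
        elif scientific not in species_list:
            outcome["unmatched"].append(scientific)        # goes to the review files
        elif (scientific, swedish) in existing_names:
            outcome["already_present"].append(scientific)  # nothing to import
        else:
            outcome["to_import"].append(scientific)
    return outcome


species_list = {"Noctua pronuba", "Lemonia dumi"}
existing_names = {("Noctua pronuba", "större bandfly")}
rows = [
    ("Noctua pronuba", "större bandfly"),
    ("Lemonia dumi", "mjölkörtsspinnare"),
    ("Yponomeuta irrorellus", "placeholder name"),  # placeholder, not a real Swedish name
    ("Brahmaeidae", "NA"),
]
print(triage_rows(rows, species_list, existing_names))
```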

larspett commented 1 year ago

Looks like Swedish systematics isn't always that systematic. Will look into this. So the empty cells are species that don't exist in the present version of the EBMS list?

andrewvanbreda commented 1 year ago

Hi @larspett Yes, an empty cell means the exact species name isn't present and I couldn't find a similar alternative, so I can't suggest anything (although there might be a suitable alternative synonym that I was simply unable to find). So if there are any species you definitely need that aren't present, please note them in the sheet so they can be looked at further.

If I have put notes, it means an exact match for the name wasn't found, but I was able to find a possible alternative I need you to check.

Vilius-Stankaitis commented 1 year ago

Hi, is this sorted? Can I update the app?