ArctosDB / arctos

Arctos is a museum collections management system
https://arctos.database.museum

[Bug/Feature request] Need more noclass_term fields in cf_temp_classification #7971

Closed: camwebb closed this 1 month ago

camwebb commented 1 month ago

I'm trying to load the new UAM Plants classification from WFO. It has 57 noclass_term fields, and the bulkloader is failing with the error 'column "noclass_term_type_21" of relation "cf_temp_classification" does not exist', so I assume the cf_temp_classification table only has 20 noclass_term columns. Could we have 60 noclass_term columns, please?

Now why, you are asking, do we need so many? Here's the reason: Spiraea crenata. WFO has 11 different variations of the name. Here's part of the single bulkloader line for the name (split onto multiple lines for readability):

(links)
  Spiraea crenata
  UAM Plants
(class_term)
  kingdom : Plantae
  phylum : Tracheophyta
  family : Rosaceae
  genus : Spiraea
  species : Spiraea crenata
  subspecies : 
  variety : 
  forma : 
(noclass_term)
  classification source : World Flora Online v.2023.12
  managed_by : camwebb@Arctos
  1_fullname : "Spiraea crenata Pall."
  1_wfoID : wfo-0000983998
  1_status : Synonym
  1_synonym_of_wfoID : wfo-0000985818
  1_synonym_of_name : "Spiraea hypericifolia L."
  2_fullname : "Spiraea crenata L."
  2_wfoID : wfo-0000985813
  2_status : Accepted
  2_synonym_of_wfoID :
  2_synonym_of_name : ""
  ... (8 more variations snipped) ...
  11_fullname : "Spiraea crenata auct."
  11_wfoID : wfo-0001011545
  11_status : Synonym
  11_synonym_of_wfoID : wfo-0000985818
  11_synonym_of_name : "Spiraea hypericifolia L."
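Assuming every variation carries the same five fields shown for variations 1, 2 and 11, plus the two shared fields at the top, the column count works out to 11 × 5 + 2 = 57, which matches the 57 above. The same arithmetic says 20 columns hold only 3 variations and 60 hold 11. A minimal Python sketch of that calculation (the field lists are read off the example row; the helper names are hypothetical):

```python
# Assumption: each name variation contributes these five fields, and two
# fields are shared by the whole row, as in the Spiraea crenata example.
PER_VARIANT_FIELDS = ["fullname", "wfoID", "status", "synonym_of_wfoID", "synonym_of_name"]
SHARED_FIELDS = ["classification source", "managed_by"]


def noclass_term_names(n_variants: int) -> list:
    """Noclass term types one wide bulkloader row needs for n_variants variations."""
    names = list(SHARED_FIELDS)
    for i in range(1, n_variants + 1):
        names.extend(f"{i}_{field}" for field in PER_VARIANT_FIELDS)
    return names


def max_variants(n_columns: int) -> int:
    """Most variations that fit into n_columns noclass_term columns."""
    return (n_columns - len(SHARED_FIELDS)) // len(PER_VARIANT_FIELDS)


print(len(noclass_term_names(11)))  # 57
print(max_variants(20))             # 3
print(max_variants(60))             # 11
```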

There's no great hurry, but the file is ready to load. It has 9766 names. See repo for the build script.

Thanks

camwebb commented 1 month ago

A potentially better alternative solution would be to rewrite the import script/table to accept a bulkloader file with a single term per line:

name,source,hierarchical,term_type,term
Spiraea crenata,UAM Plants,yes,genus,Spiraea
Spiraea crenata,UAM Plants,no,11_synonym_of_wfoID,wfo-0000985818

... just thinking aloud
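For illustration, converting one wide row into that long layout is mechanical. A minimal Python sketch, assuming class terms arrive as (term_type, term) pairs in rank order (in this layout, row order is the only thing recording rank order) and that empty terms such as `subspecies :` are simply dropped; the function names are hypothetical:

```python
import csv
import io


def wide_to_long(name, source, class_terms, noclass_terms):
    """Yield one (name, source, hierarchical, term_type, term) row per term.

    class_terms must be ordered from highest rank down; empty terms are skipped.
    """
    for term_type, term in class_terms:
        if term:
            yield (name, source, "yes", term_type, term)
    for term_type, term in noclass_terms:
        if term:
            yield (name, source, "no", term_type, term)


def to_csv(rows):
    """Serialize rows with a header; csv.writer quotes fields only when needed."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["name", "source", "hierarchical", "term_type", "term"])
    writer.writerows(rows)
    return buf.getvalue()


rows = wide_to_long(
    "Spiraea crenata",
    "UAM Plants",
    [("kingdom", "Plantae"), ("genus", "Spiraea"), ("subspecies", "")],
    [("11_synonym_of_wfoID", "wfo-0000985818")],
)
print(to_csv(rows), end="")
```

This prints the header plus three rows; the empty subspecies term produces no row.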

dustymc commented 1 month ago

Changing this from bug to feature request and going active on it. It's completely additive, so I don't think there's any reason for discussion.

why, you are asking

While wearing my Arctos Developer hat: I'm not asking. Classifications are only shared if someone wants to buy into them; you've got your own, so do whatever you want. It only affects Arctos by adding a bit of entirely optional functionality. (There's an impact on performance and such, but I think all of these requests are a drop in the bucket in the big picture, so I'm ignoring that aspect.)

OK, hats off: sure, there's a giant mess in taxonomy-land, and a taxonomy database-thingee has done what it can to record that, good for them, but WHY would anyone cataloging plants want to deal with that? Probably your collection only thinks of one thing, not 11, when someone mentions Spiraea crenata, so why bring the noise onboard? It would be fabulous if you could answer in the form of a how-to; whatever you're up to is probably of interest to others. (Or feel free not to answer at all!)

Also - and do please feel entirely free to ignore this - "partial identifiers" are a product of the devil himself. From your example (https://arctos.database.museum/name/Erigeron%20acris#UAMPlants), I would transform

1_synonym_of_wfo : <a href="https://worldfloraonline.org/taxon/wfo-0000009039">wfo-0000009039</a>

into

1_synonym_of_wfo : <a href="https://worldfloraonline.org/taxon/wfo-0000009039">https://worldfloraonline.org/taxon/wfo-0000009039</a>

so that nobody ever has to guess what a 'wfo-number' might be. Then I'd add a bare term to better support search:

1_synonym_id_or_something: https://worldfloraonline.org/taxon/wfo-0000009039

Or possibly you could just provide the bare URI/GUID and request UI magic to linkify it (I would).
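Both suggestions are easy to apply in a build script. A minimal Python sketch, assuming WFO taxon IDs are always "wfo-" followed by ten digits (true of every example in this thread, but not checked against WFO documentation) and that they resolve under the base URL used in the links above; `1_synonym_id` stands in for the hypothetical "1_synonym_id_or_something" term type:

```python
import re

# Assumption: ID shape inferred from the examples in this thread.
WFO_TAXON_BASE = "https://worldfloraonline.org/taxon/"
WFO_ID = re.compile(r"wfo-\d{10}")


def wfo_uri(wfo_id: str) -> str:
    """Turn a bare WFO ID into the full, resolvable URI."""
    if not WFO_ID.fullmatch(wfo_id):
        raise ValueError(f"not a WFO taxon ID: {wfo_id!r}")
    return WFO_TAXON_BASE + wfo_id


def wfo_terms(prefix: str, wfo_id: str) -> dict:
    """A linked display term whose text is the full URI, plus a bare URI term for search."""
    uri = wfo_uri(wfo_id)
    return {
        f"{prefix}_synonym_of_wfo": f'<a href="{uri}">{uri}</a>',
        f"{prefix}_synonym_id": uri,  # hypothetical term type name
    }


print(wfo_terms("1", "wfo-0000009039")["1_synonym_id"])
# https://worldfloraonline.org/taxon/wfo-0000009039
```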

alternative solution

I'm not COMPLETELY attached to anything (and definitely like data-driven built-in expansion), but

  1. Most people aren't going to be able to successfully use that, while anything will spit out CSV
  2. It would have to do something weird for NULL classification term types, and
  3. It would have to use some delimiter which can't appear in the data

If we really need to go beyond CSV, we could just write to the table. That would mean a need for more organization, unique identifiers and such, but it's a simple structure, and normalized data is always nice...

arctosprod@arctos>> \d taxon_term;
                                                 Table "core.taxon_term"
           Column           |            Type             | Collation | Nullable |                Default                
----------------------------+-----------------------------+-----------+----------+---------------------------------------
 taxon_term_id              | bigint                      |           | not null | nextval('sq_taxon_term_id'::regclass)
 taxon_name_id              | bigint                      |           | not null | 
 classification_id          | character varying(4000)     |           |          | 
 term                       | character varying(4000)     |           | not null | 
 term_type                  | character varying(255)      |           |          | 
 source                     | character varying(255)      |           | not null | 
 gn_score                   | real                        |           |          | 
 position_in_classification | bigint                      |           |          | 
 lastdate                   | timestamp without time zone |           | not null | LOCALTIMESTAMP
 match_type                 | character varying(255)      |           |          | 
Indexes:

camwebb commented 1 month ago

Probably your collection only thinks of one thing, not 11, when someone mentions Spiraea crenata, so why bring the noise onboard?

Two reasons: i) it would be a huge, perhaps impossible, amount of work to look at all the labels for each identification, list the author strings actually used, and constrain the WFO 'everything' data to what we have; ii) we cannot know whether a user will want to find one of our specimens based on a synonym we are not using or are not aware of, and having the exhaustive WFO synonymy helps here.
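To make reason ii) concrete: once every WFO name string is stored with the classification, a reverse index from name strings to Arctos names lets a search for a synonym land on the name actually used. A minimal Python sketch using terms from the Spiraea crenata example; the index is hypothetical and not how Arctos search is implemented:

```python
from collections import defaultdict

# Noclass terms from the Spiraea crenata example row (variations 1, 2 and 11).
spiraea_crenata = {
    "1_fullname": "Spiraea crenata Pall.",
    "1_synonym_of_name": "Spiraea hypericifolia L.",
    "2_fullname": "Spiraea crenata L.",
    "2_synonym_of_name": "",
    "11_fullname": "Spiraea crenata auct.",
    "11_synonym_of_name": "Spiraea hypericifolia L.",
}


def build_synonym_index(classifications):
    """Map every non-empty full name and synonym_of name to the Arctos
    names whose classification mentions it."""
    index = defaultdict(set)
    for arctos_name, terms in classifications.items():
        for term_type, term in terms.items():
            if term and term_type.endswith(("_fullname", "_synonym_of_name")):
                index[term].add(arctos_name)
    return index


index = build_synonym_index({"Spiraea crenata": spiraea_crenata})
print(sorted(index["Spiraea hypericifolia L."]))  # ['Spiraea crenata']
```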

It would be fabulous if you could answer in the form of a how-to

I'll bring this up at the next @ArctosDB/taxonomy meeting. I was planning to share this approach when I had the classification ready, and now I do. If people like this approach, I'll write a how-to.

"partial identifiers"

👍 Good suggestion.

beyond CSV

Would you be open to (or even prefer) my turning the classification import into SQL INSERT statements that you could run? Would taxon_name, classification_id, term and term_type be sufficient, or would I need to calculate taxon_name_id and position_in_classification too?

dustymc commented 1 month ago

open to

Sure.

Maybe not 'prefer', because it ties you to someone with a scary password being available and removes some independence, but I don't mind pushing buttons either.

taxon_name

Yep, I can get taxon_name_id from that, no problem.

classification_id

You'd have to supply that; it's the thing that brings multiple rows back together, and I don't think there's any way I can calculate it. It just needs to be unique. (That constraint can't actually be enforced, so a violation will probably result in some sort of super-weird UI rather than an error.) Use a UUID or similar (something very probably unique; don't start with "1"!) and it'll (probably) be fine.
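A minimal Python sketch of generating those IDs with the standard `uuid` module: one random (version 4) UUID per classification, reused on every term row of that classification, rather than, presumably, a counter starting at 1. Upper-casing only makes them resemble the existing values; note that existing IDs such as 4C14C9D7-4636-4934-B4BB89701FFE4C37 use a different hyphen layout from the standard 8-4-4-4-12 form, which per the comment above shouldn't matter as long as the values are unique. The helper name is hypothetical:

```python
import uuid


def assign_classification_ids(names):
    """Map each scientific name to a fresh classification_id.

    Every taxon_term row belonging to that name's classification must
    reuse the same ID, since it is what groups the rows back together.
    """
    return {name: str(uuid.uuid4()).upper() for name in names}


ids = assign_classification_ids(["Spiraea crenata", "Erigeron acris"])
for name, classification_id in ids.items():
    print(name, classification_id)
```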

position_in_classification

You definitely would have to calculate that; it's what differentiates "classification terms" from "noclass terms" (and organizes the former).
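A minimal Python sketch of that calculation, assuming class terms arrive in rank order and that empty ranks are dropped before numbering, so positions run 1..n without gaps (that matches the Erigeron acris output in this thread, but whether Arctos requires gap-free numbering is an assumption). Noclass terms get None, i.e. NULL. The function name and placeholder ID are hypothetical:

```python
def term_rows(scientific_name, source, classification_id, class_terms, noclass_terms):
    """Build one tuple per taxon_term row for a single classification:
    (scientific_name, classification_id, term, term_type, source,
     position_in_classification).

    class_terms: (term_type, term) pairs ordered from highest rank down.
    noclass_terms: (term_type, term) pairs in any order.
    """
    rows = []
    # Drop empty ranks before numbering so positions run 1..n without gaps.
    present = [(term_type, term) for term_type, term in class_terms if term]
    for position, (term_type, term) in enumerate(present, start=1):
        rows.append((scientific_name, classification_id, term, term_type, source, position))
    # Noclass terms are exactly the rows whose position is NULL (None here).
    for term_type, term in noclass_terms:
        if term:
            rows.append((scientific_name, classification_id, term, term_type, source, None))
    return rows


rows = term_rows(
    "Spiraea crenata",
    "UAM Plants",
    "EXAMPLE-CLASSIFICATION-ID",  # hypothetical placeholder classification_id
    [("kingdom", "Plantae"), ("phylum", "Tracheophyta"), ("family", "Rosaceae"),
     ("genus", "Spiraea"), ("species", "Spiraea crenata"),
     ("subspecies", ""), ("variety", ""), ("forma", "")],
    [("managed_by", "camwebb@Arctos")],
)
for row in rows:
    print(row[3], row[5])
```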

Maybe this is helpful:

select taxon_term.* from 
taxon_name
inner join taxon_term on taxon_name.taxon_name_id=taxon_term.taxon_name_id and taxon_term.source='UAM Plants'
where scientific_name='Erigeron acris';

 taxon_term_id | taxon_name_id |          classification_id          |                                                term                                                 |      term_type      |   source   | gn_score | position_in_classification |      lastdate       | match_type 
---------------+---------------+-------------------------------------+-----------------------------------------------------------------------------------------------------+---------------------+------------+----------+----------------------------+---------------------+------------
    1407644425 |         60293 | 4C14C9D7-4636-4934-B4BB89701FFE4C37 | C.B.Clarke                                                                                          | 1_author_text       | UAM Plants |          |                            | 2024-03-15 00:00:00 | 
    1407644426 |         60293 | 4C14C9D7-4636-4934-B4BB89701FFE4C37 | Erigeron acris C.B.Clarke                                                                           | 1_full_name         | UAM Plants |          |                            | 2024-03-15 00:00:00 | 
    1407644427 |         60293 | 4C14C9D7-4636-4934-B4BB89701FFE4C37 | <a href="https://arctos.database.museum/name/Erigeron pulchellus#UAMPlants">Erigeron pulchellus</a> | 1_synonym_of_arctos | UAM Plants |          |                            | 2024-03-15 00:00:00 | 
    1407644428 |         60293 | 4C14C9D7-4636-4934-B4BB89701FFE4C37 | Erigeron pulchellus Michx.                                                                          | 1_synonym_of_full   | UAM Plants |          |                            | 2024-03-15 00:00:00 | 
    1407644429 |         60293 | 4C14C9D7-4636-4934-B4BB89701FFE4C37 | <a href="https://worldfloraonline.org/taxon/wfo-0000009039">wfo-0000009039</a>                      | 1_synonym_of_wfo    | UAM Plants |          |                            | 2024-03-15 00:00:00 | 
    1407644430 |         60293 | 4C14C9D7-4636-4934-B4BB89701FFE4C37 | synonym                                                                                             | 1_taxon_status      | UAM Plants |          |                            | 2024-03-15 00:00:00 | 
    1407644431 |         60293 | 4C14C9D7-4636-4934-B4BB89701FFE4C37 | <a href="https://worldfloraonline.org/taxon/wfo-0000067521">wfo-0000067521</a>                      | 1_wfo               | UAM Plants |          |                            | 2024-03-15 00:00:00 | 
    1407644432 |         60293 | 4C14C9D7-4636-4934-B4BB89701FFE4C37 | L.                                                                                                  | 2_author_text       | UAM Plants |          |                            | 2024-03-15 00:00:00 | 
    1407644433 |         60293 | 4C14C9D7-4636-4934-B4BB89701FFE4C37 | Erigeron acris L.                                                                                   | 2_full_name         | UAM Plants |          |                            | 2024-03-15 00:00:00 | 
    1407644434 |         60293 | 4C14C9D7-4636-4934-B4BB89701FFE4C37 | accepted                                                                                            | 2_taxon_status      | UAM Plants |          |                            | 2024-03-15 00:00:00 | 
    1407644435 |         60293 | 4C14C9D7-4636-4934-B4BB89701FFE4C37 | <a href="https://worldfloraonline.org/taxon/wfo-0000085388">wfo-0000085388</a>                      | 2_wfo               | UAM Plants |          |                            | 2024-03-15 00:00:00 | 
    1407644436 |         60293 | 4C14C9D7-4636-4934-B4BB89701FFE4C37 | camwebb@Arctos                                                                                      | managed_by          | UAM Plants |          |                            | 2024-03-15 00:00:00 | 
    1407644437 |         60293 | 4C14C9D7-4636-4934-B4BB89701FFE4C37 | World Flora Online Plant List 2023-12                                                               | source_authority    | UAM Plants |          |                            | 2024-03-15 00:00:00 | 
    1407647358 |         60293 | 4C14C9D7-4636-4934-B4BB89701FFE4C37 | Erigeron pulchellus                                                                                 | synonym_of          | UAM Plants |          |                            | 2024-03-15 00:00:00 | 
    1407647359 |         60293 | 4C14C9D7-4636-4934-B4BB89701FFE4C37 | Plantae                                                                                             | kingdom             | UAM Plants |          |                          1 | 2024-03-15 00:00:00 | 
    1407647360 |         60293 | 4C14C9D7-4636-4934-B4BB89701FFE4C37 | Pteridobiotina                                                                                      | subkingdom          | UAM Plants |          |                          2 | 2024-03-15 00:00:00 | 
    1407647361 |         60293 | 4C14C9D7-4636-4934-B4BB89701FFE4C37 | Angiosperms                                                                                         | phylum              | UAM Plants |          |                          3 | 2024-03-15 00:00:00 | 
    1407647362 |         60293 | 4C14C9D7-4636-4934-B4BB89701FFE4C37 | Asterales                                                                                           | order               | UAM Plants |          |                          4 | 2024-03-15 00:00:00 | 
    1407647363 |         60293 | 4C14C9D7-4636-4934-B4BB89701FFE4C37 | Asteraceae                                                                                          | family              | UAM Plants |          |                          5 | 2024-03-15 00:00:00 | 
    1407647364 |         60293 | 4C14C9D7-4636-4934-B4BB89701FFE4C37 | Asteroideae                                                                                         | subfamily           | UAM Plants |          |                          6 | 2024-03-15 00:00:00 | 
    1407647365 |         60293 | 4C14C9D7-4636-4934-B4BB89701FFE4C37 | Astereae                                                                                            | tribe               | UAM Plants |          |                          7 | 2024-03-15 00:00:00 | 
    1407647366 |         60293 | 4C14C9D7-4636-4934-B4BB89701FFE4C37 | Conyzinae                                                                                           | subtribe            | UAM Plants |          |                          8 | 2024-03-15 00:00:00 | 
    1407647367 |         60293 | 4C14C9D7-4636-4934-B4BB89701FFE4C37 | Erigeron                                                                                            | genus               | UAM Plants |          |                          9 | 2024-03-15 00:00:00 | 
    1407647368 |         60293 | 4C14C9D7-4636-4934-B4BB89701FFE4C37 | Erigeron acris                                                                                      | species             | UAM Plants |          |                         10 | 2024-03-15 00:00:00 | 
(24 rows)

Time: 9.425 ms
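Going the other way, a classification can be put back together from rows like these by grouping on classification_id and splitting on whether position_in_classification is NULL. A minimal Python sketch, assuming the rows for one classification_id have already been selected as (term, term_type, position_in_classification) tuples; the sample rows are copied from the output above and the function name is hypothetical. Because noclass terms go into a dict here, a repeated noclass term_type would keep only its last value:

```python
def reassemble(rows):
    """Split (term, term_type, position) rows into an ordered hierarchy
    of (term_type, term) pairs and a dict of noclass terms."""
    ranked = sorted((r for r in rows if r[2] is not None), key=lambda r: r[2])
    hierarchy = [(term_type, term) for term, term_type, _ in ranked]
    noclass = {term_type: term for term, term_type, pos in rows if pos is None}
    return hierarchy, noclass


rows = [
    ("Erigeron acris", "species", 10),
    ("Plantae", "kingdom", 1),
    ("Asteraceae", "family", 5),
    ("synonym", "1_taxon_status", None),
    ("camwebb@Arctos", "managed_by", None),
]
hierarchy, noclass = reassemble(rows)
print(hierarchy)  # [('kingdom', 'Plantae'), ('family', 'Asteraceae'), ('species', 'Erigeron acris')]
print(noclass["managed_by"])  # camwebb@Arctos
```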

camwebb commented 1 month ago

I've now created the classification upload as SQL, here. It imports without error into a dummy SQLite3 DB, so the SQL is syntactically valid, but it may not quite fit into Arctos. Perhaps you could test a small portion of it and let me know. I broke up the VALUES into multiple INSERTs.
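For anyone wanting to do something similar, here is one way to split rows into several multi-row INSERT statements and dry-run them against a throwaway SQLite table. This is a minimal Python sketch, not the build script from the repo; the table `cf_temp_terms`, its columns and the placeholder IDs are hypothetical, and a real loader would more likely use a database driver's parameter binding than hand-built literals:

```python
import sqlite3


def sql_literal(value):
    """Render a Python value as an SQL literal; quotes are escaped by doubling."""
    if value is None:
        return "NULL"
    if isinstance(value, (int, float)):
        return str(value)
    return "'" + str(value).replace("'", "''") + "'"


def batched_inserts(table, columns, rows, batch_size=500):
    """Yield INSERT statements carrying at most batch_size VALUES tuples each."""
    column_list = ", ".join(columns)
    for start in range(0, len(rows), batch_size):
        batch = rows[start:start + batch_size]
        values = ",\n".join(
            "(" + ", ".join(sql_literal(v) for v in row) + ")" for row in batch
        )
        yield f"INSERT INTO {table} ({column_list}) VALUES\n{values};"


# Dry run against an in-memory SQLite table (hypothetical schema).
columns = ["scientific_name", "classification_id", "term", "term_type", "position_in_classification"]
rows = [
    ("Spiraea crenata", "EXAMPLE-ID", "Plantae", "kingdom", 1),
    ("Spiraea crenata", "EXAMPLE-ID", "Spiraea", "genus", 2),
    ("Spiraea crenata", "EXAMPLE-ID", "Spiraea crenata Pall.", "1_fullname", None),
]
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE cf_temp_terms (" + ", ".join(columns) + ")")
for statement in batched_inserts("cf_temp_terms", columns, rows, batch_size=2):
    con.execute(statement)
print(con.execute("SELECT COUNT(*) FROM cf_temp_terms").fetchone()[0])  # 3
```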

BTW, I don't understand the need for the classification_id... surely the combination of name and source is unique for a particular classification and could serve the same purpose?

dustymc commented 1 month ago

I'll re-open for SQL, but it might be a while before I find the attention span to sanitize and load that. (CSV would make things easier, if that's possible.)

Or, as of 7 minutes ago, there are a lot more columns in the loader, if you want to go that way.

surely the combination of name and source is unique

Not even kinda. See https://arctos.database.museum/name/Erigeron%20acris#ArctosRelationships, https://arctos.database.museum/name/Erigeron%20acris#WoRMS, and maybe a few million more. (And those are AWESOME for discovery but kinda horrible for using in assertions/IDs. I was going to suggest in the other issue that you might consider such things, e.g. asserting "this name, that's all we know" rather than trying to spell out everything you think might be involved, which probably isn't possible, and then either just relying on globalnames for discovery or providing some "meta-source" that contains all 947 ways "Some weed" has been spelled. Maybe we should Zoom (after next week) if that sounds worth further exploration.)
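A grouping query makes the point concrete. Using five species rows copied from the WoRMS listing of Erigeron acris in this thread, one taxon_name_id and one source map to five different classification_id values, so (name, source) can't serve as the key. A minimal Python/SQLite sketch with a cut-down, hypothetical version of the taxon_term table:

```python
import sqlite3

WORMS_PIPE_ID = "202422|954898|846494|954900|846496|846504|18063|846535|35419|35420|35803|35811"

# Species rows copied from the WoRMS listing for Erigeron acris (taxon_name_id 60293).
rows = [
    (60293, WORMS_PIPE_ID, "Erigeron acris", "species", "WoRMS"),
    (60293, "4F1F3E78-BE25-49A9-96D62E55C590F98A", "Erigeron pulchellus", "species", "WoRMS"),
    (60293, "E1DF599B-4A38-4DAB-A0297014F44EAC41", "Erigeron acris", "species", "WoRMS"),
    (60293, "138C197B-7645-4857-8A08596A2D6F0407", "Erigeron pulchellus", "species", "WoRMS"),
    (60293, "E8C7B17A-7280-469A-B8087BB782C7925F", "Erigeron acris", "species", "WoRMS"),
]

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE taxon_term (taxon_name_id, classification_id, term, term_type, source)")
con.executemany("INSERT INTO taxon_term VALUES (?, ?, ?, ?, ?)", rows)

# How many distinct classifications share each (name, source) pair?
query = """
    SELECT taxon_name_id, source, COUNT(DISTINCT classification_id)
    FROM taxon_term
    GROUP BY taxon_name_id, source
"""
print(con.execute(query).fetchall())  # [(60293, 'WoRMS', 5)]
```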

Raw data example, in case you're into that sort of thing:

select taxon_term.* from 
taxon_name
inner join taxon_term on taxon_name.taxon_name_id=taxon_term.taxon_name_id and taxon_term.source='WoRMS'
where scientific_name='Erigeron acris';

 taxon_term_id | taxon_name_id |                               classification_id                                |            term            |   term_type    | source | gn_score | position_in_classification |          lastdate          | match_type 
---------------+---------------+--------------------------------------------------------------------------------+----------------------------+----------------+--------+----------+----------------------------+----------------------------+------------
    1402121046 |         60293 | 202422|954898|846494|954900|846496|846504|18063|846535|35419|35420|35803|35811 | Plantae                    | kingdom        | WoRMS  |        1 |                          1 | 2024-03-01 09:37:02.617931 | 
    1402121047 |         60293 | 202422|954898|846494|954900|846496|846504|18063|846535|35419|35420|35803|35811 | Asterales                  | order          | WoRMS  |        1 |                          4 | 2024-03-01 09:37:02.617931 | 
    1402121048 |         60293 | 202422|954898|846494|954900|846496|846504|18063|846535|35419|35420|35803|35811 | Compositae                 | family         | WoRMS  |        1 |                          5 | 2024-03-01 09:37:02.617931 | 
    1402121049 |         60293 | 202422|954898|846494|954900|846496|846504|18063|846535|35419|35420|35803|35811 | Tracheophyta               | phylum         | WoRMS  |        1 |                          2 | 2024-03-01 09:37:02.617931 | 
    1402121050 |         60293 | 202422|954898|846494|954900|846496|846504|18063|846535|35419|35420|35803|35811 | Magnoliopsida              | class          | WoRMS  |        1 |                          3 | 2024-03-01 09:37:02.617931 | 
    1402121051 |         60293 | 202422|954898|846494|954900|846496|846504|18063|846535|35419|35420|35803|35811 | Erigeron                   | genus          | WoRMS  |        1 |                          6 | 2024-03-01 09:37:02.617931 | 
    1402121052 |         60293 | 202422|954898|846494|954900|846496|846504|18063|846535|35419|35420|35803|35811 | Asterales                  | order          | WoRMS  |        1 |                          4 | 2024-03-01 09:37:02.617931 | 
    1402121053 |         60293 | 202422|954898|846494|954900|846496|846504|18063|846535|35419|35420|35803|35811 | Erigeron pulchellus        | species        | WoRMS  |        1 |                          7 | 2024-03-01 09:37:02.617931 | 
    1402121054 |         60293 | 202422|954898|846494|954900|846496|846504|18063|846535|35419|35420|35803|35811 | Compositae                 | family         | WoRMS  |        1 |                          5 | 2024-03-01 09:37:02.617931 | 
    1402121055 |         60293 | 202422|954898|846494|954900|846496|846504|18063|846535|35419|35420|35803|35811 | Erigeron                   | genus          | WoRMS  |        1 |                          6 | 2024-03-01 09:37:02.617931 | 
    1402121056 |         60293 | 202422|954898|846494|954900|846496|846504|18063|846535|35419|35420|35803|35811 | Erigeron pulchellus Michx. | name string    | WoRMS  |          |                            | 2024-03-01 09:37:02.617931 | 
    1402121057 |         60293 | 202422|954898|846494|954900|846496|846504|18063|846535|35419|35420|35803|35811 | Erigeron acris             | species        | WoRMS  |        1 |                          7 | 2024-03-01 09:37:02.617931 | 
    1402121058 |         60293 | 202422|954898|846494|954900|846496|846504|18063|846535|35419|35420|35803|35811 | Erigeron pulchellus        | canonical name | WoRMS  |          |                            | 2024-03-01 09:37:02.617931 | 
    1402121059 |         60293 | 202422|954898|846494|954900|846496|846504|18063|846535|35419|35420|35803|35811 | Erigeron acris L.          | name string    | WoRMS  |          |                            | 2024-03-01 09:37:02.617931 | 
    1402121060 |         60293 | 202422|954898|846494|954900|846496|846504|18063|846535|35419|35420|35803|35811 | Erigeron acris             | canonical name | WoRMS  |          |                            | 2024-03-01 09:37:02.617931 | 
    1402121061 |         60293 | 202422|954898|846494|954900|846496|846504|18063|846535|35419|35420|35803|35811 | Plantae                    | kingdom        | WoRMS  |        1 |                          1 | 2024-03-01 09:37:02.617931 | 
    1402121062 |         60293 | 202422|954898|846494|954900|846496|846504|18063|846535|35419|35420|35803|35811 | Tracheophyta               | phylum         | WoRMS  |        1 |                          2 | 2024-03-01 09:37:02.617931 | 
    1402121063 |         60293 | 202422|954898|846494|954900|846496|846504|18063|846535|35419|35420|35803|35811 | Magnoliopsida              | class          | WoRMS  |        1 |                          3 | 2024-03-01 09:37:02.617931 | 
    1402472180 |         60293 | 202422|954898|846494|954900|846496|846504|18063|846535|35419|35420|35803|35811 | Asterales                  | order          | WoRMS  |        1 |                          4 | 2024-03-01 16:35:24.142373 | 
    1402472181 |         60293 | 202422|954898|846494|954900|846496|846504|18063|846535|35419|35420|35803|35811 | Plantae                    | kingdom        | WoRMS  |        1 |                          1 | 2024-03-01 16:35:24.142373 | 
    1402472182 |         60293 | 202422|954898|846494|954900|846496|846504|18063|846535|35419|35420|35803|35811 | Compositae                 | family         | WoRMS  |        1 |                          5 | 2024-03-01 16:35:24.142373 | 
    1402472183 |         60293 | 202422|954898|846494|954900|846496|846504|18063|846535|35419|35420|35803|35811 | Tracheophyta               | phylum         | WoRMS  |        1 |                          2 | 2024-03-01 16:35:24.142373 | 
    1402472184 |         60293 | 202422|954898|846494|954900|846496|846504|18063|846535|35419|35420|35803|35811 | Magnoliopsida              | class          | WoRMS  |        1 |                          3 | 2024-03-01 16:35:24.142373 | 
    1402472185 |         60293 | 202422|954898|846494|954900|846496|846504|18063|846535|35419|35420|35803|35811 | Erigeron                   | genus          | WoRMS  |        1 |                          6 | 2024-03-01 16:35:24.142373 | 
    1402472186 |         60293 | 202422|954898|846494|954900|846496|846504|18063|846535|35419|35420|35803|35811 | Asterales                  | order          | WoRMS  |        1 |                          4 | 2024-03-01 16:35:24.142373 | 
    1402472187 |         60293 | 202422|954898|846494|954900|846496|846504|18063|846535|35419|35420|35803|35811 | Erigeron pulchellus        | species        | WoRMS  |        1 |                          7 | 2024-03-01 16:35:24.142373 | 
    1402472188 |         60293 | 202422|954898|846494|954900|846496|846504|18063|846535|35419|35420|35803|35811 | Compositae                 | family         | WoRMS  |        1 |                          5 | 2024-03-01 16:35:24.142373 | 
    1402472189 |         60293 | 202422|954898|846494|954900|846496|846504|18063|846535|35419|35420|35803|35811 | Erigeron pulchellus Michx. | name string    | WoRMS  |          |                            | 2024-03-01 16:35:24.142373 | 
    1402472190 |         60293 | 202422|954898|846494|954900|846496|846504|18063|846535|35419|35420|35803|35811 | Erigeron                   | genus          | WoRMS  |        1 |                          6 | 2024-03-01 16:35:24.142373 | 
    1402472191 |         60293 | 202422|954898|846494|954900|846496|846504|18063|846535|35419|35420|35803|35811 | Erigeron pulchellus        | canonical name | WoRMS  |          |                            | 2024-03-01 16:35:24.142373 | 
    1402472192 |         60293 | 202422|954898|846494|954900|846496|846504|18063|846535|35419|35420|35803|35811 | Erigeron acris             | species        | WoRMS  |        1 |                          7 | 2024-03-01 16:35:24.142373 | 
    1402472193 |         60293 | 202422|954898|846494|954900|846496|846504|18063|846535|35419|35420|35803|35811 | Erigeron acris L.          | name string    | WoRMS  |          |                            | 2024-03-01 16:35:24.142373 | 
    1402472194 |         60293 | 202422|954898|846494|954900|846496|846504|18063|846535|35419|35420|35803|35811 | Erigeron acris             | canonical name | WoRMS  |          |                            | 2024-03-01 16:35:24.142373 | 
    1402472195 |         60293 | 202422|954898|846494|954900|846496|846504|18063|846535|35419|35420|35803|35811 | Plantae                    | kingdom        | WoRMS  |        1 |                          1 | 2024-03-01 16:35:24.142373 | 
    1402472196 |         60293 | 202422|954898|846494|954900|846496|846504|18063|846535|35419|35420|35803|35811 | Tracheophyta               | phylum         | WoRMS  |        1 |                          2 | 2024-03-01 16:35:24.142373 | 
    1402472197 |         60293 | 202422|954898|846494|954900|846496|846504|18063|846535|35419|35420|35803|35811 | Magnoliopsida              | class          | WoRMS  |        1 |                          3 | 2024-03-01 16:35:24.142373 | 
    1404518332 |         60293 | 4F1F3E78-BE25-49A9-96D62E55C590F98A                                            | Asterales                  | order          | WoRMS  |        1 |                          4 | 2024-03-08 11:44:28.221969 | 
    1404518333 |         60293 | 4F1F3E78-BE25-49A9-96D62E55C590F98A                                            | Compositae                 | family         | WoRMS  |        1 |                          5 | 2024-03-08 11:44:28.221969 | 
    1404518334 |         60293 | 4F1F3E78-BE25-49A9-96D62E55C590F98A                                            | Erigeron                   | genus          | WoRMS  |        1 |                          6 | 2024-03-08 11:44:28.221969 | 
    1404518335 |         60293 | 4F1F3E78-BE25-49A9-96D62E55C590F98A                                            | Erigeron pulchellus        | species        | WoRMS  |        1 |                          7 | 2024-03-08 11:44:28.221969 | 
    1404518336 |         60293 | 4F1F3E78-BE25-49A9-96D62E55C590F98A                                            | Erigeron pulchellus Michx. | name string    | WoRMS  |          |                            | 2024-03-08 11:44:28.221969 | 
    1404518337 |         60293 | E1DF599B-4A38-4DAB-A0297014F44EAC41                                            | Plantae                    | kingdom        | WoRMS  |        1 |                          1 | 2024-03-08 11:44:28.221969 | 
    1404518338 |         60293 | E1DF599B-4A38-4DAB-A0297014F44EAC41                                            | Erigeron                   | genus          | WoRMS  |        1 |                          6 | 2024-03-08 11:44:28.221969 | 
    1404518339 |         60293 | 4F1F3E78-BE25-49A9-96D62E55C590F98A                                            | Erigeron pulchellus        | canonical name | WoRMS  |          |                            | 2024-03-08 11:44:28.221969 | 
    1404518340 |         60293 | E1DF599B-4A38-4DAB-A0297014F44EAC41                                            | Tracheophyta               | phylum         | WoRMS  |        1 |                          2 | 2024-03-08 11:44:28.221969 | 
    1404518341 |         60293 | E1DF599B-4A38-4DAB-A0297014F44EAC41                                            | Erigeron acris             | species        | WoRMS  |        1 |                          7 | 2024-03-08 11:44:28.221969 | 
    1404518342 |         60293 | E1DF599B-4A38-4DAB-A0297014F44EAC41                                            | Magnoliopsida              | class          | WoRMS  |        1 |                          3 | 2024-03-08 11:44:28.221969 | 
    1404518343 |         60293 | E1DF599B-4A38-4DAB-A0297014F44EAC41                                            | Erigeron acris L.          | name string    | WoRMS  |          |                            | 2024-03-08 11:44:28.221969 | 
    1404518344 |         60293 | E1DF599B-4A38-4DAB-A0297014F44EAC41                                            | Asterales                  | order          | WoRMS  |        1 |                          4 | 2024-03-08 11:44:28.221969 | 
    1404518345 |         60293 | E1DF599B-4A38-4DAB-A0297014F44EAC41                                            | Erigeron acris             | canonical name | WoRMS  |          |                            | 2024-03-08 11:44:28.221969 | 
    1404518346 |         60293 | E1DF599B-4A38-4DAB-A0297014F44EAC41                                            | Compositae                 | family         | WoRMS  |        1 |                          5 | 2024-03-08 11:44:28.221969 | 
    1404518347 |         60293 | 4F1F3E78-BE25-49A9-96D62E55C590F98A                                            | Plantae                    | kingdom        | WoRMS  |        1 |                          1 | 2024-03-08 11:44:28.221969 | 
    1404518348 |         60293 | 4F1F3E78-BE25-49A9-96D62E55C590F98A                                            | Tracheophyta               | phylum         | WoRMS  |        1 |                          2 | 2024-03-08 11:44:28.221969 | 
    1404518349 |         60293 | 4F1F3E78-BE25-49A9-96D62E55C590F98A                                            | Magnoliopsida              | class          | WoRMS  |        1 |                          3 | 2024-03-08 11:44:28.221969 | 
    1404519748 |         60293 | 138C197B-7645-4857-8A08596A2D6F0407                                            | Asterales                  | order          | WoRMS  |        1 |                          4 | 2024-03-08 11:48:25.767808 | 
    1404519749 |         60293 | 138C197B-7645-4857-8A08596A2D6F0407                                            | Compositae                 | family         | WoRMS  |        1 |                          5 | 2024-03-08 11:48:25.767808 | 
    1404519750 |         60293 | 138C197B-7645-4857-8A08596A2D6F0407                                            | Erigeron                   | genus          | WoRMS  |        1 |                          6 | 2024-03-08 11:48:25.767808 | 
    1404519751 |         60293 | 138C197B-7645-4857-8A08596A2D6F0407                                            | Erigeron pulchellus        | species        | WoRMS  |        1 |                          7 | 2024-03-08 11:48:25.767808 | 
    1404519752 |         60293 | 138C197B-7645-4857-8A08596A2D6F0407                                            | Erigeron pulchellus Michx. | name string    | WoRMS  |          |                            | 2024-03-08 11:48:25.767808 | 
    1404519753 |         60293 | E8C7B17A-7280-469A-B8087BB782C7925F                                            | Plantae                    | kingdom        | WoRMS  |        1 |                          1 | 2024-03-08 11:48:25.767808 | 
    1404519754 |         60293 | 138C197B-7645-4857-8A08596A2D6F0407                                            | Erigeron pulchellus        | canonical name | WoRMS  |          |                            | 2024-03-08 11:48:25.767808 | 
    1404519755 |         60293 | E8C7B17A-7280-469A-B8087BB782C7925F                                            | Tracheophyta               | phylum         | WoRMS  |        1 |                          2 | 2024-03-08 11:48:25.767808 | 
    1404519756 |         60293 | E8C7B17A-7280-469A-B8087BB782C7925F                                            | Erigeron                   | genus          | WoRMS  |        1 |                          6 | 2024-03-08 11:48:25.767808 | 
    1404519757 |         60293 | E8C7B17A-7280-469A-B8087BB782C7925F                                            | Magnoliopsida              | class          | WoRMS  |        1 |                          3 | 2024-03-08 11:48:25.767808 | 
    1404519758 |         60293 | E8C7B17A-7280-469A-B8087BB782C7925F                                            | Erigeron acris             | species        | WoRMS  |        1 |                          7 | 2024-03-08 11:48:25.767808 | 
    1404519759 |         60293 | E8C7B17A-7280-469A-B8087BB782C7925F                                            | Asterales                  | order          | WoRMS  |        1 |                          4 | 2024-03-08 11:48:25.767808 | 
    1404519760 |         60293 | E8C7B17A-7280-469A-B8087BB782C7925F                                            | Compositae                 | family         | WoRMS  |        1 |                          5 | 2024-03-08 11:48:25.767808 | 
    1404519761 |         60293 | E8C7B17A-7280-469A-B8087BB782C7925F                                            | Erigeron acris L.          | name string    | WoRMS  |          |                            | 2024-03-08 11:48:25.767808 | 
    1404519762 |         60293 | E8C7B17A-7280-469A-B8087BB782C7925F                                            | Erigeron acris             | canonical name | WoRMS  |          |                            | 2024-03-08 11:48:25.767808 | 
    1404519763 |         60293 | 138C197B-7645-4857-8A08596A2D6F0407                                            | Plantae                    | kingdom        | WoRMS  |        1 |                          1 | 2024-03-08 11:48:25.767808 | 
    1404519764 |         60293 | 138C197B-7645-4857-8A08596A2D6F0407                                            | Tracheophyta               | phylum         | WoRMS  |        1 |                          2 | 2024-03-08 11:48:25.767808 | 
    1404519765 |         60293 | 138C197B-7645-4857-8A08596A2D6F0407                                            | Magnoliopsida              | class          | WoRMS  |        1 |                          3 | 2024-03-08 11:48:25.767808 | 
    1404528961 |         60293 | 4462D506-7793-4B6F-B1808B4E6C4EC1DC                                            | Plantae                    | kingdom        | WoRMS  |        1 |                          1 | 2024-03-08 12:17:26.467899 | 
    1404528962 |         60293 | 4462D506-7793-4B6F-B1808B4E6C4EC1DC                                            | Tracheophyta               | phylum         | WoRMS  |        1 |                          2 | 2024-03-08 12:17:26.467899 | 
    1404528963 |         60293 | 733D93E3-B2BB-4954-90A428EC9214D61B                                            | Asterales                  | order          | WoRMS  |        1 |                          4 | 2024-03-08 12:17:26.467899 | 
    1404528964 |         60293 | 4462D506-7793-4B6F-B1808B4E6C4EC1DC                                            | Magnoliopsida              | class          | WoRMS  |        1 |                          3 | 2024-03-08 12:17:26.467899 | 
    1404528965 |         60293 | 733D93E3-B2BB-4954-90A428EC9214D61B                                            | Compositae                 | family         | WoRMS  |        1 |                          5 | 2024-03-08 12:17:26.467899 | 
    1404528966 |         60293 | 4462D506-7793-4B6F-B1808B4E6C4EC1DC                                            | Asterales                  | order          | WoRMS  |        1 |                          4 | 2024-03-08 12:17:26.467899 | 
    1404528967 |         60293 | 4462D506-7793-4B6F-B1808B4E6C4EC1DC                                            | Erigeron                   | genus          | WoRMS  |        1 |                          6 | 2024-03-08 12:17:26.467899 | 
    1404528968 |         60293 | 733D93E3-B2BB-4954-90A428EC9214D61B                                            | Erigeron                   | genus          | WoRMS  |        1 |                          6 | 2024-03-08 12:17:26.467899 | 
    1404528969 |         60293 | 4462D506-7793-4B6F-B1808B4E6C4EC1DC                                            | Compositae                 | family         | WoRMS  |        1 |                          5 | 2024-03-08 12:17:26.467899 | 
    1404528970 |         60293 | 4462D506-7793-4B6F-B1808B4E6C4EC1DC                                            | Erigeron acris             | species        | WoRMS  |        1 |                          7 | 2024-03-08 12:17:26.467899 | 
    1404528971 |         60293 | 733D93E3-B2BB-4954-90A428EC9214D61B                                            | Erigeron pulchellus        | species        | WoRMS  |        1 |                          7 | 2024-03-08 12:17:26.467899 | 
    1404528972 |         60293 | 4462D506-7793-4B6F-B1808B4E6C4EC1DC                                            | Erigeron acris L.          | name string    | WoRMS  |          |                            | 2024-03-08 12:17:26.467899 | 
    1404528973 |         60293 | 733D93E3-B2BB-4954-90A428EC9214D61B                                            | Erigeron pulchellus Michx. | name string    | WoRMS  |          |                            | 2024-03-08 12:17:26.467899 | 
    1404528974 |         60293 | 4462D506-7793-4B6F-B1808B4E6C4EC1DC                                            | Erigeron acris             | canonical name | WoRMS  |          |                            | 2024-03-08 12:17:26.467899 | 
    1404528975 |         60293 | 733D93E3-B2BB-4954-90A428EC9214D61B                                            | Erigeron pulchellus        | canonical name | WoRMS  |          |                            | 2024-03-08 12:17:26.467899 | 
    1404528976 |         60293 | 733D93E3-B2BB-4954-90A428EC9214D61B                                            | Plantae                    | kingdom        | WoRMS  |        1 |                          1 | 2024-03-08 12:17:26.467899 | 
    1404528977 |         60293 | 733D93E3-B2BB-4954-90A428EC9214D61B                                            | Tracheophyta               | phylum         | WoRMS  |        1 |                          2 | 2024-03-08 12:17:26.467899 | 
    1404528978 |         60293 | 733D93E3-B2BB-4954-90A428EC9214D61B                                            | Magnoliopsida              | class          | WoRMS  |        1 |                          3 | 2024-03-08 12:17:26.467899 | 
    1404730729 |         60293 | F5B956F4-3DE9-45EB-90EE1F97860A59A0                                            | Asterales                  | order          | WoRMS  |        1 |                          4 | 2024-03-09 08:36:39.967898 | 
    1404730730 |         60293 | 9C5B7B6A-4B43-4C9E-A1BFFE7FF6E892AC                                            | Plantae                    | kingdom        | WoRMS  |        1 |                          1 | 2024-03-09 08:36:39.967898 | 
    1404730731 |         60293 | F5B956F4-3DE9-45EB-90EE1F97860A59A0                                            | Compositae                 | family         | WoRMS  |        1 |                          5 | 2024-03-09 08:36:39.967898 | 
    1404730732 |         60293 | F5B956F4-3DE9-45EB-90EE1F97860A59A0                                            | Erigeron                   | genus          | WoRMS  |        1 |                          6 | 2024-03-09 08:36:39.967898 | 
    1404730733 |         60293 | 9C5B7B6A-4B43-4C9E-A1BFFE7FF6E892AC                                            | Tracheophyta               | phylum         | WoRMS  |        1 |                          2 | 2024-03-09 08:36:39.967898 | 
    1404730734 |         60293 | F5B956F4-3DE9-45EB-90EE1F97860A59A0                                            | Erigeron pulchellus        | species        | WoRMS  |        1 |                          7 | 2024-03-09 08:36:39.967898 | 
    1404730735 |         60293 | 9C5B7B6A-4B43-4C9E-A1BFFE7FF6E892AC                                            | Erigeron                   | genus          | WoRMS  |        1 |                          6 | 2024-03-09 08:36:39.967898 | 
    1404730736 |         60293 | 9C5B7B6A-4B43-4C9E-A1BFFE7FF6E892AC                                            | Magnoliopsida              | class          | WoRMS  |        1 |                          3 | 2024-03-09 08:36:39.967898 | 
    1404730737 |         60293 | F5B956F4-3DE9-45EB-90EE1F97860A59A0                                            | Erigeron pulchellus Michx. | name string    | WoRMS  |          |                            | 2024-03-09 08:36:39.967898 | 
    1404730738 |         60293 | 9C5B7B6A-4B43-4C9E-A1BFFE7FF6E892AC                                            | Asterales                  | order          | WoRMS  |        1 |                          4 | 2024-03-09 08:36:39.967898 | 
    1404730739 |         60293 | F5B956F4-3DE9-45EB-90EE1F97860A59A0                                            | Erigeron pulchellus        | canonical name | WoRMS  |          |                            | 2024-03-09 08:36:39.967898 | 
    1404730740 |         60293 | 9C5B7B6A-4B43-4C9E-A1BFFE7FF6E892AC                                            | Erigeron acris             | species        | WoRMS  |        1 |                          7 | 2024-03-09 08:36:39.967898 | 
    1404730741 |         60293 | 9C5B7B6A-4B43-4C9E-A1BFFE7FF6E892AC                                            | Compositae                 | family         | WoRMS  |        1 |                          5 | 2024-03-09 08:36:39.967898 | 
    1404730742 |         60293 | 9C5B7B6A-4B43-4C9E-A1BFFE7FF6E892AC                                            | Erigeron acris L.          | name string    | WoRMS  |          |                            | 2024-03-09 08:36:39.967898 | 
    1404730743 |         60293 | 9C5B7B6A-4B43-4C9E-A1BFFE7FF6E892AC                                            | Erigeron acris             | canonical name | WoRMS  |          |                            | 2024-03-09 08:36:39.967898 | 
    1404730744 |         60293 | F5B956F4-3DE9-45EB-90EE1F97860A59A0                                            | Plantae                    | kingdom        | WoRMS  |        1 |                          1 | 2024-03-09 08:36:39.967898 | 
    1404730745 |         60293 | F5B956F4-3DE9-45EB-90EE1F97860A59A0                                            | Tracheophyta               | phylum         | WoRMS  |        1 |                          2 | 2024-03-09 08:36:39.967898 | 
    1404730746 |         60293 | F5B956F4-3DE9-45EB-90EE1F97860A59A0                                            | Magnoliopsida              | class          | WoRMS  |        1 |                          3 | 2024-03-09 08:36:39.967898 | 
(108 rows)

Time: 9.415 ms
camwebb commented 1 month ago

there are a lot more columns in the loader

Thanks. I'm using the original bulkloader and it's working well. I didn't see there were 1000 records per RPM - perfect - I thought this would take days to load. 9k records now being processed.

surely the combination of name and source are unique

Not even kinda

Oops - dumb question of mine. I had forgotten that multiple classifications are allowed per name per source.

Closing as fixed. Thanks
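The query output pasted earlier already shows why name + source is not a unique key: one name ID (60293) with source WoRMS carries several distinct classification IDs. A minimal sketch, using six of the `species` rows copied from that output, groups rows by (name ID, source) and counts distinct classification IDs. The field order and labels are assumptions, since the psql column header is not part of this excerpt.

```python
from collections import defaultdict

# Six "species" rows copied from the query output above:
# (name id, source, classification id, term, term type).
# Field labels are assumptions; the psql header is not shown in this excerpt.
ROWS = [
    (60293, "WoRMS", "138C197B-7645-4857-8A08596A2D6F0407", "Erigeron pulchellus", "species"),
    (60293, "WoRMS", "E8C7B17A-7280-469A-B8087BB782C7925F", "Erigeron acris", "species"),
    (60293, "WoRMS", "4462D506-7793-4B6F-B1808B4E6C4EC1DC", "Erigeron acris", "species"),
    (60293, "WoRMS", "733D93E3-B2BB-4954-90A428EC9214D61B", "Erigeron pulchellus", "species"),
    (60293, "WoRMS", "F5B956F4-3DE9-45EB-90EE1F97860A59A0", "Erigeron pulchellus", "species"),
    (60293, "WoRMS", "9C5B7B6A-4B43-4C9E-A1BFFE7FF6E892AC", "Erigeron acris", "species"),
]


def classifications_per_name_source(rows):
    """Map (name id, source) to the set of distinct classification ids it carries."""
    groups = defaultdict(set)
    for name_id, source, classification_id, _term, _term_type in rows:
        groups[(name_id, source)].add(classification_id)
    return dict(groups)


result = classifications_per_name_source(ROWS)
for key, ids in result.items():
    print(key, len(ids))  # prints: (60293, 'WoRMS') 6
```

Any key that maps to more than one classification ID is a case where name-at-source alone cannot identify a single classification.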

camwebb commented 1 month ago

Sorry, gotta open this again. The classification loaded fine (and quickly), but was truncated. Please compare the classification of Spiraea crenata in the OP to this. Only a subset of the fields in the CSV were loaded (the first 20 columns).

dustymc commented 1 month ago

Sorry, something got stuck last night. It should be fixed now, and you should be able to reload to replace the truncated data.

camwebb commented 1 month ago

I realize now that the class_terms are at 60 but the noclass_terms are at 20. These extra slots I need for UAM Plants are all noclass_terms - that's why there was a truncation. I have split the file into three ("vertically") for the moment and that seems to work, but if you could increase the numberNoClass to 60 that would be great. Not sure the numberYesClass would ever need to be greater than 20.
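Before uploading a wide file, it can help to check how many term slots it actually needs. A minimal sketch that scans a CSV header for `*_term_type_N` columns and reports any family whose highest N exceeds the loader's limit. The `noclass_term_type_N` pattern comes from the error message in the OP; the `class_term_type_N` pattern, the other column names in the usage example, and the 60/20 limits (taken from the comment above, as they stood at the time) are assumptions for illustration.

```python
import csv
import io
import re

# Assumed limits at the time of this comment: 60 class terms, 20 noclass terms.
LIMITS = {"class_term_type": 60, "noclass_term_type": 20}

# Matches e.g. "class_term_type_7" or "noclass_term_type_21" (anchored, so the
# "class" alternative cannot match inside "noclass").
_PATTERN = re.compile(r"^(class_term_type|noclass_term_type)_(\d+)$")


def max_slot_indices(header):
    """Return the highest N seen for each *_term_type_N column family."""
    highest = {}
    for column in header:
        m = _PATTERN.match(column.strip())
        if m:
            prefix, n = m.group(1), int(m.group(2))
            highest[prefix] = max(highest.get(prefix, 0), n)
    return highest


def over_limit(header, limits=LIMITS):
    """List (family, slots needed, slots allowed) for every family over its limit."""
    return [
        (prefix, needed, limits[prefix])
        for prefix, needed in sorted(max_slot_indices(header).items())
        if needed > limits[prefix]
    ]


# Hypothetical header line; real loader files have many more columns.
sample = "scientific_name,source,class_term_type_1,class_term_1,noclass_term_type_21,noclass_term_21"
header = next(csv.reader(io.StringIO(sample)))
print(over_limit(header))  # [('noclass_term_type', 21, 20)]
```

An empty list means every term column in the file has a matching slot in the loader, so nothing should be silently dropped.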

camwebb commented 1 month ago

... no, that didn't work. For some reason each third of the columns cleared all the entries from the previous uploads. I.e., uploading noclass_term in columns ~90-130 deleted all the data for columns 3-89, including the class_term. I'll have to load everything at once, which will require you to allow up to 60 noclass terms. Thanks in advance.

dustymc commented 1 month ago

Bah, sorry, I think I see the problem; the fix will be in the next release.

(I'll test it if you want to pass data along - and if test has the taxa.)

cleared all the entries

Yep

This form REPLACES classifications; name-at-source will be deleted (even if there are multiple), and data in this file will be loaded into that location.
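The replace behavior quoted above explains why splitting the file "vertically" failed: each upload deletes every classification for name-at-source before inserting its own terms, so the last partial upload wins. A minimal in-memory sketch of that rule, assuming a plain dict keyed by (name, source); `replace_load` is a hypothetical helper, not the Arctos implementation. The term values are taken from the Spiraea crenata example in the OP.

```python
# Store: (name, source) -> list of classifications, each a dict of term type -> term.
def replace_load(store, name, source, terms):
    """Delete all classifications for name-at-source, then insert the uploaded terms."""
    store[(name, source)] = [dict(terms)]  # previous entries (even several) are discarded
    return store


store = {}
key = ("Spiraea crenata", "UAM Plants")
part_1 = {"genus": "Spiraea", "species": "Spiraea crenata", "1_fullname": "Spiraea crenata Pall."}
part_2 = {"11_fullname": "Spiraea crenata auct."}

# Split upload: the second call wipes everything the first one loaded.
replace_load(store, *key, part_1)
replace_load(store, *key, part_2)
print(store[key])  # [{'11_fullname': 'Spiraea crenata auct.'}]

# Single upload with all columns keeps every term.
replace_load(store, *key, {**part_1, **part_2})
print(len(store[key][0]))  # 4
```

Under this rule, the only way to keep every term is to send them all in one row, which is why the noclass limit had to be raised.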

camwebb commented 3 weeks ago

@dustymc Here is the UAM Plants classification I'd like to load. If you have time, please give it a try. If not, I can wait until the next release. Thanks!

dustymc commented 3 weeks ago

@camwebb is the loader doing something it shouldn't?? The last error should have been corrected last week, if that's the concern.

camwebb commented 3 weeks ago

Oops - I hadn't tried it yet - I thought the next release was still a while off. It works! :tada: See Spiraea crenata for the longest entry. Fantastic. This new synonym solution is now fully working for us. Thank you!