[Bug/Feature request] Need more noclass_term fields in cf_temp_classification #7971

closed 1 month ago

1 month ago

I'm trying to load the new UAM Plants classification from WFO. It has 57 noclass_terms, and the bulkloader fails with the error 'column "noclass_term_type_21" of relation "cf_temp_classification" does not exist', so I assume the cf_temp_classification table only has 20 noclass_term slots. I request 60 noclass_term slots, please.
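A quick pre-flight check can catch this before a load fails. This is a hypothetical sketch, not part of Arctos: it assumes the loader's CSV columns are named noclass_term_N / noclass_term_type_N (as the error message suggests) and treats 20 as the current limit. The scientific_name column in the usage example is made up.

```python
import csv
import io
import re

# Assumptions: column naming inferred from the error message; limit of 20
# inferred from where the load broke (noclass_term_type_21).
NOCLASS_LIMIT = 20
NOCLASS_COL = re.compile(r"^noclass_term(?:_type)?_(\d+)$")


def max_noclass_index(header):
    """Return the highest N used in any noclass_term_N / noclass_term_type_N column."""
    indices = [int(m.group(1)) for col in header if (m := NOCLASS_COL.match(col))]
    return max(indices, default=0)


def check_csv(text, limit=NOCLASS_LIMIT):
    """Raise ValueError if the CSV header needs more noclass slots than the table has."""
    header = next(csv.reader(io.StringIO(text)))
    needed = max_noclass_index(header)
    if needed > limit:
        raise ValueError(f"file needs {needed} noclass_term slots; table has {limit}")
    return needed
```

For example, check_csv("scientific_name,noclass_term_21,noclass_term_type_21\n") raises before anything is uploaded.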

Now why, you are asking, do we need so many? Here's the reason: Spiraea crenata. In WFO there are 11 different variations of the name. Here's (part of) the single bulkloader line for the name (split onto multiple lines for readability):

  Spiraea crenata
  UAM Plants
  kingdom : Plantae
  phylum : Tracheophyta
  family : Rosaceae
  genus : Spiraea
  species : Spiraea crenata
  subspecies : 
  variety : 
  forma : 
  classification source : World Flora Online v.2023.12
  managed_by : camwebb@Arctos
  1_fullname : "Spiraea crenata Pall."
  1_wfoID : wfo-0000983998
  1_status : Synonym
  1_synonym_of_wfoID : wfo-0000985818
  1_synonym_of_name : "Spiraea hypericifolia L."
  2_fullname : "Spiraea crenata L."
  2_wfoID : wfo-0000985813
  2_status : Accepted
  2_synonym_of_wfoID :
  2_synonym_of_name : ""
  ... (8 more variations snipped) ...
  11_fullname : "Spiraea crenata auct."
  11_wfoID : wfo-0001011545
  11_status : Synonym
  11_synonym_of_wfoID : wfo-0000985818
  11_synonym_of_name : "Spiraea hypericifolia L."
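For illustration, a minimal sketch of how numbered WFO fields like these could be packed into the loader's noclass columns. It is not the build script from the repo: the FIELDS list mirrors the five per-variation fields shown above, and the column naming assumes the noclass_term_N / noclass_term_type_N pattern from the error message. At five fields per variation, the 11 variations of Spiraea crenata alone need 11 × 5 = 55 noclass slots, well past 20.

```python
# Hypothetical sketch; not the actual build script.
FIELDS = ["fullname", "wfoID", "status", "synonym_of_wfoID", "synonym_of_name"]


def flatten_variations(variations, start=1):
    """Turn a list of WFO variation dicts into numbered noclass column pairs.

    Variation i (1-based) yields term types such as "i_fullname", which are
    placed into noclass_term_type_N / noclass_term_N columns, with N counting
    up from `start`. Missing fields become empty strings.
    """
    row = {}
    n = start
    for i, var in enumerate(variations, start=1):
        for field in FIELDS:
            row[f"noclass_term_type_{n}"] = f"{i}_{field}"
            row[f"noclass_term_{n}"] = var.get(field, "")
            n += 1
    return row
```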

There's no great hurry, but the file is ready to load. It has 9766 names. See repo for the build script.


1 month ago

A potentially better alternative solution would be to rewrite the import script/table to accept a bulkloader file with single term per line:

Spiraea crenata,UAM Plants,yes,genus,Spiraea
Spiraea crenata,UAM Plants,no,11_synonym_of_wfoID,wfo-0000985818

... just thinking aloud
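A minimal sketch of the one-term-per-line idea, assuming the five fields in the example lines are name, source, a yes/no flag for classification vs. non-classification terms, term type, and term. Writing through Python's csv module means values containing commas would be quoted automatically.

```python
import csv
import io


def wide_to_long(name, source, class_terms, noclass_terms):
    """Emit one CSV line per term: name, source, yes/no flag, term_type, term.

    class_terms and noclass_terms are lists of (term_type, term) pairs;
    class terms get "yes" and noclass terms get "no", as in the example lines.
    """
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for flag, terms in (("yes", class_terms), ("no", noclass_terms)):
        for term_type, term in terms:
            writer.writerow([name, source, flag, term_type, term])
    return out.getvalue()
```

Called as wide_to_long("Spiraea crenata", "UAM Plants", [("genus", "Spiraea")], [("11_synonym_of_wfoID", "wfo-0000985818")]), this reproduces the two example lines above.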

1 month ago

Changing from bug to feature request and going active on this. It's completely additive, so I don't think there's any reason for discussion.

why, you are asking

While wearing my Arctos Developer hat: I'm not asking. Classifications are only shared if someone wants to buy into them; you've got your own, so do whatever you want. It only affects Arctos by adding a bit of entirely optional functionality. (There's an impact on performance and such, but I think all of these requests are a drop in the bucket in the big picture, so I'm ignoring that aspect.)

OK, hats off: Sure, there's a giant mess in taxonomy-land, and a taxonomy database-thingee has done what they can to record that, good for them, but WHY would anyone cataloging plants want to deal with that? Probably your collection only thinks of one thing, not 11, when someone mentions Spiraea crenata, so why bring the noise onboard? It would be fabulous if you could answer in the form of a how-to, whatever you're up to is probably of interest to others. (Or feel free not to answer at all!)

Also (and do please feel entirely free to ignore this), "partial identifiers" are a product of the devil himself. From your example, I would transform

1_synonym_of_wfo : <a href="">wfo-0000009039</a>

into

1_synonym_of_wfo : <a href=""></a>

so that nobody ever has to guess what a 'wfo-number' might be. Then I'd add a bare term to better support search.


Or possibly you could just provide the bare URI/GUID and request UI magic to linkify it (I would).
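A small sketch of that transformation. The WFO base URL here is an assumption (check WFO's documented persistent-identifier form before relying on it), and the "_bare" suffix for the search-friendly term is a hypothetical naming choice.

```python
import html
import re

# Assumption: illustrative resolver base; confirm WFO's preferred URI form.
WFO_BASE = "https://www.worldfloraonline.org/taxon/"
WFO_ID = re.compile(r"^wfo-\d{10}$")


def wfo_uri(wfo_id):
    """Expand a bare WFO ID (e.g. wfo-0000985818) into a full URI."""
    if not WFO_ID.match(wfo_id):
        raise ValueError(f"not a WFO ID: {wfo_id!r}")
    return WFO_BASE + wfo_id


def wfo_terms(prefix, wfo_id):
    """Return (term_type, term) pairs: a linkified full URI plus the bare ID for search.

    The "_bare" suffix is a hypothetical naming convention.
    """
    uri = html.escape(wfo_uri(wfo_id))
    link = f'<a href="{uri}">{uri}</a>'
    return [(prefix, link), (prefix + "_bare", wfo_id)]
```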

alternative solution

I'm not COMPLETELY attached to anything (and definitely like data-driven built-in expansion), but

  1. Most people aren't going to be able to successfully use that, while anything will spit out CSV
  2. It would have to do something weird for NULL classification term types, and
  3. It would have to use some delimiter which can't appear in the data
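Points 2 and 3 can be seen with Python's standard csv module: quoting lets values contain the delimiter and quote characters safely, but None is written as an empty string, so NULL and "" are indistinguishable after a round trip. This is a generic illustration, not Arctos code.

```python
import csv
import io


def roundtrip(rows):
    """Write rows with the csv module, read them back, and return what comes out."""
    buf = io.StringIO(newline="")
    csv.writer(buf).writerows(rows)  # None is written as an empty field
    buf.seek(0)
    return list(csv.reader(buf))
```

roundtrip([["Spiraea crenata", 'has, a comma and "quotes"']]) returns the row unchanged, while roundtrip([["x", None]]) comes back as [["x", ""]].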

If we really need to go beyond CSV, we could just write to the table directly. That would mean a need for more organization, unique identifiers, and such, but it's a simple structure, and normalized data is always nice...

1 month ago

Probably your collection only thinks of one thing, not 11, when someone mentions Spiraea crenata, so why bring the noise onboard?

Two reasons: i) it would be a huge (even impossible) amount of work to look at all the labels for each identification, list the author strings actually used, and constrain the WFO 'everything' data to what we have; ii) we cannot know whether a user will want to find one of our specimens based on a synonym we are not using or aware of, and having the exhaustive WFO synonymy helps here.

It would be fabulous if you could answer in the form of a how-to

I'll bring this up at the next @ArctosDB/taxonomy meeting. I was planning to share this approach when I had the classification ready, and now I do. If people like this approach, I'll write a how-to.

"partial identifiers"

:+1: Good suggestion

beyond CSV

Would you be open to (or even prefer) my turning the classification import into a SQL INSERT that you could run? Would taxon_name, classification_id, term, and term_type be sufficient, or would I need to calculate taxon_name_id and position_in_classification too?

1 month ago

open to


Maybe not 'prefer' because it ties you to someone with a scary password being available and removes some independence, but I don't mind pushing buttons either.


Yep, I can get taxon_name_id from taxon_name, no problem.


You'd have to supply classification_id; it's the thing that brings multiple rows back together, and I don't think there's any way I can calculate it. It just needs to be unique (a constraint which can't actually be enforced, so violations will probably result in some sort of super-weird UI rather than an error). Use a UUID or similar (something very probably unique; don't start with "1"!) and it'll (probably) be fine.


You definitely would have to calculate position_in_classification; it's what differentiates "classification terms" from "noclass terms" (and organizes the former).
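A hedged sketch of generating such an INSERT file. The column names come from this thread, but the table name is hypothetical, and numbering class terms 1..n from the top rank down is an assumption about the position_in_classification convention. Each classification gets its own uuid4 as classification_id, and noclass terms get NULL positions.

```python
import uuid


def sql_literal(value):
    """Render a value as a SQL literal: None -> NULL, ints bare, strings quoted."""
    if value is None:
        return "NULL"
    if isinstance(value, int):
        return str(value)
    return "'" + str(value).replace("'", "''") + "'"  # double embedded single quotes


def build_inserts(taxon_name, class_terms, noclass_terms, table="hypothetical_term_load"):
    """Build INSERT statements for one classification of one name.

    All rows share one freshly generated UUID as classification_id. Class terms
    are numbered 1..n (assumed convention); noclass terms get a NULL position.
    """
    classification_id = str(uuid.uuid4())
    rows = [(t_type, term, pos) for pos, (t_type, term) in enumerate(class_terms, start=1)]
    rows += [(t_type, term, None) for t_type, term in noclass_terms]
    cols = "taxon_name, classification_id, term_type, term, position_in_classification"
    return [
        f"INSERT INTO {table} ({cols}) VALUES ("
        + ", ".join(sql_literal(v) for v in (taxon_name, classification_id, t_type, term, pos))
        + ");"
        for t_type, term, pos in rows
    ]
```

Because the output is a script for someone else to run, the quoting here only doubles single quotes; when executing directly from code, parameterized queries through the database driver would be the safer choice.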

Maybe this is helpful:

1 month ago

there are a lot more columns in the loader

Thanks. I'm using the original bulkloader and it's working well. I hadn't noticed that it handles 1000 records per RPM (perfect!); I thought this would take days to load. 9k records are now being processed.

surely the combination of name and source are unique

Not even kinda

Oops - dumb question of mine. I had forgotten that multiple classifications are allowed per name per source.

Closing as fixed. Thanks!

1 month ago

Sorry, I have to reopen this. The classification loaded fine (and quickly), but it was truncated. Please compare the classification of Spiraea crenata in the OP to this. Only a subset of the fields in the CSV were loaded (the first 20 columns).

1 month ago

Sorry, something got stuck last night. It should be fixed now, and you should be able to reload to replace the truncated data.

1 month ago

I realize now that the class_terms are at 60 but the noclass_terms are at 20. The extra slots I need for UAM Plants are all noclass_terms; that's why there was a truncation. I have split the file into three ("vertically") for the moment and that seems to work, but if you could increase numberNoClass to 60 that would be great. I'm not sure numberYesClass would ever need to be greater than 20.

1 month ago

... No, that didn't work. For some reason, each third of the columns cleared all the entries from the previous uploads. That is, uploading the noclass_terms in columns ~90-130 deleted all the data in columns 3-89, including the class_terms. I'll have to load everything at once, which will require you to allow up to 60 noclass terms. Thanks in advance!

1 month ago

Bah, sorry, I think I see the problem; the fix will be in the next release.

(I'll test it if you want to pass data along - and if test has the taxa.)

cleared all the entries


This form REPLACES classifications: every classification for the name at that source will be deleted (even if there are multiple), and the data in this file will be loaded in its place.
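A toy model of why the vertical split failed: with replace semantics, each partial file wipes out what the previous one loaded for the same name and source. Recombining the parts into one row per name before a single load avoids that. The function names are hypothetical, and this only simulates the behavior described above; it is not the loader's code.

```python
def load_replace(store, file_rows, source):
    """Simulate a REPLACE-style loader.

    store maps (name, source) -> list of classifications, each a dict of
    term_type -> term. For every name in the file, all existing
    classifications at that source are dropped and replaced by the file's one.
    """
    for name, terms in file_rows.items():
        store[(name, source)] = [dict(terms)]
    return store


def merge_parts(*parts):
    """Recombine column-split ("vertical") parts into one row per name."""
    merged = {}
    for part in parts:
        for name, terms in part.items():
            merged.setdefault(name, {}).update(terms)
    return merged
```

Loading part A and then part B separately leaves only part B's terms; loading merge_parts(part_a, part_b) once keeps both.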

3 weeks ago

@dustymc Here is the UAM Plants classification I'd like to load. If you have time, please give it a try. If not, I can wait until the next release. Thanks!

3 weeks ago

@camwebb, is the loader doing something it shouldn't? The last error should have been corrected last week, if that's the concern.

3 weeks ago

Oops, I hadn't tried it yet; I thought the next release was still a while off. It works! :tada: See Spiraea crenata for the longest entry. Fantastic. This new synonym solution is now fully working for us. Thank you!