SpeciesFileGroup / taxonworks

Workbench for biodiversity informatics.
http://taxonworks.org

DwC importer assign BiocurationGroup values to rows #2424

Closed: LordFlashmeow closed this issue 2 years ago.

LordFlashmeow commented 2 years ago

I have a custom BiocurationGroup vocabulary, and I would like to be able to assign those tags via columns in the DwC importer.

Ideally, it would look something like this:

| (other columns) | TW:BiocurationGroup:lifestage | TW:BiocurationGroup:sex | TW:BiocurationGroup:caste | TW:BiocurationGroup:subcaste |
|---|---|---|---|---|
| ... | adult | female | queen | alate |
| ... | adult | female | worker | soldier |
| ... | larvae | male | male | |

mjy commented 2 years ago

@LordFlashmeow I think you could pivot the tables. This is adding Tags (actually, subclasses of Tags). So each column would be a Keyword, and the value would be 1 if present. The column names would be something like `TW:BiocurationClassification:`.

This should not block the merge; it can be a post-merge feature.

LordFlashmeow commented 2 years ago

True, that could also work. However, I suspect that most curator data isn't going to be pivoted. If the data is stored as:

| lifestage | sex |
|---|---|
| adult | male |
| larvae | female |

then it takes some data-manipulation knowledge to convert it to:

| TW:BiocurationClassification:adult | TW:BiocurationClassification:larvae | TW:BiocurationClassification:male | TW:BiocurationClassification:female |
|---|---|---|---|
| 1 | | 1 | |
| | 1 | | 1 |

What tool would you recommend for this transformation (besides scripting)?
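For comparison, the scripting route for the pivot described above is short. The following is a minimal Python sketch (standard library only) that turns long-form columns such as `lifestage` and `sex` into `TW:BiocurationClassification:<value>` presence columns holding `1` or blank. The function name, the tab-separated sample, and the `catalogNumber` column are illustrative assumptions; this is not part of TaxonWorks.

```python
import csv
import io

# Column-name pattern from the thread; "1" marks presence, "" marks absence.
PREFIX = "TW:BiocurationClassification:"


def pivot_to_classification_columns(rows, source_columns):
    """Hypothetical helper: turn long-form cells (e.g. lifestage='adult')
    into one-hot TW:BiocurationClassification:<value> columns."""
    # Collect every distinct non-blank value, in first-seen order.
    values = []
    for row in rows:
        for col in source_columns:
            value = (row.get(col) or "").strip()
            if value and value not in values:
                values.append(value)
    headers = [PREFIX + value for value in values]

    pivoted = []
    for row in rows:
        present = {(row.get(col) or "").strip() for col in source_columns}
        # Keep every other column unchanged; drop the long-form source columns.
        new_row = {k: v for k, v in row.items() if k not in source_columns}
        for value, header in zip(values, headers):
            new_row[header] = "1" if value in present else ""
        pivoted.append(new_row)
    return pivoted, headers


if __name__ == "__main__":
    sample = "catalogNumber\tlifestage\tsex\nA1\tadult\tmale\nA2\tlarvae\tfemale\n"
    rows = list(csv.DictReader(io.StringIO(sample), delimiter="\t"))
    pivoted, headers = pivot_to_classification_columns(rows, ["lifestage", "sex"])
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=["catalogNumber"] + headers, delimiter="\t")
    writer.writeheader()
    writer.writerows(pivoted)
    print(out.getvalue())
```

One caveat of this design: values are keyed only by their text, so a word that appears in two groups (e.g. `male` as both sex and caste in the first table of this issue) collapses into a single column. That ambiguity is one argument for group-qualified columns such as `TW:BiocurationGroup:sex`.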

LocoDelAssembly commented 2 years ago

BTW, in https://github.com/SpeciesFileGroup/taxonworks/commit/ff9c22afeaee132f789b3a81640426dd649e4680 I implemented what @LordFlashmeow originally posted. I hope it is better than adding multiple columns of ones and blanks.

Note that `sex` is already mapped: it finds or creates a "sex" BiocurationGroup and maps the value by finding or creating a BiocurationClass. This might collide if both `sex` and a custom TW field for the same group are specified.

`TW:BiocurationGroup:*` requires that both the BiocurationGroup (named in the column) and the BiocurationClass (named in the field value) already exist. In both the column and the field value you may match by either URI or name; the URI takes priority when matching.
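The matching rules above (group from the column, class from the cell, URI before name, both must already exist) can be sketched in Python as follows. This is a hypothetical illustration using in-memory stand-ins, not TaxonWorks' actual Ruby implementation or models; `Term`, `Group`, `match_term`, and `resolve_cell` are invented names, and looking the class up only among the group's own classes is an assumption.

```python
from dataclasses import dataclass, field

# Hypothetical stand-ins for TaxonWorks records; not the real models.
COLUMN_PREFIX = "TW:BiocurationGroup:"


@dataclass
class Term:
    name: str
    uri: str = ""


@dataclass
class Group(Term):
    classes: list = field(default_factory=list)


def match_term(terms, key):
    """Return the term whose URI equals key; otherwise the one whose name does."""
    for term in terms:
        if term.uri and term.uri == key:
            return term  # URI match takes priority
    for term in terms:
        if term.name == key:
            return term
    return None


def resolve_cell(column, value, groups):
    """Resolve one importer cell to (group, class), or None if not applicable.
    Nothing is created: a missing group or class raises LookupError."""
    if not column.startswith(COLUMN_PREFIX):
        return None  # not a custom biocuration column
    group_key = column[len(COLUMN_PREFIX):]
    group = match_term(groups, group_key)
    if group is None:
        raise LookupError(f"BiocurationGroup {group_key!r} does not exist")
    value = value.strip()
    if not value:
        return None  # blank cell: nothing to classify
    # Assumption: the class is looked up among this group's classes only.
    biocuration_class = match_term(group.classes, value)
    if biocuration_class is None:
        raise LookupError(f"BiocurationClass {value!r} not found in {group.name!r}")
    return group, biocuration_class
```

Raising instead of creating mirrors the "must already exist" rule for custom columns, in contrast to the find-or-create behaviour described for the mapped `sex` field.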

LordFlashmeow commented 2 years ago

What's the best way to create containers when a collection object (CO) has multiple individuals with different biocuration classes? Is it to create multiple rows, one for each individual (or group of individuals), with the "Containerize specimen with existing ones when catalog number already exists" setting enabled?

mjy commented 2 years ago

Yes, I believe so. They will be grouped in a Container::Virtual. I'm not sure whether the preparation type also influences which specific container type is matched.
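The row-per-individual approach confirmed above can be prepared with a small script. The Python sketch below is hypothetical: it assumes the "Containerize specimen..." setting groups rows that share a `catalogNumber` (as the setting's name suggests), and it records each part's size in `individualCount`, a standard Darwin Core term; whether the importer uses `individualCount` this way is an assumption. `split_specimen_rows` and `fieldnames_for` are invented helper names.

```python
# Hypothetical helpers: one import row per individual (or group of individuals),
# all sharing the same catalogNumber so the importer can containerize them.


def split_specimen_rows(shared, parts):
    """shared: columns common to every part (must include catalogNumber).
    parts: list of (count, {group_name: class_name}) tuples."""
    if not shared.get("catalogNumber"):
        raise ValueError("rows can only be grouped by a shared catalogNumber")
    rows = []
    for count, classes in parts:
        row = dict(shared)
        row["individualCount"] = str(count)  # Darwin Core term
        for group_name, class_name in classes.items():
            row["TW:BiocurationGroup:" + group_name] = class_name
        rows.append(row)
    return rows


def fieldnames_for(rows):
    """Union of column names in first-seen order, e.g. for csv.DictWriter."""
    names = []
    for row in rows:
        for name in row:
            if name not in names:
                names.append(name)
    return names
```

Because parts can classify different groups, rows may have different keys; passing `fieldnames_for(rows)` to `csv.DictWriter` writes missing cells as blanks (its default `restval` is `""`).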

mjy commented 2 years ago

Side note: the export behaviour has changed; values are now exported by matching the group that contains the attribute. So you can load individual attributes via the importer, then have them export to the specific DwC fields `lifeStage` and `sex`.
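The group-based export described above amounts to a lookup from group to Darwin Core term. The Python sketch below is a hypothetical illustration, not TaxonWorks' exporter: the `GROUP_TO_DWC_TERM` mapping and the function name are assumptions, and joining multiple values with `" | "` follows the Darwin Core recommendation for multi-valued terms.

```python
# Hypothetical mapping from BiocurationGroup name to the DwC term it exports to.
GROUP_TO_DWC_TERM = {"lifestage": "lifeStage", "sex": "sex"}


def export_biocuration_terms(classifications, group_to_term=GROUP_TO_DWC_TERM):
    """classifications: (group_name, class_name) pairs for one specimen.
    Returns {dwc_term: value}; multiple values are joined with " | "."""
    collected = {}
    for group_name, class_name in classifications:
        term = group_to_term.get(group_name)
        if term is None:
            continue  # groups with no DwC target are not exported here
        collected.setdefault(term, []).append(class_name)
    # Darwin Core recommends " | " as the separator for multi-valued terms.
    return {term: " | ".join(values) for term, values in collected.items()}
```

Keying the export on the group rather than on each class means a new class added to an existing group exports to the right field without further configuration.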

mjy commented 2 years ago

@LordFlashmeow what's the status of this issue?

LordFlashmeow commented 2 years ago

@mjy I don't think we're planning to use this in the near future. Our data is too variable and inconsistent, and we don't think containerizing would benefit our dataset.