NHMDenmark / Mass-Digitizer

Common repo for the DaSSCo team
Apache License 2.0

Test data export mapping to Specify import #157

Closed PipBrewer closed 1 year ago

PipBrewer commented 2 years ago

Issue

Sorting out issues prior to implementing the Specify import process.

Map an export from the App to Specify to check that the data is in an appropriate field and check any required modifications needed to Specify UIs. Do this for NHMD Vascular Plants.

PipBrewer commented 1 year ago

Having trouble finishing this at the moment due to continuing issues with testing on my computer (can't find specimen data I've entered).

- Can't see what ID codes refer to for collection and institution
- Need verbatim taxa in notes if no family in taxon spine
- Need to sort out verbatim fields in Specify UI to map to
- Need to decide how to put multispecimen sheets into Specify so can map column to
- Where to map digitiser to - can this go to modified by agent etc?

PipBrewer commented 1 year ago

Issues for Fedor to solve:

FedorSteeman commented 1 year ago
PipBrewer commented 1 year ago

@FedorSteeman Can you book a time to chat as you've misunderstood at least one of the issues

PipBrewer commented 1 year ago

ID CODES: If you look in the specimen table in the App, there are 2 columns. One is called InstitutionID and one called CollectionID. There are no descriptor columns to explain what these two IDs relate to. Hence, if doing an export of the specimen table and preparing it for import into Specify, it is not easy to match the records to the correct Institution or Collection. Therefore, please can we add two extra columns, one explaining what the InstitutionID relates to and one detailing what the CollectionID relates to?

PipBrewer commented 1 year ago

VERBATIM FIELDS: But the notes may relate to multiple bits of information. We need a general notes table that is not related to something like determination specifically. This would be for example where the current OCR transcription field has been put in Specify. In fact, the two could be merged.

PipBrewer commented 1 year ago

Digitizer I supposed these should be mapped to Cataloger and Cataloged date/time - great. Could you write some instructions detailing how to do this for an import?

PipBrewer commented 1 year ago

Multispecimen Sheets Let's discuss this. It would be good if you could propose a model for how you think this would work in Specify.

FedorSteeman commented 1 year ago

ID CODES: If you look in the specimen table in the App, there are 2 columns. One is called InstitutionID and one called CollectionID. There are no descriptor columns to explain what these two IDs relate to. Hence, if doing an export of the specimen table and preparing it for import into Specify, it is not easy to match the records to the correct Institution or Collection. Therefore, please can we add two extra columns, one explaining what the InstitutionID relates to and one detailing what the CollectionID relates to?

OK. We could include the institution and collection names, but I'm not sure what these should be mapped to, since when using Workbench, you're already logged in to the institution & collection in question. But it would provide clarity in order to show what institution and collection to log in to in the first place.

These columns should not be mapped to the storage-related fields, because "collection" in this context relates to the "Specify collection" and not the "collection" you would find in the storage tree. These are different concepts.

Come to think of it, we would actually really need fields denoting the parentage of the storage "leaves", otherwise Workbench would not know exactly where to put it. This information is in the fullname field, but this either needs to be split manually before import or be readily split for your convenience.

NOTE: The ids in the specimen table are only local foreign keys and not Specify foreign keys BTW.
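
As a rough illustration of the "readily split" option for the storage fullname, here is a minimal sketch. The `|` separator, the example value and the `storage_level_N` column names are assumptions for illustration only; the thread does not say how fullname is delimited or which Workbench columns the levels would map to.

```python
def split_storage_fullname(fullname: str, separator: str = "|") -> dict[str, str]:
    """Split a storage full name into one column per tree level, root first.

    The separator and the output column names are hypothetical.
    """
    parts = [part.strip() for part in fullname.split(separator) if part.strip()]
    return {f"storage_level_{i}": part for i, part in enumerate(parts, start=1)}


# Hypothetical fullname value, not taken from the app.
levels = split_storage_fullname("Building A | Herbarium | Cabinet 12 | Shelf 3")
print(levels)
```

Each level then becomes its own export column, so the parentage of a storage leaf is explicit rather than packed into one string.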

FedorSteeman commented 1 year ago

VERBATIM FIELDS: But the notes may relate to multiple bits of information. We need a general notes table that is not related to something like determination specifically. This would be for example where the current OCR transcription field has been put in Specify. In fact, the two could be merged.

We cannot add tables to Specify on the fly. Specify has a set number of tables that, however, do allow for customization, but mainly through the customizable fields, like "text1", "text2" etc. These will be able to encompass multiple bits of information for each table/object-class/domain-model. Things like OCR transcription fields could live on the level of collectionobject even if they denote subordinate objects.

FedorSteeman commented 1 year ago

Digitizer I supposed these should be mapped to Cataloger and Cataloged date/time - great. Could you write some instructions detailing how to do this for an import?

We would need to specify the name of the digitizer in separate Lastname, Middle and Firstname fields, so Workbench can parse and identify the corresponding agent. This information can be fetched upon user login and we can make sure this also is added to the specimen table.
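
A minimal sketch of the name split, assuming the digitizer's name is only available as a single "First Middle Last" string. The function name and the example name are hypothetical; only the Lastname, Middle and Firstname column names come from the comment above.

```python
def split_digitizer_name(full_name: str) -> dict[str, str]:
    """Split a 'First Middle Last' string into the three agent name columns.

    Assumes the last token is the surname; a single token is treated as a surname.
    """
    parts = full_name.split()
    if not parts:
        return {"Firstname": "", "Middle": "", "Lastname": ""}
    if len(parts) == 1:
        return {"Firstname": "", "Middle": "", "Lastname": parts[0]}
    return {
        "Firstname": parts[0],
        "Middle": " ".join(parts[1:-1]),
        "Lastname": parts[-1],
    }


print(split_digitizer_name("Anna Maria Jensen"))
```

Splitting on whitespace will misplace compound surnames, which is one reason to capture the three parts separately at login, as proposed, rather than parse them afterwards.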

PipBrewer commented 1 year ago

ID CODES: If you look in the specimen table in the App, there are 2 columns. One is called InstitutionID and one called CollectionID. There are no descriptor columns to explain what these two IDs relate to. Hence, if doing an export of the specimen table and preparing it for import into Specify, it is not easy to match the records to the correct Institution or Collection. Therefore, please can we add two extra columns, one explaining what the InstitutionID relates to and one detailing what the CollectionID relates to?

OK. We could include the institution and collection names, but I'm not sure what these should be mapped to, since when using Workbench, you're already logged in to the institution & collection in question. But it would provide clarity in order to show what institution and collection to log in to in the first place.

These columns should not be mapped to the storage-related fields, because "collection" in this context relates to the "Specify collection" and not the "collection" you would find in the storage tree. These are different concepts.

Come to think of it, we would actually really need fields denoting the parentage of the storage "leaves", otherwise Workbench would not know exactly where to put it. This information is in the fullname field, but this either needs to be split manually before import or be readily split for your convenience.

NOTE: The ids in the specimen table are only local foreign keys and not Specify foreign keys BTW.

Yep, I know they are local keys. We need to know which collections and institutions to log into to import, but also to collect statistics, so yes, we need these columns added. We will be saving all record exports from all institutions and collections in the same place. Well aware that the collection relates to the Specify collection. Well aware that this is nothing to do with storage. We have the full storage name in the specimen table, so that is not a problem.
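
For illustration, the two descriptor columns could come from a join at export time. This is a sketch only: it assumes the app keeps its data in a local SQLite database with `institution` and `collection` lookup tables that have `id` and `name` columns. Apart from InstitutionID and CollectionID, every table, column and value below is hypothetical.

```python
import sqlite3


def export_with_names(conn: sqlite3.Connection) -> list[tuple]:
    """Return specimen rows with institution and collection names added."""
    query = """
        SELECT s.catalognumber,
               s.institutionid, i.name AS institutionname,
               s.collectionid,  c.name AS collectionname
        FROM specimen AS s
        LEFT JOIN institution AS i ON i.id = s.institutionid
        LEFT JOIN collection  AS c ON c.id = s.collectionid
        ORDER BY s.catalognumber
    """
    return conn.execute(query).fetchall()


# Small in-memory database with made-up rows to exercise the query.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE institution (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE collection (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE specimen (
        id INTEGER PRIMARY KEY, catalognumber TEXT,
        institutionid INTEGER, collectionid INTEGER
    );
    INSERT INTO institution VALUES (1, 'NHMD');
    INSERT INTO collection VALUES (10, 'Vascular Plants');
    INSERT INTO specimen VALUES (100, '02114', 1, 10);
""")
rows = export_with_names(conn)
print(rows)
```

The LEFT JOINs keep specimen rows whose IDs have no match, so a missing lookup entry shows up as an empty name instead of a silently dropped record.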

FedorSteeman commented 1 year ago

Multispecimen Sheets Let's discuss this. It would be good if you could propose a model for how you think this would work in Specify.

The most obvious thing would be Specify Containers. Even though these are not supported yet by the Specify7 interface, they are supported at the database level (and Specify6) and probably also Sp7 Workbench.

Of course, each object on a multispecimen sheet would need a separate catalogue number.

FedorSteeman commented 1 year ago

Shall I create tickets for those requirements that have been clarified?

PipBrewer commented 1 year ago

VERBATIM FIELDS: But the notes may relate to multiple bits of information. We need a general notes table that is not related to something like determination specifically. This would be for example where the current OCR transcription field has been put in Specify. In fact, the two could be merged.

We cannot add tables to Specify on the fly. Specify has a set number of tables that, however, do allow for customization, but mainly through the customizable fields, like "text1", "text2" etc. These will be able to encompass multiple bits of information for each table/object-class/domain-model. Things like OCR transcription fields could live on the level of collectionobject even if they denote subordinate objects.

Not asking for tables to be added "on the fly". Asking if this can be implemented by looking at what customisable fields are available at the level of collection object.

PipBrewer commented 1 year ago

Multispecimen Sheets Let's discuss this. It would be good if you could propose a model for how you think this would work in Specify.

The most obvious thing would be Specify Containers. Even though these are not supported yet by the Specify7 interface, they are supported at the database level (and Specify6) and probably also Sp7 Workbench.

Of course, each object on a multispecimen sheet would need a separate catalogue number.

I agree that containers are the best solution. Need to understand what the process is for creating these when these are identified in the app.

PipBrewer commented 1 year ago

Digitizer I supposed these should be mapped to Cataloger and Cataloged date/time - great. Could you write some instructions detailing how to do this for an import?

We would need to specify the name of the digitizer in separate Lastname, Middle and Firstname fields, so Workbench can parse and identify the corresponding agent. This information can be fetched upon user login and we can make sure this also is added to the specimen table.

Sounds good. Can we implement that?

FedorSteeman commented 1 year ago

VERBATIM FIELDS: But the notes may relate to multiple bits of information. We need a general notes table that is not related to something like determination specifically. This would be for example where the current OCR transcription field has been put in Specify. In fact, the two could be merged.

We cannot add tables to Specify on the fly. Specify has a set number of tables that, however, do allow for customization, but mainly through the customizable fields, like "text1", "text2" etc. These will be able to encompass multiple bits of information for each table/object-class/domain-model. Things like OCR transcription fields could live on the level of collectionobject even if they denote subordinate objects.

Not asking for tables to be added "on the fly". Asking if this can be implemented by looking at what customisable fields are available at the level of collection object.

We could assign sets of customizable fields to specific concepts. That would provide clarity on both ends.

PipBrewer commented 1 year ago

VERBATIM FIELDS: But the notes may relate to multiple bits of information. We need a general notes table that is not related to something like determination specifically. This would be for example where the current OCR transcription field has been put in Specify. In fact, the two could be merged.

We cannot add tables to Specify on the fly. Specify has a set number of tables that, however, do allow for customization, but mainly through the customizable fields, like "text1", "text2" etc. These will be able to encompass multiple bits of information for each table/object-class/domain-model. Things like OCR transcription fields could live on the level of collectionobject even if they denote subordinate objects.

Not asking for tables to be added "on the fly". Asking if this can be implemented by looking at what customisable fields are available at the level of collection object.

We could assign sets of customizable fields to specific concepts. That would provide clarity on both ends.

We need something at the level of collection object, not e.g., within determinations. Can we do that?

FedorSteeman commented 1 year ago

Digitizer I supposed these should be mapped to Cataloger and Cataloged date/time - great. Could you write some instructions detailing how to do this for an import?

We would need to specify the name of the digitizer in separate Lastname, Middle and Firstname fields, so Workbench can parse and identify the corresponding agent. This information can be fetched upon user login and we can make sure this also is added to the specimen table.

Sounds good. Can we implement that?

https://github.com/NHMDenmark/Mass-Digitizer/issues/217

PipBrewer commented 1 year ago

VERBATIM FIELDS: But the notes may relate to multiple bits of information. We need a general notes table that is not related to something like determination specifically. This would be for example where the current OCR transcription field has been put in Specify. In fact, the two could be merged.

We cannot add tables to Specify on the fly. Specify has a set number of tables that, however, do allow for customization, but mainly through the customizable fields, like "text1", "text2" etc. These will be able to encompass multiple bits of information for each table/object-class/domain-model. Things like OCR transcription fields could live on the level of collectionobject even if they denote subordinate objects.

Not asking for tables to be added "on the fly". Asking if this can be implemented by looking at what customisable fields are available at the level of collection object.

We could assign sets of customizable fields to specific concepts. That would provide clarity on both ends.

We need something at the level of collection object, not e.g., within determinations. Can we do that?

A digitiser puts something in the notes field. This could be related to multiple things, determination, geographic region, condition of specimen. When importing to Specify this would really need to be mapped to a single field at collection object level - otherwise we effectively need multiple columns and the digitiser to specify at the time which column to map to, or it will be a lot of work to sort out during import.

FedorSteeman commented 1 year ago

VERBATIM FIELDS: But the notes may relate to multiple bits of information. We need a general notes table that is not related to something like determination specifically. This would be for example where the current OCR transcription field has been put in Specify. In fact, the two could be merged.

We cannot add tables to Specify on the fly. Specify has a set number of tables that, however, do allow for customization, but mainly through the customizable fields, like "text1", "text2" etc. These will be able to encompass multiple bits of information for each table/object-class/domain-model. Things like OCR transcription fields could live on the level of collectionobject even if they denote subordinate objects.

Not asking for tables to be added "on the fly". Asking if this can be implemented by looking at what customisable fields are available at the level of collection object.

We could assign sets of customizable fields to specific concepts. That would provide clarity on both ends.

We need something at the level of collection object, not e.g., within determinations. Can we do that?

A digitiser puts something in the notes field. This could be related to multiple things, determination, geographic region, condition of specimen. When importing to Specify this would really need to be mapped to a single field at collection object level - otherwise we effectively need multiple columns and the digitiser to specify at the time which column to map to, or it will be a lot of work to sort out during import.

Of course! The collection object has plenty of customizable fields.
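
To make "assigning customizable fields to specific concepts" concrete, here is a small sketch of a column mapping applied to one export row. Which concept goes to which field is an assumption made up for this example; only the field names "text1" and "text2" are mentioned in the thread, and the export column names are hypothetical.

```python
# Hypothetical assignment of app export columns to customizable
# collection object fields; to be agreed on, not an existing mapping.
COLUMN_MAP = {
    "notes": "text1",
    "ocr_transcription": "text2",
}


def map_row(row: dict[str, str], column_map: dict[str, str] = COLUMN_MAP) -> dict[str, str]:
    """Rename mapped columns and pass all other columns through unchanged."""
    return {column_map.get(key, key): value for key, value in row.items()}


mapped = map_row({"catalognumber": "02114", "notes": "label partly illegible"})
print(mapped)
```

Keeping the mapping in one place documents, for both the app side and the Specify side, which field holds which kind of note.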

FedorSteeman commented 1 year ago

Multispecimen Sheets Let's discuss this. It would be good if you could propose a model for how you think this would work in Specify.

The most obvious thing would be Specify Containers. Even though these are not supported yet by the Specify7 interface, they are supported at the database level (and Specify6) and probably also Sp7 Workbench. Of course, each object on a multispecimen sheet would need a separate catalogue number.

I agree that containers are the best solution. Need to understand what the process is for creating these when these are identified in the app.

I've been playing around with containers for the Ichthyology import. I think we can suffice with a container name. So just an extra text column. This could be auto-generated or we can add an input field for specifying the container name.

PipBrewer commented 1 year ago

VERBATIM FIELDS: But the notes may relate to multiple bits of information. We need a general notes table that is not related to something like determination specifically. This would be for example where the current OCR transcription field has been put in Specify. In fact, the two could be merged.

We cannot add tables to Specify on the fly. Specify has a set number of tables that, however, do allow for customization, but mainly through the customizable fields, like "text1", "text2" etc. These will be able to encompass multiple bits of information for each table/object-class/domain-model. Things like OCR transcription fields could live on the level of collectionobject even if they denote subordinate objects.

Not asking for tables to be added "on the fly". Asking if this can be implemented by looking at what customisable fields are available at the level of collection object.

We could assign sets of customizable fields to specific concepts. That would provide clarity on both ends.

We need something at the level of collection object, not e.g., within determinations. Can we do that?

A digitiser puts something in the notes field. This could be related to multiple things, determination, geographic region, condition of specimen. When importing to Specify this would really need to be mapped to a single field at collection object level - otherwise we effectively need multiple columns and the digitiser to specify at the time which column to map to, or it will be a lot of work to sort out during import.

Of course! The collection object has plenty of customizable fields.

The other question, of course, is: can it not be a table? What I'm asking is whether this is possible, i.e., is there a table option that can be customised? Is that what you are saying can't be done? So, for example, we have a note from the digi app, we get an OCR read during imaging and later a transcriber adds something. How do we separate and keep all of this info? A notes table would be good at collection object level. Otherwise we end up with endless numbers of highly specific notes fields which likely contain only temporary information (as sorted out by the curator post-digitisation). If this is not possible, is there a clever alternative?

FedorSteeman commented 1 year ago

The other question, of course, is: can it not be a table? What I'm asking is whether this is possible, i.e., is there a table option that can be customised? Is that what you are saying can't be done? So, for example, we have a note from the digi app, we get an OCR read during imaging and later a transcriber adds something. How do we separate and keep all of this info? A notes table would be good at collection object level. Otherwise we end up with endless numbers of highly specific notes fields which likely contain only temporary information (as sorted out by the curator post-digitisation). If this is not possible, is there a clever alternative?

OK, there may be a misunderstanding here. When you write "table", I'm thinking of a database table, i.e. an actual piece of the structure of a relational database system, which is pretty rigid. If you mean a tabular format for data, so an entire "table" of information that exists for a single record, I think there are options for that too. For instance, we could simply throw a CSV text dump in a single text field.
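
A minimal sketch of the "CSV text dump in a single text field" idea, using Python's standard `csv` module. The `source` and `note` column names and the example notes are assumptions; the point is only that notes from the app, the OCR read and a transcriber can stay separate inside one text value and be read back later.

```python
import csv
import io


def notes_to_csv(notes: list[dict[str, str]]) -> str:
    """Serialise notes from several sources into one CSV-formatted string."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=["source", "note"], lineterminator="\n")
    writer.writeheader()
    writer.writerows(notes)
    return buffer.getvalue()


def csv_to_notes(text: str) -> list[dict[str, str]]:
    """Parse the stored string back into one dict per note."""
    return list(csv.DictReader(io.StringIO(text)))


# Made-up notes; the comma in the first one is quoted by the csv module.
notes = [
    {"source": "digi app", "note": "two labels, one illegible"},
    {"source": "OCR", "note": "Herb. Mus. Bot."},
    {"source": "transcriber", "note": "collector name uncertain"},
]
stored = notes_to_csv(notes)
print(stored)
```

The resulting string would go into one customizable text field on the collection object, and a curator or script can recover the individual notes with `csv_to_notes`.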

PipBrewer commented 1 year ago

Multispecimen Sheets Let's discuss this. It would be good if you could propose a model for how you think this would work in Specify.

The most obvious thing would be Specify Containers. Even though these are not supported yet by the Specify7 interface, they are supported at the database level (and Specify6) and probably also Sp7 Workbench. Of course, each object on a multispecimen sheet would need a separate catalogue number.

I agree that containers are the best solution. Need to understand what the process is for creating these when these are identified in the app.

I've been playing around with containers for the Ichthyology import. I think we can suffice with a container name. So just an extra text column. This could be auto-generated or we can add an input field for specifying the container name.

There are possibly two solutions here I can think of. 1. Another pop up box (ugh) asking for you to scan the barcodes of the other specimens on the sheet, or 2. We assume that all specimens added consecutively with multispecimen sheet ticked are from the same sheet. Option 2 would involve no extra app development. How would the process for option 2 work? Could do it manually after the fact in Specify by manually creating containers. Alternatives? Do something funky during import? What if there are multiple multispecimen sheets one after the other?

PipBrewer commented 1 year ago

The other question, of course, is: can it not be a table? What I'm asking is whether this is possible, i.e., is there a table option that can be customised? Is that what you are saying can't be done? So, for example, we have a note from the digi app, we get an OCR read during imaging and later a transcriber adds something. How do we separate and keep all of this info? A notes table would be good at collection object level. Otherwise we end up with endless numbers of highly specific notes fields which likely contain only temporary information (as sorted out by the curator post-digitisation). If this is not possible, is there a clever alternative?

OK, there may be a misunderstanding here. When you write "table", I'm thinking of a database table, i.e. an actual piece of the structure of a relational database system, which is pretty rigid. If you mean a tabular format for data, so an entire "table" of information that exists for a single record, I think there are options for that too. For instance, we could simply throw a CSV text dump in a single text field.

OK. I think we are almost on the same page now with this tabular format. :)

FedorSteeman commented 1 year ago

image

PipBrewer commented 1 year ago

image

PipBrewer commented 1 year ago

Whatever makes you happy. I'm OK with them in this ticket as multiple items or separate tickets.

FedorSteeman commented 1 year ago

These are clearly separate work units, so it would make me really happy if they're turned into respective tickets.

PipBrewer commented 1 year ago

These are clearly separate work units, so it would make me really happy if they're turned into respective tickets.

Your happiness is my happiness...

FedorSteeman commented 1 year ago

There are possibly two solutions here I can think of. 1. Another pop up box (ugh) asking for you to scan the barcodes of the other specimens on the sheet, or 2. We assume that all specimens added consecutively with multispecimen sheet ticked are from the same sheet. Option 2 would involve no extra app development. How would the process for option 2 work? Could do it manually after the fact in Specify by manually creating containers. Alternatives? Do something funky during import? What if there are multiple multispecimen sheets one after the other?

I think the most straightforward way would be an auto-generated name based on the moment the multispecimen checkbox is ticked. This would be added to an (uneditable) text field, so it's clear what container is being worked with. The auto-generated name I used for Ichthyology is the first cataloguenr-dash-last cataloguenr, e.g. "02114-02115".

We could add an option of a user changing the name, but this could quickly become messy: what if the user decides to change the name halfway through? This would then require "containers" to exist in a local database table and to be managed. Better to keep it simple and rigid, and train the users not to jump around too much but to record all specimens on a single sheet in one go, or not at all.
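
The naming pattern above can be sketched as follows. Note that this sketch applies the first-dash-last name to each run of consecutive records with the multispecimen box ticked, i.e. it follows the "consecutive records" assumption (option 2) rather than generating the name at the moment the box is ticked. The record layout and the `containername` column are hypothetical.

```python
from itertools import groupby


def assign_container_names(records: list[dict]) -> list[dict]:
    """Add a 'containername' to each record, in entry order.

    Each run of consecutive records flagged as multispecimen gets the name
    '<first catalogue number>-<last catalogue number>'; others get ''.
    """
    result = []
    for flagged, run in groupby(records, key=lambda r: r["multispecimen"]):
        run = list(run)
        name = f"{run[0]['catalognumber']}-{run[-1]['catalognumber']}" if flagged else ""
        result.extend({**record, "containername": name} for record in run)
    return result


records = [
    {"catalognumber": "02113", "multispecimen": False},
    {"catalognumber": "02114", "multispecimen": True},
    {"catalognumber": "02115", "multispecimen": True},
    {"catalognumber": "02116", "multispecimen": False},
]
for row in assign_container_names(records):
    print(row["catalognumber"], row["containername"])
```

A limitation of this approach: two multispecimen sheets recorded directly after each other form one run and would end up in one container, so the open question about back-to-back sheets still needs an explicit boundary, such as unticking the box between sheets.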

PipBrewer commented 1 year ago

There are possibly two solutions here I can think of. 1. Another pop up box (ugh) asking for you to scan the barcodes of the other specimens on the sheet, or 2. We assume that all specimens added consecutively with multispecimen sheet ticked are from the same sheet. Option 2 would involve no extra app development. How would the process for option 2 work? Could do it manually after the fact in Specify by manually creating containers. Alternatives? Do something funky during import? What if there are multiple multispecimen sheets one after the other?

I think the most straightforward way would be an auto-generated name based on the moment the multispecimen checkbox is ticked. This would be added to an (uneditable) text field, so it's clear what container is being worked with. The auto-generated name I used for Ichthyology is the first cataloguenr-dash-last cataloguenr, e.g. "02114-02115".

We could add an option of a user changing the name, but this could quickly become messy: what if the user decides to change the name halfway through? This would then require "containers" to exist in a local database table and to be managed. Better to keep it simple and rigid, and train the users not to jump around too much but to record all specimens on a single sheet in one go, or not at all.

Really need a system that the collection managers and curators are happy with long term for names of containers. Are they automatically assigned a unique number in Specify? Could do with some options to present to them.

FedorSteeman commented 1 year ago

Really need a system that the collection managers and curators are happy with long term for names of containers. Are they automatically assigned a unique number in Specify? Could do with some options to present to them.

Once they're in Specify the container names can always be changed at the curator's whim and fancy. Provided of course that Specify7 eventually will support containers in the user interface.

EDIT: They're not assigned unique names or numbers in Specify (other than database level primary keys). When you create a container, you give it a name.

FedorSteeman commented 1 year ago

The following have all been relegated to dedicated tickets:

What remains is facilitating a test import into Specify.

PipBrewer commented 1 year ago

Subsumed into other tasks