Useful information from old emails

andrewvanbreda commented 8 years ago

This information was provided in email conversations prior to the move to GitHub for conversation tracking. For simplicity, elements such as email footers and email “niceties” have been removed from the transcript. In some places, to clean up the information, I have made edits or added information; these more major edits are denoted by “avb edit”. I have also reversed the order of a typical email thread so that the newest information is at the bottom and the information can be read from the top without confusion.

Nested in #46

andrewvanbreda commented 8 years ago

(avb edit: Email headers removed. Original email sent by Andrew van Breda on 07 January 2016, with responses by David Roy on 9th January, if you wish to locate the original emails. I have indicated John’s responses with “JVB:” and David’s with “DR:”. Andrew van Breda wrote:)

Here are some questions relating to the PLANTATT data.

1. The PLANTATT_19_Nov_08.xls spreadsheet I am looking at has the attributes listed against taxon name. A better result would probably be achieved if these were listed against exact tvks. Will those be available? I have checked and it appears to be the same in the NVC floristic tables spreadsheet.

DR: Good point. We can supply both linked to TVKs.

2. Are we going to be just storing the attributes against preferred species like with NPMS?

DR: Yes. Presumably the import/entry of species will link synonyms to preferred species? What happens if we update the UKSI? Will the link to preferred species be retained?

3. I noticed some of the traits (such as Perennation, Life Form, Origin and Woodiness) are just listed as a code (e.g. "Th"). Are these the exact values we are going to have in the termlists, or will these codes be expanded upon in the termlist itself? Looking at the data further, it looks like Clonal spread has probably got more complex codes to my untrained eye, however the same question applies.

DR: Yes, there is a full termlist. In fact, this data is contained within the 'Online Plant Atlas' site that John developed - https://www.brc.ac.uk/plantatlas/index.php?q=plant/hyacinthoides-non-scripta. We are effectively aiming to replicate this functionality.

4. I wasn't sure if the rows "(c) Geography & climate" and "(d) Habitat" on the "Attributes and Sources" tab were actually supposed to relate to any columns on the data tab.

DR: These are just sub-heads but usefully group the attributes, e.g. tabs on https://www.brc.ac.uk/plantatlas/index.php?q=plant/hyacinthoides-non-scripta

5. The "Continentality in Europe" column "C" seems to only hold two states, it is either "c" or blank. Does this mean this should actually be stored as a boolean? It is the same with Coastal column "Co", which only seems to contain "Co" or blank, do we literally need to store "Co" or is it just going to be a Boolean?

JVB: Yes, I think these are Booleans, David can confirm.

DR: Yes, booleans

6. The "Reaching northern European limit in British Isles" NBI column appears to be blank, 0 or 1. Is blank the same as 0? Should this be stored as a boolean? The same question applies for "SBI" "Reaching southern European limit in British Isles".

JVB: I suspect blank means null (i.e. not recorded) and the 1 and 0 then map to a true/false boolean.

DR: Yes, that's correct
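The answers to questions 5 and 6 imply two slightly different parsing rules, which can be sketched roughly as below. This is only an illustration, not the importer's actual code: the function names are hypothetical, cells are assumed to arrive as strings (or None for an empty cell), and treating a blank "C" or "Co" cell as false is an assumption drawn from those columns only having two states.

```python
from typing import Optional


def parse_presence_flag(cell: Optional[str], code: str) -> bool:
    """Two-state columns such as "C" and "Co": the code means True, blank means False."""
    value = (cell or "").strip()
    if value == "":
        return False
    if value.lower() == code.lower():
        return True
    raise ValueError(f"unexpected value {cell!r} for flag {code!r}")


def parse_limit_flag(cell: Optional[str]) -> Optional[bool]:
    """Three-state columns such as NBI and SBI: blank is null, 1 is True, 0 is False."""
    value = (cell or "").strip()
    if value == "":
        return None  # not recorded
    if value == "1":
        return True
    if value == "0":
        return False
    raise ValueError(f"unexpected limit flag {cell!r}")
```

Keeping the two functions separate makes the difference explicit: only NBI and SBI need a nullable boolean.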

7. How should we store the "Latitude of northern European limit (5º band)" "NEur" values, as these seem to be a series of ranges like >65 or 50-55. For instance, should these be stored as two integer values? (if the upper one is missing it could mean "greater than", and the opposite for the "less than" situation). The same question applies for "SEur" "Latitude of southern European limit (5º band)"

DR: Text is fine for these. They are only for reporting so not used as numbers

8. This question isn't really a programming one, but is useful for me to understand the system better. I am not sure how the following traits relate to species: January mean temperature, July mean temperature, Annual precipitation.

JVB: David can clarify. I think that this means if you look at the area covered by the population of this species, what's the average temperature in Jan/June etc.

DR: It gives a measure of whether a plant tends to grow in wet/hot places and is calculated as:

The July Mean Temperature values (ºC) were calculated as the mean value of the 10-km squares where the species occurs in Britain, Ireland and the Channel Islands. Climate data for 10-km squares were taken from baseline climate summaries of the UK Climate Impacts Programme (Hulme & Jenkins, 1998). These baseline summaries were constructed by interpolation of daily weather measurements from individual met stations, averaged over the 30-year period 1961-1990 (Barrow et al., 1993).
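As a rough illustration of the calculation described above (the mean of a climate value over the 10-km squares where a species occurs), a minimal sketch might look like this. The function name, the square identifiers and the temperature values are hypothetical, and skipping squares that have no climate value is an assumption rather than something stated in the emails.

```python
from statistics import mean
from typing import Iterable, Mapping, Optional


def species_climate_mean(
    occupied_squares: Iterable[str],
    square_values: Mapping[str, float],
) -> Optional[float]:
    """Mean climate value over the distinct 10-km squares a species occupies."""
    values = [
        square_values[square]
        for square in set(occupied_squares)  # count each square once
        if square in square_values           # assumption: ignore squares without data
    ]
    return mean(values) if values else None


# Hypothetical July mean temperatures (ºC) keyed by 10-km square
july_means = {"SU00": 16.0, "SU01": 17.0, "NN00": 12.0}
print(species_climate_mean(["SU00", "SU01"], july_means))
```

The same function would serve for January mean temperature and annual precipitation by passing a different mapping.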

AVB: Now some questions relating to the NVC floristic tables data.

9. The "Community or sub-community code" and "Community level code" data look almost identical to me. Will we be storing both of these?

DR: Yes, very similar but not identical. We'll need both imported.

10. The "Species constancy value" looks to me to be a roman numeral. Is this going to be stored as a roman numeral type character in a termlist, or are we going to store this data as an integer?

DR: Store as an integer to enable subsequent calculations potentially.

11. I am not 100% sure how the NVC_florstic associations work. Is this supposed to associate species with each other? Looking at the data I am not sure how this works.

DR: For a given vegetation community, they define the reference type - a target. The constancy value tells you that if you take x number of samples from a site, a value of 5 (V) would suggest that 80-100% of those samples should contain that species. They are used by some algorithms (e.g. MAVIS mentioned in the Spec) to then classify a set of plant records (from a quadrat, e.g. 1m x 1m) to tell you what vegetation community you have. The NVC is the basis of much plant conservation work - judging the quality of sites and guiding management.

For the plant portal, they are useful for tagging against a species list for reporting.

AVB: Will await a copy of the "Importing species association data - DBIF" and "example plot data" before commenting on those.
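For question 10, storing the constancy value as an integer could be sketched as follows. This is an illustrative example only: the names are hypothetical, and the percentage bands for classes I to IV are an assumption (five equal 20% bands), since the emails only confirm that V corresponds to 5 and to 80-100%.

```python
from typing import Tuple

# Roman numeral constancy classes mapped to the integers stored in the database
CONSTANCY_CLASSES = {"I": 1, "II": 2, "III": 3, "IV": 4, "V": 5}


def constancy_to_int(numeral: str) -> int:
    """Convert a constancy class such as "V" to its integer value."""
    key = numeral.strip().upper()
    if key not in CONSTANCY_CLASSES:
        raise ValueError(f"unknown constancy class {numeral!r}")
    return CONSTANCY_CLASSES[key]


def constancy_band(value: int) -> Tuple[int, int]:
    """Approximate percentage of samples expected to contain the species.

    Assumes five equal 20% bands; only the band for 5 is confirmed above.
    """
    if not 1 <= value <= 5:
        raise ValueError(f"constancy value out of range: {value}")
    return ((value - 1) * 20, value * 20)
```

Under these assumptions, `constancy_band(constancy_to_int("V"))` gives `(80, 100)`, matching the example in David's answer.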

andrewvanbreda commented 8 years ago

Note: Numbering seems to have become broken when above comment was posted. Ignore question numbering.

andrewvanbreda commented 8 years ago

(avb edit: Email headers removed. Original email sent by Andrew van Breda on 13 January 2016 if you wish to locate the original emails.)

In answer to your question about the preferred species. As I mentioned, the data we are going to import for the Plant Portal doesn't currently have TVKs, so for this answer I will assume those will be present eventually like for Pantheon.

At the time of import, the traits are imported into the taxa taxon list item indicated by the preferred tvk in the import data.

The traits stay linked to that taxa taxon list item even if the preferred species changes (in fact, the importer doesn't check the Preferred flag in the Indicia database, it just assumes that the preferred_tvk in the import data correctly points to the preferred item).

I think you are right, this is something that needs consideration.

I would also note that pantheon traits will have the same issue.

I can see that it may be problematic to re-assign the traits automatically. For instance, the system would need to know that some of the attributes might not be pantheon/plant portal traits and as such should not be reassigned. Also, in the situation where the master list is changed, how is this achieved in Indicia? Are the changes imported, or does someone use the Warehouse user interface? We might need to cope with all the ways it can be done.

Also, how often does the master list change in this kind of way? How often it happens may have an impact on how we approach a solution.

I can think of three possible solutions off the top of my head. I suppose the more correct solution would be to re-assign all the traits to the preferred item; however, as I mentioned, this may be difficult. Another thought is to leave the traits attached to the old preferred species, but alter the code that looks up the traits so that if it doesn't find any, it checks whether traits are attached to the synonyms and uses those. In some ways I suppose this is a bit of a "fudge", but it might be more elegant, simpler and less liable to failure. It might be ok if this doesn't happen very often.

I suppose one other possibility is that if this situation only happens very rarely, and if the change is made in the Warehouse user interface, then we could warn the user that the traits need re-assigning and give some manual method of doing that.

Anyway, those are the solutions I can think of. Let me know your thoughts.
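The second option above (leave the traits where they are and fall back to the synonyms at lookup time) could look roughly like this. It is a sketch under stated assumptions, not Indicia code: the `TaxonName` class, the in-memory dictionaries and the `lookup_traits` function are all hypothetical, and it assumes that names sharing a `taxon_meaning_id` are synonyms of one another.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class TaxonName:
    id: int                # hypothetical stand-in for a taxa taxon list item id
    taxon_meaning_id: int  # assumed to be shared by all synonyms of a taxon
    preferred: bool


def lookup_traits(
    name: TaxonName,
    all_names: List[TaxonName],
    traits_by_name_id: Dict[int, Dict[str, str]],
) -> Dict[str, str]:
    """Return traits for a name, falling back to its synonyms if it has none."""
    direct = traits_by_name_id.get(name.id)
    if direct:
        return direct
    for other in all_names:
        if other.id != name.id and other.taxon_meaning_id == name.taxon_meaning_id:
            found = traits_by_name_id.get(other.id)
            if found:
                return found  # traits still attached to a former preferred name
    return {}
```

For example, if the traits `{"LF1": "Th"}` were imported against a name that has since stopped being preferred, looking up the new preferred name with the same `taxon_meaning_id` would still return them.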

(avb edit: Email headers removed. Original email sent by David Roy on 13 January 2016 if you wish to locate the original email)

Thanks Andrew. Perhaps this should be dealt with as part of the process for updating the master list, UKSI? That would mean extending that process, which happens through code that John developed and which I believe is part of the warehouse code.

Your suggestion of using the taxon_meaning_id to link to other names that might be tagged with traits also seems ok. The other thing to consider as regards your import procedure is a check that when loading against the preferred name that trait values are not already assigned to synonyms, and if they are that they are not different.

But I don't see any of this as a particular problem to solve immediately.
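The import-time check David describes (make sure trait values are not already assigned to synonyms with different values) might be sketched like this. The function name and the dictionary shapes are hypothetical; how the importer would actually fetch the synonyms' traits is not covered here.

```python
from typing import Dict, Iterable, Set


def find_trait_conflicts(
    incoming: Dict[str, str],
    synonym_traits: Iterable[Dict[str, str]],
) -> Dict[str, Set[str]]:
    """Map each incoming trait to the differing values already held by synonyms."""
    conflicts: Dict[str, Set[str]] = {}
    for existing in synonym_traits:
        for trait, value in existing.items():
            if trait in incoming and incoming[trait] != value:
                conflicts.setdefault(trait, set()).add(value)
    return conflicts
```

An empty result means the import can proceed; a non-empty one could be logged or reported for manual review rather than silently overwritten.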

(avb edit: Email headers removed. Original email sent by Andrew van Breda on 26th January 2016 if you wish to locate the original email)

I have a couple of queries about this data.

Firstly, I have noticed that two of the tvks actually have multiple rows ("NHMSYS0021060376" and "NBNSYS0000002167"). Are you wanting me to pull across all the data, or just data from one of the rows for each tvk? (It doesn't look like the duplicate tvk rows have the same data in them.)

Which brings me to my second question, and I apologise if it is a silly one, but I made the assumption that what you refer to as being the "recommended" tvk in the spreadsheet is just the same as what I always call a "preferred" tvk. Can you confirm this for me? It is just that having made that assumption, I would have thought the taxon name displayed next to those keys I just mentioned would be identical between the different rows, however they are not.

(avb edit: Version of PlantAt data provided by David Roy in an email)

(avb edit: Email headers removed. Original email sent by David Roy on 26 January 2016 if you wish to locate the original email)

Well spotted. I've just reviewed the dataset. The problem comes from nomenclature changes. To make an executive decision please do the following:

Remove the row for Zostera angustifolia
The TVK for Asparagus officinalis subsp. officinalis should be NBNSYS0000002168

I think for our purposes, the recommended TVK and preferred TVK can be considered the same.

(avb edit: Email headers removed. Original email sent by Andrew van Breda on 27 January 2016 if you wish to locate the original email)

I have a different query about the import data. I have noticed that some of the values in a column are a question mark.

I have noticed this in the columns for E1,E2 on a few of the taxa such as "Utricularia ochroleuca"

It looks to me as though this would otherwise be an integer column.

Can you let me know what you want the importer to do in this scenario. Until I hear back, I will program it to act as if there is no data if it comes across a question mark.

(avb edit: Email headers removed. Original email sent by David Roy on 27 January 2016 if you wish to locate the original email)

Yes, these can be treated as nulls so the data can be stored as a number field.
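Treating a question mark as null in an otherwise integer column such as E1 or E2 could be done along these lines. This is a minimal sketch with a hypothetical function name; treating a blank cell as null as well is an assumption.

```python
from typing import Optional


def parse_optional_int(cell: object) -> Optional[int]:
    """Return the cell as an integer, or None when it is "?" or blank."""
    text = "" if cell is None else str(cell).strip()
    if text in ("", "?"):
        return None  # "?" is stored as null so the field can remain numeric
    return int(text)
```

Any other non-numeric text raises `ValueError`, so unexpected values show up during the import instead of being stored silently.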

(avb edit: Email headers removed. Original email sent by Andrew van Breda on 29 January 2016 if you wish to locate the original email)

I have just been looking at that link to the plant atlas website and trying to work out the termlists for the PlantAtt import.

For some it is simple e.g. Perennation (P1,P2) - a,b,p (annual, biennial or perennial)

Similarly, Woodiness is obvious.

However, looking at the website, I cannot see any definitions for Origin of alien taxa at all, and although the Clonal Spread has the list of options in the page info help, it doesn't include the codes, so I can't see how they line up.

e.g. for Clonal Spread we have Rhizz1, Rhizz2 in the database but I can't see which terms these relate to.

Do you know how I can obtain the definitions matching up to the codes for the less obvious ones? I am looking for Clonal Spread (Clone1, Clone2), Origin of alien taxa (Origin) and Life Form (LF1, LF2).

(avb edit: Email headers removed. Original email sent by David Roy on 29 January 2016 if you wish to locate the original email)

Have you looked in the associated .pdf at http://www.brc.ac.uk/biblio/plantatt-attributes-british-and-irish-plants-spreadsheet ?

sacrevert commented 8 years ago

I will assume that all of the above issues are solved unless I hear differently.

andrewvanbreda commented 8 years ago

I wouldn't like to say it is all resolved; the importer is still in development. Some might have answers but not have been coded yet, for instance. Don't worry about them for now; if I need anything, I will raise it in a separate thread and leave this one just to act as an archive.

sacrevert commented 8 years ago

Fine - I will continue thinking in general about user interfaces and uses anyway.

BiologicalRecordsCentre / BSBI-Card-and-PlantPortal-DEPRECATED-