BiologicalRecordsCentre / BSBI-Card-and-PlantPortal-DEPRECATED-

A portal to promote plant recording and analysis of plant data

NVC floristic import observations #56

Closed andrewvanbreda closed 5 months ago

andrewvanbreda commented 7 years ago

Nested in #46

As with the thread I just opened for the Plant Att import, I am opening a thread for the floristic import. I haven't looked at it yet to refresh my memory, but there appears to be some committed code for it. I am less sure that it is complete compared with the Plant Att importer. I will take a look and let you know if it is ready for a run; however, my guess is that there are some outstanding questions for it.

andrewvanbreda commented 7 years ago

@sacrevert OK, thanks for letting me know. I was aware of that danger and deliberately tried to avoid something like that. Maybe the replacement was also affecting the other columns without my realising. We need another run anyway, as we also have the preferred species issue with the NVC code.

andrewvanbreda commented 7 years ago

@sacrevert Good spot. Interestingly though, that issue doesn't look like it is caused by an NA replacement, as the data file looks fine. I will just have to check the code to see if there is any NA replacement handling in there that I didn't realise.

Just let me know if you spot anything else. Keep in mind that I have also tested on my machine, so you may wish to take that into account when deciding how much to test.

I will let you know when I am in a position to instruct Biren to rerun any bits that need it.

sacrevert commented 7 years ago

@andrewvanbreda I have done a bit more work on the community level attributes, and can confirm that they are loaded, but have not worked out how to link the values to actual community names just yet. It might be easier if you can provide some examples; I've quite a bit of other work on at the mo, and although I'm sure I will work it out, some example tests would save some time. Cheers

andrewvanbreda commented 7 years ago

@sacrevert OK no problem, I will get that to you. Prob not today, but probably by tomorrow

sacrevert commented 7 years ago

@andrewvanbreda OK, I did actually get my head back around this, and it seems OK. The following gave the answers that I would expect given the raw data.

select *
from indicia.termlists_term_attribute_values ttlav
join indicia.termlists_term_attributes ttla on ttla.id = ttlav.termlists_term_attribute_id and ttla.deleted = false
join indicia.termlists_terms tt on tt.id = ttlav.termlists_term_id and tt.deleted = false
join indicia.terms t on t.id = tt.term_id
limit 100;
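To make the join above easier to reuse when checking community attributes, here is a minimal sketch in Python using an in-memory SQLite database. The table names come from the query; the column names `caption` and `text_value`, the toy rows, and the use of SQLite (rather than the PostgreSQL warehouse, where these tables sit in the `indicia` schema and `deleted` is a boolean) are assumptions for illustration only.

```python
import sqlite3

# Hypothetical minimal schema: only the columns used by the join are created.
# "caption" and "text_value" are assumed column names for illustration.
SCHEMA = """
CREATE TABLE terms (id INTEGER PRIMARY KEY, term TEXT);
CREATE TABLE termlists_terms (id INTEGER PRIMARY KEY, term_id INTEGER, deleted INTEGER);
CREATE TABLE termlists_term_attributes (id INTEGER PRIMARY KEY, caption TEXT, deleted INTEGER);
CREATE TABLE termlists_term_attribute_values (
    id INTEGER PRIMARY KEY,
    termlists_term_id INTEGER,
    termlists_term_attribute_id INTEGER,
    text_value TEXT
);
"""

# Same joins as the query in the thread, but selecting only the community
# name, attribute caption and value instead of SELECT *.
QUERY = """
SELECT t.term, ttla.caption, ttlav.text_value
FROM termlists_term_attribute_values ttlav
JOIN termlists_term_attributes ttla
  ON ttla.id = ttlav.termlists_term_attribute_id AND ttla.deleted = 0
JOIN termlists_terms tt
  ON tt.id = ttlav.termlists_term_id AND tt.deleted = 0
JOIN terms t ON t.id = tt.term_id
ORDER BY t.term, ttla.caption
"""


def community_attribute_values(conn):
    """Return (community name, attribute caption, value) tuples."""
    return conn.execute(QUERY).fetchall()


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
# Toy rows, not real NVC data. Community B's termlists_terms row is deleted.
conn.executemany("INSERT INTO terms VALUES (?, ?)", [(1, "Community A"), (2, "Community B")])
conn.executemany("INSERT INTO termlists_terms VALUES (?, ?, ?)", [(10, 1, 0), (20, 2, 1)])
conn.executemany("INSERT INTO termlists_term_attributes VALUES (?, ?, ?)", [(100, "Tree/shrub height", 0)])
conn.executemany(
    "INSERT INTO termlists_term_attribute_values VALUES (?, ?, ?, ?)",
    [(1000, 10, 100, "x"), (1001, 20, 100, "y")],
)
rows = community_attribute_values(conn)
print(rows)  # [('Community A', 'Tree/shrub height', 'x')]
```

Selecting named columns rather than `*` makes it easier to eyeball each community against the raw data, and the `deleted` filters drop rows that were soft-deleted.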

sacrevert commented 7 years ago

@andrewvanbreda One tiny thing, could you capitalise 'tree' in "tree/shrub height" for consistency (not your fault, this was in the original spreadsheet)? Cheers.

andrewvanbreda commented 7 years ago

@sacrevert Yes, I will change "tree". I will make a note of it; it might be safer to just change it in the warehouse after importing. I will look at the importer and decide whether there is any risk in changing the importer.

andrewvanbreda commented 7 years ago

Hi @sacrevert,

At the moment the situation is

  1. I have corrected the preferred species situation and committed.
  2. The tree/shrub height issue is just the original data file, we should simply be able to correct the name in the upload file before uploading
  3. The problem with the "na" missing in the community name was caused by the replacements in v2 of the file (which is the one that was uploaded). However, the latest file is now "NVCtables_v3_avb_formatted.txt", which has never had that issue, so it should be fine.
  4. I have run it on my machine and cannot see any reason why the species constancy value did not work. Admittedly the data type on the attribute is wrong: it should be "L" (lookup) instead of "T" (text). However, this doesn't actually seem to cause any practical problem. So I don't know what the issue is; perhaps that part of the SQL wasn't run. I will commit a fix for the data type, but otherwise I think we shouldn't let this stop us doing the next run (possibly on the test warehouse).
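Item 3 is the classic risk with missing-value replacement: a substring replace can eat letters out of real text (for example a case-insensitive replace of "na" would damage "Nardus"). A minimal sketch of a safer approach, assuming a tab-delimited file: only replace cells that are exactly a missing-value marker, and only in named columns. The column names "Community" and "Species constancy" and the marker set are hypothetical, not taken from the actual NVC file or importer.

```python
import csv
import io

# Hypothetical set of spellings treated as "missing" in the raw file.
MISSING_MARKERS = {"NA", "na", "N/A"}


def replace_missing(rows, columns, replacement=""):
    """Replace cells that are *exactly* a missing-value marker.

    Only the named columns are touched, and only whole-cell matches are
    replaced, so text such as a community name containing "na" is left alone.
    """
    cleaned = []
    for row in rows:
        new_row = dict(row)
        for col in columns:
            if new_row.get(col, "").strip() in MISSING_MARKERS:
                new_row[col] = replacement
        cleaned.append(new_row)
    return cleaned


raw = "Community\tSpecies constancy\nNardus stricta-Galium saxatile grassland\tNA\n"
reader = csv.DictReader(io.StringIO(raw), delimiter="\t")
rows = replace_missing(reader, ["Species constancy"])
print(rows)
# [{'Community': 'Nardus stricta-Galium saxatile grassland', 'Species constancy': ''}]
```

Restricting the replacement to specific columns also guards against the "working on the other columns too" problem mentioned earlier in the thread.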

So I suppose my only query now, before recommending another run on this, is whether we want a source like the ones we have for Plant Att and Pantheon. For comparison, the Plant Att attributes have the following source (although you don't have to use the same layout for NVC):

Term: "PLANTATT - attributes of British and Irish plants."

PLANTATT source link: "http://brc.ac.uk/sites/www.brc.ac.uk/files/biblio/PLANTATT_19_Nov_08.zip"

PLANTATT source references: "Hill, M.O., Preston, C.D., & Roy, D.B. (2004). NERC Centre for Ecology & Hydrology: Monks Wood."

sacrevert commented 7 years ago

@andrewvanbreda OK fine, I will continue working on the TVK issue then.

The source terms should be: Term: "British Plant Communities (NVC floristic tables)"

NVC source link: "http://jncc.defra.gov.uk/page-4265"

NVC source reference: "Rodwell, J.S. (Ed.). (1991). British Plant Communities (5 Volumes). Cambridge University Press, Cambridge, UK."

andrewvanbreda commented 7 years ago

@sacrevert OK, great. I will add that to the code and then we can look at doing another run of this next week.

andrewvanbreda commented 7 years ago

Source added to the taxa_taxon_list_attributes along with associated link/reference. Committed.

Note that the source_id doesn't appear on the termlists terms attributes table, so I will need to amend that table separately sometime. So the source is not on the communities attribute at the moment.

We can amend that simply in the db if needed after import, as it is just one id that needs adding.

andrewvanbreda commented 7 years ago

The termlists_term_attributes table now has a source_id column. I will probably update this field manually after import though, as the field might not be present when the importer is run, since it is currently only on the development branch code.
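The manual post-import step described above amounts to a single UPDATE. A minimal sketch follows, again using SQLite as a stand-in for the warehouse; the helper name `set_source_for_attributes`, the attribute ids and the source id 42 are hypothetical. Restricting the update to a list of attribute ids avoids stamping the NVC source onto unrelated attributes that also lack a source.

```python
import sqlite3


def set_source_for_attributes(conn, attribute_ids, source_id):
    """Set source_id on the given termlists_term_attributes rows that lack one.

    Returns the number of rows updated. Rows that already have a source_id
    are left untouched.
    """
    if not attribute_ids:
        return 0
    placeholders = ",".join("?" for _ in attribute_ids)
    cur = conn.execute(
        "UPDATE termlists_term_attributes SET source_id = ? "
        f"WHERE source_id IS NULL AND id IN ({placeholders})",
        (source_id, *attribute_ids),
    )
    conn.commit()
    return cur.rowcount


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE termlists_term_attributes (id INTEGER PRIMARY KEY, caption TEXT, source_id INTEGER)"
)
conn.executemany(
    "INSERT INTO termlists_term_attributes VALUES (?, ?, ?)",
    [(1, "Tree/shrub height", None), (2, "Unrelated attribute", None), (3, "Already linked", 7)],
)
updated = set_source_for_attributes(conn, [1, 3], 42)
print(updated)  # 1
```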

sacrevert commented 7 years ago

As per https://github.com/BiologicalRecordsCentre/BSBI-Card-and-PlantPortal/issues/55#issuecomment-344329033 but for NVC data NVC_DDBparsed_v5.zip

andrewvanbreda commented 7 years ago

@sacrevert OK thanks. Am just about to look at the Plant Att situation this afternoon....need to refresh memory, will let you know my thoughts on NVC too as soon as I can

andrewvanbreda commented 7 years ago

@sacrevert OK thanks. I think the code for this importer has changed since the last run (unlike PlantAtt, which hadn't changed), and we also have a new importer file. However, a major advantage with this importer is that we haven't done a test warehouse run yet, so we can do a clean run on that. So I suggest roughly the same course of action as for PlantAtt, but with a test warehouse run as well. I think the tests after that run need to be a bit more detailed than for PlantAtt: with PlantAtt the code looks fine, but with NVC we noticed that the Species Constancy Value wasn't imported during the dev warehouse run (although I couldn't find a reason why that happened). My suggested steps are:

  1. I will reformat the new file so that it will work with the importer
  2. Make sure any previous manual changes to the importer file are still present.
  3. Do a quick run on my own machine with some simple tests
  4. Test warehouse run, paying particular attention to issues noticed during the dev warehouse run
  5. Run on live if fine and verify it worked.
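For steps 4 and 5, one cheap check that would have caught the dev warehouse problem is to list any active species attribute that ended up with no values at all. A minimal sketch, assuming tables named `taxa_taxon_list_attributes` and `taxa_taxon_list_attribute_values` with the columns shown (the column names and toy rows are assumptions, and SQLite stands in for the warehouse):

```python
import sqlite3


def attributes_without_values(conn):
    """Return captions of non-deleted attributes that have no values loaded."""
    query = """
        SELECT a.caption
        FROM taxa_taxon_list_attributes a
        LEFT JOIN taxa_taxon_list_attribute_values v
          ON v.taxa_taxon_list_attribute_id = a.id
        WHERE a.deleted = 0
        GROUP BY a.id, a.caption
        HAVING COUNT(v.id) = 0
        ORDER BY a.caption
    """
    return [caption for (caption,) in conn.execute(query)]


conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE taxa_taxon_list_attributes (id INTEGER PRIMARY KEY, caption TEXT, deleted INTEGER);
CREATE TABLE taxa_taxon_list_attribute_values (
    id INTEGER PRIMARY KEY, taxa_taxon_list_attribute_id INTEGER, text_value TEXT
);
""")
# Toy rows: only "Other attribute" has a value loaded.
conn.executemany(
    "INSERT INTO taxa_taxon_list_attributes VALUES (?, ?, ?)",
    [(1, "Species constancy value", 0), (2, "Other attribute", 0), (3, "Deleted attribute", 1)],
)
conn.execute("INSERT INTO taxa_taxon_list_attribute_values VALUES (10, 2, 'x')")
missing = attributes_without_values(conn)
print(missing)  # ['Species constancy value']
```

The LEFT JOIN keeps attributes with no matching values, so `COUNT(v.id)` is 0 exactly for the attributes that failed to load.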

Again will let you know how I get on.

sacrevert commented 7 years ago

Great. Hopefully the 'Species' column in this new file should map exactly onto the species names in the full set of NVC table information, so the new TVKs should slot right in.
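Whether the names really map exactly can be checked before the run. A minimal sketch with plain Python sets; the function names and sample names are hypothetical, and it only compares name strings (it does not look up TVKs). It lists names in the new file with no exact match, and flags those that would match if case and repeated whitespace were ignored, since those are usually formatting slips rather than genuinely new species.

```python
def normalise(name):
    """Collapse runs of whitespace and ignore case for near-match detection."""
    return " ".join(name.split()).casefold()


def compare_species(new_names, table_names):
    """Return (names with no exact match, {unmatched name: likely intended name})."""
    table_set = set(table_names)
    table_by_norm = {normalise(n): n for n in table_names}
    exact_missing = sorted(set(new_names) - table_set)
    near = {
        n: table_by_norm[normalise(n)]
        for n in exact_missing
        if normalise(n) in table_by_norm
    }
    return exact_missing, near


new_file = ["Calluna vulgaris", "Erica  tetralix", "Made-up species"]
nvc_tables = ["Calluna vulgaris", "Erica tetralix"]
missing, near = compare_species(new_file, nvc_tables)
print(missing)  # ['Erica  tetralix', 'Made-up species']
print(near)     # {'Erica  tetralix': 'Erica tetralix'}
```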