Closed andrewvanbreda closed 4 months ago
Thanks Andy - I look forward to your conclusions on your test runs.
The tests went very well. All the complex bits of the plant att import seem to work fine on my machine ....I wasn't sure as I couldn't remember as it was a while back. There are just a few simple free text fields that need importing that weren't such as comments, I now have code for this and will commit it when I know it is working. There are just a couple of things to note really. One is before you do the test import we need to adjust the original file slightly, this doesn't actually change the data, but the format of the file is slightly wrong and we need to adjust it. We can deal with that when you do the import. The other thing to note is that we may have problems if the importer is accidentally run twice, but I am not sure whether it is going to be worth coding corrections for that as we will only be running it once. The only other comment I have to make is I am importing one of the columns "BRC code" as free text, but I am not sure whether it is the intention to import this column at all?
I will let you know once I have committed the code to import the free text columns (such as the comment). This is the easiest bit of the import.
We can give this a run whenever you like now. As mentioned, we will just need to adjust the file slightly before importing to get the format right. Or I can even send you the corrected file. No rush from my end. Whenever you like. Will look at what is happening with the floristic importer now, I can't comment much as I can't remember, although I can say that code does exist for it...how complete that code is I can't remember. I will report on that in the other thread I created.
When you say "give it a run", do you mean replicate your test at our end? Presumably this can be done by me? That is to say, no specific technical expertise is required?
I suppose what I mean is you will need to make sure that it is doing what you want...that would involve running it. The only thing is we don't have a user interface to view the data with once the import is done, so I could supply some SQL to query the database with or I could whip up a report to display the results visually on the Plant Portal dev site. We could get Biren to run it if you like then we could look at the data together if you prefer.
Before we run it we do need to adjust the SQL code very slightly (as we need to give it the ID of the species list, and also the path to the original import file on your hard disc) but the actual running of it simply requires the execute button to be clicked.
So I think really the main thing is simply that the person who is going to run it has access to the database to use for the test (which I assume would be the Dev Warehouse)
OK, it sounds like perhaps I could watch Biren run it (so that I might learn something), and then inspect the results/structure of the results in the Dev Warehouse database afterwards.
Am including Biren on this thread now. Let me know when you might be available to give this a run on Dev warehouse. Keep in mind there will be preparation time, but I don't expect that to be longer than an hour
Thanks. Biren is back on to work on Tuesday, so I will ask him then.
Do we have a link to the Dev Warehouse user interface, I don't seem to be able to get it to work. I just wanted to check the UK Master List was fully available on that warehouse to import into. I assume that is the list we will use and not one created for Plant Portal specifically (the importer doesn't mind what the species list, it is just an ID number, any species it doesn't find simply won't be processed)
Sorry, don't know. @DavidRoy will, although I see no reason why it wouldn't have the UK Master List on it; certainly there is no plan to use a specific Plant Portal list.
I have a link, it just seems to have stopped working. Simply says not found. That is fine about the list, that is what I though.
@JimBacon may be the man to ask.
Ok, we'll see what Jim says. There definitely seems to be something funny going on, as I double checked the URL (by looking at the warehouse path in iForm settings and then putting index.php/login on the end), but it still isn't working as it is the same URL I thought it was. So I am out of ideas.
Hello @andrewvanbreda and @sacrevert , Our warehouse link is devwarehouse.indicia.org.uk, Also Last I cloned from on 28/04, So it is only a month old but It has got up to date UKSI same as live warehouse as I know we haven't upgraded UKSI on live warehouse.
Hello @andrewvanbreda Please change the website ID in the iForm setting on test plant portal site. After cloned the old website data have been vanished, so you might need to create new one if you haven't created on live warehouse before.
Ok thanks for the info Biren. I will take a look at this tomorrow and let you know when I am in a position when the Plant Att import can be given a run on the dev warehouse
Hi Biren,
I am now going to attempt attach the importer sql file we are going to use to this thread. Further instructions to follow. It is a SQL file, but I think I need to change the file extension to .txt otherwise it won't attach
Sorry I forgot I need to attach the data file too. Again it is .csv but with need to alter the extension to .txt for upload purposes
Okay. I think the first think to say it don't run this on the wrong warehouse as it will run. I suspect it probably won't do anything if run on the wrong warehouse as there isn't a Plant Portal website on live, but best not try. So we are going to run this on Dev Warehouse. Although the SQL code is split into about 10 files when I committed it, I have supplied it to you in 1 file. I think this might be better for you as if your machine behaves like mine, an error will cause a complete abort and rollback if you run it all at once (rather than a half broken import). Having said that, I have successfully run it on my box just now, but you never know.
So all you need to do is save the data file to the disc and change its extension to .sql
Then in the SQL code change the path which is indicate by the following text in the file
I have already put the Master List ID 15 into the file.
I think that is it. Run that against the DEV Warehouse. I suppose the only think I would say is maybe double check someone isn't running something critical on dev before you run it if there is someone else using it
Hello @andrewvanbreda
The data from the CSV file has been imported to dev warehouse. It has taken 03:47 minutes.
Biren and Oli have done an import of the data into the dev warehouse this afternoon. The data hasn't been examined yet (Oli is going away so that might be after he is back). However it sounds like it was probably been successful. Biren noticed one issue where one row hadn't been copied in from the file. I think this is probably not a code issue, but an oddity with the original file that needs correcting (as the point that failed was just where I give a copy command so it is not something I deal with, I just tell the system to do it)
The 3:57 run time is roughly what I would expect. It is slower than my machine, but your species list is more complicated, the only species I had in my list were the ones I had copied in from the file. It is good that it isn't vastly longer as that was a danger I suppose
Hi Biren, I think I got confused and am not seeing the row issue you got during import on my own box. I am getting an import of 1887 rows (there are 1888 but one is a header). You are reporting 1886 import with row 1871 missing. I cannot find a reason for that because I can't see that happening. Can I confirm where the issue is occurring. Is the following query returning 1886, select count(*) from plant_portal_importer.tbl_plant_att;
Thanks
yes, that is right.
Interesting....I am not sure what we can do about that because I literally just tell the system to read in the file. It is one of those ones I could spend hours on just for one line, and I don't think that is a good use of financial resource. So let's try a reload and hope it works. We will just be populating the table that holds the file data, we won't rerun the Indicia data structures import code. Basically we simply re-run some of the sql, I will include the drop and count in the sql and will attach the sql in a second for you to use Before you run it though, can you try this, copy the data from the file into a new file using copy paste. Then save and use that file for the import, if you use a different filename you will need to change the path in the sql.
So to be clear the steps you need to take are:
Hi Biren,
I started getting your problem again, I thought I got it last night but then didn't earlier this afternoon. I just tried my plan I outlined above and it didn't fix it so there is no point in you doing it. However I have an idea, I will try that and get back to you.
Hi Biren,
That didn't work. I tried splitting up the file. I can't work out why I didn't get issue earlier. How did you identify which row was the culprit?
Ah!!! I have worked it out!!! I just tried to rerun some sql where is it working OK and I realised it is very slightly different. At the end of the sql file that is "failing" there is this extra line --Manual corrections to the data as discussed with David Roy. delete from plant_portal_importer.tbl_plant_att where taxon_name = 'Zostera angustifolia';
So we are correctly not including one of the rows! I can't actually remember what was said in my discussion with David about this, but that is what my comment says 1886 is correct
@andrewvanbreda As far as I can see the test import appears to be fine. Happy for this to go ahead on live whenever it is appropriate to do so.
@BirenRathod just wait before running this on live as I will have a quick think to make sure I haven't forgotten anything.
No need to rush. Whenever you ready.
Just to fill in some information on Github, Oli noticed there was a problem with the TVKs in the original file so sent me a new file. I have run this successfully on my machine. I have performed a few tests and the data looks ok (which is expected as the code hasn't changed). Only a few tests are required as it has already passed testing so we can assume because a few tests are passing then all with pass now, at this stage if it was going to break because of the new file I would of expected it to break completely due to a file format issue or something like that. I will attach some files in a second. One is the new import file I used successfully, the other is a sql file for the TEST WAREHOUSE. I assume we will do a run on the test warehouse before live, note this SQL file SHOULD NOT be used on live (the data file doesn't matter)
Important Notes: @BirenRathod the importer SQL has a different species list id for the test warehouse, it seems to be 34 for the master list. This means that file cannot be run on live.
@ do you just want to double-check I uploaded the right file with the right TVKs? it should be right.
I suggested just holding off on the run for a bit to give us some thinking time, that approach seemed to pay benefit last time as you realised about the TVK, particularly if there is no super rush
@BirenRathod , perhaps it would be a good idea that you double check the test warehouse species list id is indeed 34. I think I am right, but double check before any run. As I suggestion perhaps we should wait a little while before making the run, at least a few days anyway
Hi @sacrevert. Just a thought on this, do you think it is worth making use of the source_id on the attributes in the database to indicate they are from Plant Portal Plant Att. I know in the Pantheon project we used sources. If not I think we should start having a think about maybe getting round to running this on the test warehouse and then live. I say this because we would want to run it whilst it is still fresh in my head if there are any issues, also we want to run it while we know the warehouse is in a state where we know the importer works (the warehouse might change over time with updates)
@andrewvanbreda Yes, I think that is worthwhile. What type of field is source_id? Is it just an integer that points to a website list, or is it a text field where we add a scientific reference? Happy for these Plantatt data to be loaded to test again and then live when this is done.
It is an ID that will point to a term in the sources term list.
That term could be simply called "Plant Att" or be more complex.
Hope that wasn't too confusing, I think what I am trying to say is that we are so flexible now whatever you want we can almost certainly make happen.
As we are going to rerun the import on the test warehouse before live (as the tvks changed) I think we should go ahead with that run after I have added the sources as planned. Not forgetting that adding the source doesn't really add too much complexity so a check of the sources + any tests we were already going to do should suffice on the test warehouse
OK, can the term be "PLANTATT" (all caps, no space).
Year = 2004 Authors = Hill, M.O., Preston, C.D. & Roy, D.B. Link = http://brc.ac.uk/sites/www.brc.ac.uk/files/biblio/PLANTATT_19_Nov_08.zip
I'm assuming that it is worth storing the link as it could then be displayed on a webpage that was reporting plant portal results. Very happy for @BirenRathod to run the data on the test warehouse again whenever you are ready.
That should be no problem. Will let you know once done. We can store the link, worst comes to the worst we just get rid of the link if we find it isn't useful or doesn't work.
Just to make sure, do I have run on test warehouse or dev warehouse?
Hi @BirenRathod Am just chatting to Oli and David about this on email, when a run is required I will give exact instructions on what needs to be done so don't worry. For now don't run anything until I get back to you.
@andrewvanbreda , Ahh, that's good. Clear my confusion. looking forward to.
@sacrevert Before I commit can I confirm this format is OK for the term "PLANTATT - Hill, M.O., Preston, C.D. & Roy, D.B. (2004)" I put the authors after a hyphen and the year in brackets
Now we are going for a full citation in one field, can you use: "Hill, M.O., Preston, C.D., & Roy, D.B. (2004). PLANTATT- attributes of British and Irish plants. NERC Centre for Ecology & Hydrology: Monks Wood."
Yes, I have changed that. I will let you know once it is committed
BTW, in the link field I won't be putting the HTML, just the link address, this makes more sense having thought about it. Any code using the link can then do what it likes with it, including inserting its own HTML
Nested in #46
I thought I would create a thread for the Plant Att import as it has been committed for quite a while now and it has gone silent. As I mentioned it has been committed for quite a while now so should be good for a test run at BRC. I am just double checking I am happy with it. Let me know if you want to do a test run import and I can instruct you on running the SQL. As I mention as it has been a long time I am just taking a look at it again now to refresh my memory, I will put a note in this thread to say when I am happy or if there are any outstanding questions. I will create a separate thread for the NVC floristic data