BiologicalRecordsCentre / BSBI-Card-and-PlantPortal-DEPRECATED-

A portal to promote plant recording and analysis of plant data
0 stars 0 forks source link

BRC import file HLS_test_data_HL10_v2.csv - test results #68

Closed andrewvanbreda closed 7 years ago

andrewvanbreda commented 7 years ago

Hi @sacrevert ,

I tested you import file, I think the results are a mixed bag really, could be better, could be worse.

Firstly I didn't have those species on my machine, unfortunately on my development warehouse the importer isn't working at the minute, so I changed your species in the file into ones I knew I had, this shouldn't affect the validity of the test though

Firstly the good. When I initially ran the importer it did correctly warn me with issues with the file. I didn't analyse every single aspect about what it should of warned me about, but it did seem to correctly complain about:

Minor problems first

  1. The first error screen that showed the warning above did have some layout problems. This isn't the same warning screen with the nice tables on. I don't remember it being like this before though. It never looked at good as the tables screen, but it wasn't this ugly, it might of been related to the sheer amount of information it was trying to show me.

  2. For the stratum column I realised I would have to stop the system hiding it even after I created the attribute, this is because it automatically hides attributes that we don't want. I tried to quickly do this, however as it is an occurrence attribute it wasn't a super quick job...am sure it wouldn't take too long, but not the few minutes I tried to do it in. I don't really foresee any major issue with stratum although I couldn't test it

  3. There is no Cover attribute at the minute, but once Strata works, this should be easy to add too.

More serious issues

  1. You used a "NT94675,14360" spatial reference. I don't think this is valid (I assume it isn't valid to include a comma). The importer actually thought this was two columns which it shouldn't of done as it was in quotes. I am surprised it did this as it is based on the indicia importer which I assume works for that scenery, needs fixing anyway.

  2. I changed the import file in order to overcome some of these issues, to get to the point of having done an actual import. The first import I did said it couldn't find the location 9437. That probably means I mapped the wrong column, but it shouldn't of done that anyway, as should create the plot

  3. On a second run of the import it complained about a sample/occurrence error. I have never seen that error during the development of this project before. I am confident that has only occurred because of a the tangle of situations we just had as have definitely seen it import some data as recently as yesterday.

  4. One final note, more of a question @sacrevert You mentioned you were concerned about grasses and mosses earlier. At the moment it treats these like an occurrence. So you will need to give more detail on your concern so I can provide any information you need.

So my next plan of action is run the importer with another simpler importer file and make sure I can overcome all of the above issues, as I say some of the issues I have seen working in the past (numbers 5 and 6, and 1 to some extent) Once I have it working on a very simple file with cover and stratum also, then I can try and rerun the file

Nested in #46

sacrevert commented 7 years ago

I will respond to your points in individual comments below.

sacrevert commented 7 years ago

Some rows wth same sample id are split up in the file so they aren't on consecutive rows

I did that on purpose, thinking what the average user might do. Now I am thinking whether the checks need to be more intelligent. For example, here, the occurrences with the same sample id that are not on consective rows are actually for different plots, so, should the importer realise that they cannot actually be the same sample?

Does the current logic mean that a user can use the same sample IDs for different plots, but only if they are uploaded in separate tranches?

It also found rows with the same sample id but different spatial references

I suppose this is the same issue as above. The warning is correct, but if the importer checked the plot it would know that they were not the same sample (and that that the different spatial refs were allowed).

If you can answer the question above, then I suppose I can explain this in the guidance (particularly if fixing this logic in the importer is a biggish job).

AVB Response: Hi Oli, What you are describing is what the standard Indicia importer used to do. However, (I thought) the point of the Unique Spatial Reference ID was to have an exact way of determining the sample the different occurrences would go into. So the above behaviour is very deliberate. The user would know exactly that rows with the same sample ID would go into that sample and there wouldn't be any complications of sometimes it would do this, sometimes it would do that, based on a series of rules. It would simple be based on the Unique Sample ID (and obviously the rows would have to be have consistent dates etc). They need to be on consecutive rows just to make the programming easier. The idea was that the user would know exactly that the occurrences would going into the sample with that ID, unless there was an inconsistency flagged. Once the importer is done, the Unique Sample ID wouldn't really be used. Let me know if I have that totally wrong

OLP response: Just to avoid this thread getting too complicated, I'm going to move this to a new issue to separate out the discussion, and to make it easier to find in the future (even though it may not require any programming). New issue #71

sacrevert commented 7 years ago

Minor problems first

The first error screen that showed the warning above did have some layout problems. This isn't the same warning screen with the nice tables on. I don't remember it being like this before though. It never looked at good as the tables screen, but it wasn't this ugly, it might of been related to the sheer amount of information it was trying to show me.

Leave for the moment?

AVB Response: Yes I will leave unless it is a 5 minute job. I thought it looked better before, so the fix might be very very simple. If not will leave

AVB Note: Also make sure screens remain well presented if there is a lot of data or warnings to show. Low priority. On hold for now awaiting possible further funding

AVB Note: Now marked as On Hold in issue #59, so can consider this complete for this current thread

sacrevert commented 7 years ago

For the stratum column I realised I would have to stop the system hiding it even after I created the attribute, this is because it automatically hides attributes that we don't want. I tried to quickly do this, however as it is an occurrence attribute it wasn't a super quick job...am sure it wouldn't take too long, but not the few minutes I tried to do it in. I don't really foresee any major issue with stratum although I couldn't test it

There is no Cover attribute at the minute, but once Strata works, this should be easy to add too.

Fine, these are both essential for it to be really useful. Cover is something that has always been there, but which we have probably never worried about given that it is again just a simple occurrence attribute.

AVB Response: I don't expect any real trouble from these once I have persuaded the mappings screen to show me the attributes. Should be available next build.

AVB Response: Stratum and cover both now committed. As anticipated this didn't take long (maybe only 50 minutes). The stratum wasn't working because of a typo, the cover was easy to add after that

Issues now covered at #62 and #69

sacrevert commented 7 years ago
  1. You used a "NT94675,14360" spatial reference. I don't think this is valid (I assume it isn't valid to include a comma). The importer actually thought this was two columns which it shouldn't of done as it was in quotes. I am surprised it did this as it is based on the indicia importer which I assume works for that scenery, needs fixing anyway.

No, it is not valid, but 'tis something people might throw at the importer. If this requires a general Indicia github issue raising/some other fix, then can I assume that you will mention this to John and he will fix it under his general iRecord contract?

AVB Response : Will raise separately from Plant Portal if needed, but actually having said that I think it is a Plant Portal issue, as the warnings screen it is displaying on is a Plant Portal one. I don't think it would be a hard fix though

==DR== I would not see this as a priorty to resolve. We can give guidance about how to form spatial locations in a 'sensible' way

OP: Yes, leave this. Cover in guidance documentation.

AVB Response: OK, am leaving this. I will let you decide if you wish to open an issue for it

sacrevert commented 7 years ago

I changed the import file in order to overcome some of these issues, to get to the point of having done an actual import. The first import I did said it couldn't find the location 9437. That probably means I mapped the wrong column, but it shouldn't of done that anyway, as should create the plot

OK, that should just have been a sample ID. As you say, if you mapped it to Unique plot ID, presumably it should have treated it as a plot name.

AVB Response: Yes I think I mapped in correctly it, but I would of expected it to create a plot. It didn't complain yesterday when I was importing data so it much of been creating plots (else it would have said it didn't find them yesterday). I suspect this is something caused by this tangled web of scenarios into today's import rather than a general problem of it not working at all. Will investigate

sacrevert commented 7 years ago

One final note, more of a question @sacrevert You mentioned you were concerned about grasses and mosses earlier. At the moment it treats these like an occurrence. So you will need to give more detail on your concern so I can provide any information you need.

I suppose this just relates to what the importer does if it cannot map a taxon. Nothing more than that really. For something like 'Grass', the user would need to change it to something recognisable by the taxon dictionary (e.g. Poaceae). I think I will just provide advice on this in the guidance document, as it is probably not sensible to provide lists of rules mapping every possible vague term thrown at the importer into a valid taxon category. Ultimately that is the user's responsibility.

AVB Response: Ah OK, yes I see what you mean, unless "Grass" was a specific common name it wouldn't find it. But I suppose at this point, shouldn't they be using at least common names? I have never tried importing something that is further up than the species level either. Perhaps if I raise this separately as any issues wouldn't get fixed for this build anyway.

==DR response== If there are commonly used names in plot data we can add them into the dictionary. presumably an argument for having - 'Grasses', 'Lichens', 'Bare ground'

OP: No immediate action from AvB required. OP to provide terms for dictionary we he has looked through datasets thoroughly and decided which are necessary. New issue here: #70

sacrevert commented 7 years ago

@andrewvanbreda OK, I think I've answered all of these. Do you want to comment within the comments where necessary, linking to new issues if that is what is required.

I'm a bit concerned that this is going to start ballooning. But I suppose we can reevaluate after you looking through my responses. I will need to sit down with David soon and think about a new funding strategy for this work.

AVB Response: Yes I would agree we need to evaluate what is happening. My To Do list is a bit out of control as well at the moment as I used to just log them, but now there is Github too. I think it would be a good idea if I just spend a couple of hours trying to fix some of these things, then evaluate the system after that. I say that as I might be able to get rid of a lot of the issues quickly...and if I can't, we should evaluate the situation anyway. I think you'd feel better once things like Stratum and Cover are done and done quickly hopefully...less to think about. I think also if I can get in a situation where we can close a few of the threads that would be good also.

I don't think the importer as such as out of control, as I say yesterday I was successfully importing some test data with it. However there are requirements flying in at the moment, so we need to make sure those are organised.

andrewvanbreda commented 7 years ago

AVB note: Just checked the thread as it is a bit tangled. The only action I need to take at the moment is: Clear the Plant Portal imported data on machine and then... a) Rerun the importer to double check plots are being created correctly as we have an instance of it not doing so. b) See if I can work out what happened with the occurrence/sample error after import.

These are the only two jobs from the particular thread to be complete before a new build

sacrevert commented 7 years ago

@andrewvanbreda Fine. I'm just thinking about the 'Unique sample ID' issue that I haven't responded to above (now moved to #71 )

andrewvanbreda commented 7 years ago

AVB: At least some of the issues (if not all) have been caused by the code in Github becoming corrupted. It definitely looks like the plot creation code is actually fine if it hadn't tangled by what I think is an unfortunate sequence of events (it is complaining about a missing report) I think the sequence of events is as follows: I merge my code into Develop branch as Biren's needs that to update the warehouse. John's needs me to undo those changes so he can do a release to iRecord. Github then overwrites the code in my branch at a subsequent date when I try to keep the branch up to date with the latest changes (which is normal practice, but we wouldn't normally have undos coming across)

So.....this is both good and bad. Am sure I can retrieve the code, and it should work once done. How quickly I can untangle this I do not know yet, hopefully quickly. That explains the importer issues that I thought I had seen working fine before and suddenly stopped working.

andrewvanbreda commented 7 years ago

@sacrevert I have now got this working on my machine. It was the problem with the code corruption caused both of the above issues so I didn't need to do any new fixes. Thankfully I had an older Plant Portal branch before the one I made for Biren that became corrupted and have successfully merged the two on my machine (although I think going forward I will commit a fresh new one)

Using the version of the import file which I changed quite a lot to overcome some of the original issues (such as the comma in the grid references) I have just done what looks to be a successfully run. There are occurrence attribute values, I can see stratum imported. Not sure about the cover, doesn't seem to be there, but that might be because I might of forgotten to map it. There are sample attributes such the the country and vice county. There is a plot group, and a new plot linked to it, and also the person doing the importer has been linked to the plot group, I can see that all in the database.

So what I will do now is go back to the original file you gave me (HLS_test_data_HL10_v2.csv), simplify it as little as possible (some changes to the species names and the removal of the spatial reference commas will still be required) and we will see what happens.

It that is successfully I will start aiming to close this thread

andrewvanbreda commented 7 years ago

I have just done a run of the original HLS_test_data_HL10_v2.csv file using the latest code on my machine. I have made the following alterations to the file but these alterations are not considered a failure of the test:

During the test the system correctly reported problems with the file such as dates between samples being inconsistent, samples pointing to multiple plots. In fact I left in some of the British National Grid references with comma's by accident and these were picked up as invalid correctly so perhaps this got fixed as a result of something else (not confirmed). I was able to follow the warnings provided by the importer and make corrections and this resulted in an import taking place without error (although I haven't examined the result in the database yet)

So this is somewhat of a milestone in that I was able to apply corrections correctly and import after this with nothing unexpected happening, it warned about all the correct things.

The only real oddity is there are 3 different screens where the user can receive warnings, but I don't think we can do anything about that, some are general importer warnings, some are plant portal specific, and some warnings can only occur once we have checked with the warehouse.

As I say this test is a visual pass, but doesn't include any examination of the state of the database

sacrevert commented 7 years ago

@DavidRoy @andrewvanbreda Sounds like great news. Thanks Andy : )

andrewvanbreda commented 7 years ago

@sacrevert I am closing this particular thread now. It think everything has been fixed or raised in another thread. I think eventually we would do a run of the import file again on the dev site with no alterations (for instance I changed the species for my machine). But we can open a new thread for that if needed