BiologicalRecordsCentre / plantportal

Focused repo for the Plant Portal website

NULL entries in location_sref but not entered_sref #70

Closed: sacrevert closed this issue 1 year ago

sacrevert commented 1 year ago

I don't know why this happens, but in the TTI download there are occasional instances where location_sref is NULL but entered_sref is not. Should this really happen? What does it mean, and can it be avoided? Since it makes no sense and is inconsistent across samples, it seems to me that it should not be possible.

andrewvanbreda commented 1 year ago

Hi @sacrevert, this sounds like it could be a cache_builder issue on the Warehouse; we had something similar recently. Before getting John involved, could you let me know whether the location_id field is filled in for an example sample, or just send me an example sample ID?

sacrevert commented 1 year ago

In the TTI data, sample ID 11963707 has the issue and 11962643 does not.

If it was just a blip in something running, then that's fine. I just wanted to make sure it wasn't a systematic issue.

andrewvanbreda commented 1 year ago

Hi @sacrevert,

I was just looking at your email and noticed the same issue as here: the location (plot) field has not been filled in. This should be correctable because the spatial reference is present, but I will have to investigate why it is happening. Sample 15607983 is another one with the issue.

andrewvanbreda commented 1 year ago

@sacrevert As discussed, I will not investigate this specifically for TTI. Instead, I will make sure Standard Mode doesn't have this issue when I get to that part; if I identify the problem there, I will apply the same fix to TTI. So please leave this open for now.

andrewvanbreda commented 1 year ago

AVB: Note to myself. These fields are mandatory, so I'm not sure how this can happen. Also, because the plot (location_id) is missing, these samples don't show on the My Visits grid, so the problem can only be occurring when data are first added.

andrewvanbreda commented 1 year ago

Hi @BirenRathod,

Could you send me the results of these queries, please? Thanks.

```sql
select id, created_by_id, created_on from indicia.cache_samples_functional where location_id IS NULL AND survey_id = 582 order by created_on;

select id, created_by_id, created_on from indicia.samples where location_id IS NULL AND survey_id = 582 AND deleted=false order by created_on;

select id, created_by_id, created_on from indicia.samples where location_id IS NOT NULL AND survey_id = 582 AND deleted=false order by created_on;
```
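A related check, offered only as a minimal sketch under stated assumptions and not as the Warehouse's actual tooling: count non-deleted, non-training samples that have an entered_sref but no location_id, month by month, which is one way to see when the fault last occurred. It runs the SQL through Python's built-in sqlite3 against a hypothetical toy samples table with made-up rows. The entered_sref and training columns are assumptions about the real schema; on the PostgreSQL Warehouse the month expression would be to_char(created_on, 'YYYY-MM') and deleted/training would be compared with false.

```python
import sqlite3

# Hypothetical toy schema: the columns used by the queries above, plus
# entered_sref and training, which are ASSUMED to exist on the real samples table.
SCHEMA = """
CREATE TABLE samples (
    id INTEGER PRIMARY KEY,
    survey_id INTEGER NOT NULL,
    location_id INTEGER,
    entered_sref TEXT,
    created_by_id INTEGER,
    created_on TEXT,
    deleted INTEGER NOT NULL DEFAULT 0,
    training INTEGER NOT NULL DEFAULT 0
);
"""

# Samples with a spatial reference but no linked plot, counted per month.
# SQLite's strftime stands in for PostgreSQL's to_char(created_on, 'YYYY-MM').
ORPHAN_QUERY = """
SELECT strftime('%Y-%m', created_on) AS month, COUNT(*) AS n
FROM samples
WHERE survey_id = ?
  AND location_id IS NULL
  AND entered_sref IS NOT NULL
  AND deleted = 0
  AND training = 0
GROUP BY month
ORDER BY month;
"""


def orphan_samples_by_month(conn, survey_id):
    """Return [(YYYY-MM, count), ...] for live, non-training samples missing a plot."""
    return conn.execute(ORPHAN_QUERY, (survey_id,)).fetchall()


def build_demo_db():
    """Build an in-memory database with made-up rows (not real Plant Portal samples)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO samples (id, survey_id, location_id, entered_sref,"
        " created_by_id, created_on, deleted, training)"
        " VALUES (?, ?, ?, ?, ?, ?, ?, ?)",
        [
            (1, 582, None, "SU1234", 10, "2021-08-14", 0, 0),  # missing plot
            (2, 582, None, "SU1235", 10, "2021-09-02", 0, 0),  # missing plot
            (3, 582, 77, "SU1236", 11, "2022-03-10", 0, 0),    # has a plot
            (4, 582, None, "SU1237", 12, "2022-04-01", 0, 1),  # training, excluded
        ],
    )
    return conn


print(orphan_samples_by_month(build_demo_db(), 582))
```

Against these toy rows it prints [('2021-08', 1), ('2021-09', 1)]: the sample that has a plot and the training sample are both excluded.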

BirenRathod commented 1 year ago

@andrewvanbreda I have attached all the results: Plantportal.zip

andrewvanbreda commented 1 year ago

@BirenRathod Thanks that is great :)

andrewvanbreda commented 1 year ago

@sacrevert I have looked into this, as understanding this problem is in the interest of Standard Mode's stability.

My conclusions are:

  1. This was caused by an old problem that I think was first noticed in NPMS, and the fix has since made its way to Plant Portal. Originally there were problems relating to whether the plot was mandatory. It always is now, so this failure can no longer occur: the plot must be filled in. (In fact, we can see the problem last happened in September 2021.)

  2. A lot of the faulty data is training data. I deleted 14996939, as that was my own test data.

  3. All of the real faulty data was entered by one particular person. Those samples are: 11963707, 11963713, 11963714, 11963717, 11963730, 11984751, 15607919, 15607983, 15607990, 15607996, 15608019. I had hoped to fix these manually, but it isn't obvious from the spatial reference which plot they should belong to, at least not without a proper investigation.

So I am going to leave it there. You can either try to make the corrections yourself or just close this issue.

sacrevert commented 1 year ago

Perfect, thanks @andrewvanbreda

andrewvanbreda commented 1 year ago

@sacrevert Is this one that can be closed too?