BiologicalRecordsCentre / sparta

Species Presence/Absence R Trends Analyses
http://biologicalrecordscentre.github.io/sparta/index.html
MIT License

Failure with more than one region missing #198

Open mlogie opened 3 years ago

mlogie commented 3 years ago

When I run occDetFunc for species using region_codes, I get some failures. This appears to happen for species which have data missing from more than one region. I attach a failures.txt (really a csv) which shows the species I'm running, how many data points are in each region, and whether they failed or not.

I receive the error message: <simpleError in seq.default(mindat, maxdat, 1): 'from' must be a finite number>

And the following warnings:

Warning message in min(current_r): “no non-missing arguments to min; returning Inf”
Warning message in max(current_r): “no non-missing arguments to max; returning -Inf”
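A minimal sketch of how these two warnings can lead to that error, assuming `current_r` ends up as an empty vector for some region. The values here are illustrative stand-ins; only the names `current_r`, `mindat` and `maxdat` come from the messages above, and this is not the code from occDetFunc itself.

```r
# Hypothetical stand-in: the years available for one region, empty because
# there are no records to take them from.
current_r <- numeric(0)

# min()/max() of an empty vector warn and return Inf / -Inf
# (warnings suppressed here only to keep the output quiet).
mindat <- suppressWarnings(min(current_r))
maxdat <- suppressWarnings(max(current_r))

# seq() then rejects the non-finite start value; capture the error message.
result <- tryCatch(
  seq(mindat, maxdat, 1),
  error = function(e) conditionMessage(e)
)
print(result)
```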

@AugustT suggested this line may be the culprit: https://github.com/AugustT/sparta/blob/64f1cf1b168138d00ed388319ff9101ce037c303/R/occDetFunc.r#L452

failures.txt

drnickisaac commented 3 years ago

Thanks @mlogie: this is helpful. Could you clarify two things: 1) are the numbers in "failures" the number of records or the number of sites for each species? 2) are these counts from the data going into occDetFunc, or from the BUGS data object coming out? (The point here is that occDet removes sites with data in only one year.)

drnickisaac commented 3 years ago

I agree with you both that the error is occurring somewhere near L452, but I can't see exactly where. I think the solution is to run through each line of occDetFunc() with a real dataset, and check the behaviour of zero_sites when its length is >1.

AugustT commented 3 years ago

Yeah, that's the way to go. Mark, I assume you have an example species/dataset we can use for debugging? Can you put a date in our diaries for us to do this? Give us an hour. It would be good to show you how I would go about debugging this.

Tom




mlogie commented 3 years ago

@AugustT @drnickisaac quick update here - I've slowly been debugging this. Took a while. There are two things going on, and it's a bit different from what I originally thought:

First, data which don't meet the 'at least two visits' threshold are dropped (as mentioned above). This actually had the effect of dropping every single observation for Northern Ireland. And that was fine. The code handled that well. It said: "oh, you have no sites in Northern Ireland, so I'll drop that region and all aggregates with Northern Ireland in it". So far, so good. Though I would suggest that the warning message is misleading - it says you have no data for this region, when actually you have no data left after this 'at least two visits' cleaning step.

However, there is no step in the code (at least that I can see or test) which checks if the focal species has any records in a given region. Because of this, when it starts looping through regions, it tries to get the min year and max year for an empty dataset; min() returns Inf and max() returns -Inf, and this results in the seq() error later.
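The kind of check described above could look something like the following. This is only a sketch under stated assumptions, not the fix that went into sparta: the helper `region_year_ranges()` and its arguments (`years`, `regions`, `focal`) are hypothetical names made up for illustration.

```r
# Hypothetical helper: build the year sequence per region for the focal
# species, skipping regions where the focal species has no records.
region_year_ranges <- function(years, regions, focal) {
  out <- list()
  for (r in unique(regions)) {
    current_r <- years[regions == r & focal]
    if (length(current_r) == 0) {
      # No focal-species records in this region: skip it rather than
      # calling min()/max() on an empty vector.
      next
    }
    out[[r]] <- seq(min(current_r), max(current_r), 1)
  }
  out
}

# Toy data: the focal species is recorded in England but never in Wales.
years   <- c(2000, 2003, 2001, 2002)
regions <- c("England", "England", "Wales", "Wales")
focal   <- c(TRUE, TRUE, FALSE, FALSE)

ranges <- region_year_ranges(years, regions, focal)
print(names(ranges))
```

With this guard in place, Wales is simply left out of the result instead of triggering the `'from' must be a finite number` error.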

I've written a fix... but I'm rather surprised this hasn't come up before. Surely data sets have been modelled before for regions for which there was no data for some species?

mlogie commented 3 years ago

Btw, I have tested my edits and it fixes the problem

AugustT commented 3 years ago

Thanks Mark. When you put in your pull request, can you add a test that checks the functionality? Also, does the bug you found explain why you only saw the error for species missing data for 2 regions?

mlogie commented 3 years ago

It does. Funnily enough, I was actually wrong about the pattern. The species that failed were those which had data missing in a single region other than Northern Ireland. The function handled the lack of data in Northern Ireland because NI was dropped before the model was run: after the minimum of 2 visits step, there was no data there for the entire taxonomic group. It just so happened that the one species which had data in Northern Ireland and not in one of the other regions had data missing from both Wales and Scotland, making me think it was a 2+ regions issue.

drnickisaac commented 3 years ago

@mlogie I have just made some changes that might fix the problem. Would you run your tests again with v0.2.09?