Closed snubian closed 7 years ago
Some field names changed around April; I suspect this was one of them. Looking at `ala_fields("occurrence_indexed")` I see a field called `outlier_layer` (whose description is "Outlier for layer"). Is this it? The corresponding column name in the occurrence download data is `outlierLayer`.
We do prettify the column names in ALA4R, which was originally to try and make them consistent across functions. With the new field name changes we'll need to look at that and make sure we're still doing sensible things. Backwards compatibility might become an issue here, but we'll see how we go I guess.
Thanks @raymondben - I saw the `outlier_layer` column, but the data in it is stuff like `el845` or whatever, so I think it means that the data for those environmental layers are considered outliers(?). Previously the field was a TRUE/FALSE indicating a suspected spatial outlier.
I suspect it's simply gone, will dig around a bit more. Thanks again for your quick response, and once again for your efforts with this fantastic package.
The contents of the `el845` etc. environmental fields are populated from gridded environmental data, so if the position of the observation is a spatial outlier then those values will be outliers with respect to the norm for that species. So I'm guessing that previously the "outlier" status was calculated on the basis of environmental layers and now it is just giving more info about which layers indicate its outlieriness ... but I'm only guessing. @nickdos @adam-collins - can you enlighten us?
When we introduced our "offline" downloads, we changed the way the download file was generated but tried to keep most fields the same. The old download format is still available but is limited to 100,000 records, whereas the newer offline download has no limit (for now). E.g. http://biocache.ala.org.au/ws/occurrences/download?q=genus:Macropus (i.e. without the `/offline` part).
The difference is that the old download is produced directly from the SOLR index, whereas the new download is produced from the Cassandra database directly (the SOLR index is a subset of the data in Cassandra). The `outlier_layer` field is only in the SOLR index I think, so we either need to calculate that value on the fly for the offline download, etc.
For now, the SOLR download is still available, so I think ALA4R could provide an option to use either the older SOLR download (100,000 max) or the newer offline download. The SOLR download will be quicker and may suit some users better, but it makes the API more complicated, as we'd have to explain the existence of 2 similar but slightly different APIs.
Another workaround would be to use the web interface to build 2 queries, one with records where `detectedOutlier` is true and another where it is false. Then trigger 2 downloads and merge them after manually setting the values for `detectedOutlier` ... if that makes sense.
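A minimal sketch of that merge step, assuming each download has already been parsed into a list of row dictionaries (for example with `csv.DictReader`). The function name `merge_with_outlier_flag` and the sample record IDs are hypothetical; only the `detectedOutlier` column name comes from the discussion above.

```python
def merge_with_outlier_flag(outlier_rows, non_outlier_rows, flag_column="detectedOutlier"):
    """Combine two downloads, recording which query each row came from."""
    merged = []
    for rows, flag in ((outlier_rows, True), (non_outlier_rows, False)):
        for row in rows:
            tagged = dict(row)          # copy so the input rows are not modified
            tagged[flag_column] = flag  # set the value manually, as described
            merged.append(tagged)
    return merged


# Hypothetical rows standing in for the two downloads.
outliers = [{"id": "rec-1", "genus": "Macropus"}]
non_outliers = [
    {"id": "rec-2", "genus": "Macropus"},
    {"id": "rec-3", "genus": "Macropus"},
]

merged = merge_with_outlier_flag(outliers, non_outliers)
```

The flag is derived purely from which query a row came from, so it is only as reliable as the two web-interface filters used to build the downloads.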
ALA4R does both the indexed (SOLR) and offline methods. The outlierforLayers column seems to be returned in both.
I think our main question here is whether that field is equivalent to the old `Suspected.outlier` or `detectedOutlier` field.
Thanks once again @nickdos! And @raymondben I may have the answer to that question. Just looking at a recent download, the `outlierForLayer` field has data like `el882`, `el865` etc., which refer to bioclimatic variables such as temperature, precipitation, etc. This is also the same as the `Outlier for layer` and `Outlier layer count` filters on the web interface.
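If the field holds a delimited list of layer IDs, a per-record layer count and a crude flag could be derived along these lines. This is only a sketch: the comma separator is an assumption (I haven't confirmed the actual format), the helper name `outlier_layers` is hypothetical, and the resulting flag is not necessarily equivalent to the old `Suspected.outlier` field.

```python
def outlier_layers(value, sep=","):
    """Split an outlierForLayer value into a list of layer IDs.

    The separator is an assumption; adjust it to match the real download.
    """
    if not value:
        return []
    return [part.strip() for part in value.split(sep) if part.strip()]


layers = outlier_layers("el882, el865")
layer_count = len(layers)      # number of layers flagging this record
is_flagged = layer_count > 0   # crude flag: outlier on at least one layer
```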
A download from some months ago includes both the `Outlier.for.layer` field (with data as above) and the `Suspected.outlier` field, which to my understanding is a TRUE/FALSE indicator of spatial outliers. So they seem to be different fields, yes. It would be great to know a) if the spatial outlier field still exists, and b) if it can be gotten at.
EDIT: I've tried Nick's suggestion for the old download format; it doesn't have `suspectedOutlier` but does have `Outlier.for.layer`, though it's a 0/1 field.
As usual, many thanks to everyone, and any assistance is greatly appreciated :)
P.S. I should add, I'm not 100% sure that the old `suspectedOutlier` field was actually what I think it was!
I've noticed that a field I've used in the past does not seem to be included in occurrence downloads anymore. I believe the field was called `Suspected.outlier` but might've been `detectedOutlier` in `ALA4R::occurrences()`. Downloads I did using `http://biocache.ala.org.au/ws/occurrences/index/download` back in July 2016 included this field, and I feel like I've seen it recently, but it's no longer included when I run the same download, or when using `occurrences()`.
Do you guys keep track of these sorts of things? I figured you might need to know as you seem to prettify the naming of the fields.
Thanks!