bsed / ala

Automatically exported from code.google.com/p/ala

Occurrence records with invalid layer sampling? #659

Open GoogleCodeExporter opened 9 years ago

GoogleCodeExporter commented 9 years ago
Summaries of occurrence queries from Kristen Williams are suggesting (serious?) 
anomalies in the sampling of spatial layers of some records during processing. 
For example, if a query is constructed on 

Country = n/a AND
IBRA = n/a AND
IMCRA = n/a AND
el848 (bathymetry and elevation) = n/a

it should only return records that occur outside Australia. The following query 
however

http://biocache.ala.org.au/occurrences/search?fq=-ibra%3A*&fq=-imcra%3A*&fq=-country%3A*&fq=-el848%3A*&wkt=POLYGON((96.6%20-58.0,159.8%20-58.0,159.8%20-10.5,96.6%20-10.5,96.6%20-58.0))#tab_mapView

produces 40,954 records, most of which ARE in 'Australia' and are terrestrial. 

A few records are marine (correctly or not as I haven't checked yet) but should 
still be on the bathy/topo layer (and IMCRA). 

There are also a few on offshore islands, which should also be on the 
bathy/topo layer, if not IBRA or IMCRA - and they should also be 'Australia' 
but that may be another issue. 

The WKT (bounding box) here just includes Australia and associated islands. 
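
For anyone wanting to reproduce or vary the query, here is a minimal Python sketch that rebuilds it from its parts. The base URL, field names and bounding-box corners are copied from the query above. The helper names (`bounding_box_wkt`, `build_unsampled_query`, `decode_query`) are made up for illustration, and reading `fq=-field:*` as "field has no value" is an assumption based on how the query is described here. The encoder escapes a few more characters than the hand-written URL does (for example `*` and the parentheses), which should decode to the same parameters.

```python
from urllib.parse import urlencode, quote, parse_qsl, urlsplit

# Copied from the query quoted in this issue.
BASE_URL = "http://biocache.ala.org.au/occurrences/search"

# Fields that should all be "n/a" only for records outside Australia.
UNSAMPLED_FIELDS = ["ibra", "imcra", "country", "el848"]


def bounding_box_wkt(min_lon, min_lat, max_lon, max_lat):
    """Return a closed WKT POLYGON ring (lon lat order) for a bounding box."""
    corners = [
        (min_lon, min_lat),
        (max_lon, min_lat),
        (max_lon, max_lat),
        (min_lon, max_lat),
        (min_lon, min_lat),  # repeat the first corner to close the ring
    ]
    ring = ",".join(f"{lon} {lat}" for lon, lat in corners)
    return f"POLYGON(({ring}))"


def build_unsampled_query(fields, wkt):
    """Build a search URL with one negated filter (fq=-field:*) per field."""
    params = [("fq", f"-{field}:*") for field in fields]
    params.append(("wkt", wkt))
    # quote_via=quote encodes spaces as %20 rather than '+'.
    return f"{BASE_URL}?{urlencode(params, quote_via=quote)}"


def decode_query(url):
    """Return the decoded (name, value) pairs of a URL's query string."""
    return parse_qsl(urlsplit(url).query)


australia_wkt = bounding_box_wkt(96.6, -58.0, 159.8, -10.5)
url = build_unsampled_query(UNSAMPLED_FIELDS, australia_wkt)
print(url)
for name, value in decode_query(url):
    print(name, "=", value)
```

Dropping one field at a time from `UNSAMPLED_FIELDS` would show which layer is missing for a given subset of records.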

PS: Diagnosis of classification of the records would be made easier if #658 
were solved.

Original issue reported on code.google.com by leebel...@gmail.com on 2 May 2014 at 5:17

GoogleCodeExporter commented 9 years ago
Miles, any thoughts on this? Looks like sampling hasn't been run or there are 
some problems with certain datasets.

Original comment by moyesyside on 6 May 2014 at 6:02

GoogleCodeExporter commented 9 years ago
Looks like it's just subsets of records that haven't been sampled properly - I 
re-sampled the climatewatch records and they dropped out of the query. I'll 
sample/process/index the others and see if there's anything left.

Original comment by milo_nic...@hotmail.com on 6 May 2014 at 6:25

GoogleCodeExporter commented 9 years ago
I've sorted out the records that can be easily re-sampled. Unless this is 
urgent, I'd prefer to wait for updates to the remaining large data sets - it 
doesn't seem worth reprocessing an entire state's data set for a few hundred 
records?

Original comment by milo_nic...@hotmail.com on 6 May 2014 at 6:45

GoogleCodeExporter commented 9 years ago

Original comment by moyesyside on 6 May 2014 at 6:50

GoogleCodeExporter commented 9 years ago
Great that this is acknowledged and will be addressed. I do think it may be 
more than a few hundred records though. I would think it is well into the 
thousands?

Original comment by leebel...@gmail.com on 6 May 2014 at 7:04

GoogleCodeExporter commented 9 years ago
Full resample and re-index done. This now affects 4 records as opposed to 40k. 
Downgrading but keeping open for those 4.

Original comment by moyesyside on 30 May 2014 at 5:46