Original issue:
https://code.google.com/p/alageospatialportal/issues/detail?id=304
Reported by leebel...@gmail.com (Project Member), Nov 22, 2010
Code and documentation received from Glenn Manion. Slot in integration with the
Spatial Portal.
Mar 17, 2011 Project Member #1 leebel...@gmail.com
Upping to HIGH, as we need something basic running for a meeting with Simon,
Glenn and Kristen, probably during the week of April 4.
Labels: -Priority-Medium Priority-Critical
May 12, 2011 Project Member #2 leebel...@gmail.com
Lowering priority until the UI is done.
Cc: leebel...@gmail.com Labels: -Priority-Critical Priority-High
May 24, 2011 Project Member #3 leebel...@gmail.com
Started
Status: Started
Jan 2, 2012 Project Member #4 leebel...@gmail.com
Raising to Critical due to the NPEI project, which must be completed by June 30.
Labels: -Priority-High Priority-Critical
Feb 2, 2012 Project Member #5 leebel...@gmail.com
Ran all of Acacia on the Tasmanian extent with equal site weighting and the first 5
test layers and, as far as I could tell, got no GDM output other than the inputs:
http://spatial-dev.ala.org.au/output/gdm/1328233963170/
File                Last modified       Size
domain.grd          03-Feb-2012 12:52   311
domain.gri          03-Feb-2012 12:52   0
gdm_params.txt      03-Feb-2012 12:52   673
species_points.csv  03-Feb-2012 12:52   33K
Feb 6, 2012 #6 ajay.ranipeta
Some code had been reverted. The process should now generate the species file with
the columns Longitude, Latitude, Species_name.
Currently working on generating graphs and including them in the metadata file.
Feb 8, 2012 Project Member #7 leebel...@gmail.com
1. Need to swap over to the import/paste assemblage code (which it seems to tap
anyway).
2. Keeps cycling back to step 2 after mapping the assemblage; it never gets to ask
about environmental layers.
3. Restrict mapped species to an area: this seems redundant, as the area is defined
in step 1? (So the answer is always "yes".)
Mar 1, 2012 #8 ajay.ranipeta
The latest code from Glenn now generates a segmentation fault.
The metadata and charts, however, should (hopefully) be done now.
Mar 13, 2012 #9 ajay.ranipeta
In test.
Status: InTest
Mar 14, 2012 Project Member #10 leebel...@gmail.com
Looking good, Ajay! A few comments:
1. After designating layers, should "working..." be replaced by a progress bar?
2. GDM options
a) Generate quantile from: Data [this is a statement, not an option?]
b) Use geographic distance as additional predictor: yes/no
c) Use all site pairs [this is also a statement, as there is no option?]
3a. Naming prediction layers should default to something like "My GDM
prediction".
3b. The list of environmental layers is blank even though I selected the best 5. I
can't get past Step 4.
Status: Started
Mar 14, 2012 #11 ajay.ranipeta
1. I think I'll leave it as "Processing...". There is really only one step, which
generates the domain grid, works out the site pairs and then gives you the
options. I could arbitrarily set a progress bar to, say, a minute, but it should
take less than a minute to process.
2.
a) It was meant to be an option, but a default is now set as recommended by
Simon/Kristen. I've left it there as information for end users.
b) This is fine.
c) No, it's not a statement. You should be able to uncheck the box, which gives you
more parameters to play with.
3a/b. Yes, step 1 didn't really work, so the whole process may not have
finished. The default layer name should come up as "My GDM".
Testing and fixing the GDM issue now.
Mar 14, 2012 #12 ajay.ranipeta
Somehow whoever updated the GDM code on dev didn't grab the latest from SVN.
I've done that now, so it should all be fine.
Test again, Lee.
Status: InTest
Mar 14, 2012 Project Member #13 leebel...@gmail.com
Thanks Ajay, a lot better.
1. The HTML looks good, but I suggest the file be called gdm.html.
2. ala.properties should be gdmparameters.txt. Not sure how you want to
differentiate this from gdm_params.txt.
3. We need the transformed grids as layers for further analysis (hover, sampling,
scatterplot, classification, prediction). I guess these will just be scaled 0-1
or 0-100. At the moment, nothing from the run is mapped.
4. We need a Readme.txt file to describe all the files in the zip, as ever.
5. It would be good to substitute the scientific name for the species code in the
output (e.g., the species frequency table).
6. A name for the 'prediction layer' is requested but not used?
Status: Started
Mar 14, 2012 #14 ajay.ranipeta
1. OK.
2. No, I generate ala.properties to keep track of values that help
generate the HTML page.
3. Umm...
4. Waiting for Kristen/Simon/Glenn to give a final confirmation on the current
implementation, including any changes and any final files to generate.
5. I'll need to generate a CSV file that maps each species code to its
scientific name/LSID.
6. Huh? The file prompted for download has the layer name set.
Mar 27, 2012 #15 ajay.ranipeta
Updated dev to include the output transformed grids as layers on the SP. This
makes them available for:
- hover tool
- sampling
- other analysis tools
Mar 27, 2012 #16 ajay.ranipeta
(No comment was entered for this change.)
Status: InTest
Apr 1, 2012 Project Member #17 leebel...@gmail.com
No output layers mapped at the moment.
Status: Started
Apr 3, 2012 #18 ajay.ranipeta
(No comment was entered for this change.)
Status: InTest
Apr 3, 2012 Project Member #19 leebel...@gmail.com
If Acacia + Eucalyptus are used for the Tasmanian extent, GDM step 1 reports 0
records per cell. It used to report a more realistic range.
Status: Started
Apr 12, 2012 Project Member #20 leebel...@gmail.com
Thanks Ajay. A lot better. However, the output transformed layers are all called
"Transformed null".
Apr 12, 2012 #21 ajay.ranipeta
(No comment was entered for this change.)
Status: InTest
Apr 12, 2012 Project Member #22 leebel...@gmail.com
Looks great! I'll get Kristen to have a play now.
Apr 19, 2012 Project Member #23 leebel...@gmail.com
Kristen (April 19): Summary – issues with running more than the default number of
site pairs; classification of transformed grids didn’t work; additional
outputs needed in the zip file; the HTML file needs a bit more work.
1. “records per cell” – this may be explained when we have the help files
(when we write them) but intuitively “records” to me means the number of
species by locations within a grid cell (I’m assuming this is a 1km grid
cell). I think it would be more transparent to label this “taxa per cell”
(Do we mean species? Are these matched species? Is there an option to choose
matched species? It is important to know what the taxonomic unit is for GDM.)
I’m assuming the table represents the number of taxa per cell, rather than
the number of records per cell. We had this conversation before and checked
with Glenn. Even if Glenn uses the label “records per cell” and assuming
the data is actually “taxa per cell” we should present the label as “taxa
per cell”.
2. “Select a threshold to help generate the site pairs” should be
changed to something like: “Select the minimum number of taxa in a single
grid-cell representing an assemblage to include”. The threshold is
designed to improve the quality of the data toward a “presence-absence”
sample by removing grid cells (sites) from consideration. It is not so much to help
with generating the site-pairs, although it does reduce the number of sites considered
in generating site-pairs. I used a threshold of ‘8’.
3. The bar for using all site pairs should also show the % of site pairs (if easy
to calculate on the fly).
4. The button for “use all site pairs” should say “choose the number
of site pairs to use” or “use default number of site pairs”. I entered a
number, but it is not clear what would happen if I switched the button on or
left it off. Are the button and bar interchangeable (one or the other), or does
one supersede the other? The default appears to be 1% (what was the rationale
for the default?). The default should probably be set to around 1 million site
pairs or the number of available site pairs, whichever is lower. For my
random example the total number of site pairs is 4096000. The default is 40960. This
wouldn’t be a big enough sample to model, so I chose 1045070. My analysis is
for Corymbia with the 5 standard predictors. I chose weighting by number of
species.
5. Processing: after a short time a message comes back saying the server is
temporarily out of service, and the analysis stops, although one has to
manually close the window. It seems to say something about a bad gateway… (Is
this a problem at my end, related to my network and internet settings or
software – I’m using Google Chrome – or at the ALA server end?)
6. I then had to start again. Perhaps the preliminary analysis could be kept as
a set of assemblage points, so that I could start again where I left off
and try a different number of site pairs? This time I try 539391 site-pairs,
and note that the bar is not present until I switch off the “default”…
again the server out of service message came up… and the result below appeared.
7. GDM processing can take a while, it might be better if it went into the
background on the server and returned the user to the ALA interface and then
produced an email or pop-up when the processing is complete?
8. The user may undertake several iterations with GDM in order to find the best
set of variables to include in the model. Typically, I iterate using the
model-fitting function, and when I’m satisfied I produce the
transformed grids.
9. I try again, this time using the default number of site pairs… this time
it worked… and I have a look through the outputs; the transformed grids are
also available for further analysis (or modelling).
10. I now create a classification based on my transformed predictors using 20
groups…but I received a failed message – in working through the steps of
the classification, on the last page, the layer set is not listed? Is this an
indicator of why the classification is not working with the transformed grids?
I tried twice and the classification step failed each time…
HTTP Status 404 - /webportal//error/HTTP_NOT_FOUND.html.var
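Items 1 to 4 above amount to a short chain of calculations: count distinct taxa (not records) per grid cell, drop cells below a minimum-taxa threshold, default the number of site-pairs to the smaller of about 1 million and the number available, and show the chosen number as a percentage. The following Python sketch illustrates that chain under stated assumptions: the 0.01 resolution is in the same units as the coordinates, input points are (longitude, latitude, taxon) tuples, and all function names are hypothetical. It is not the portal's or Glenn's GDM code.

```python
from collections import defaultdict
from math import floor

# Assumption: coordinates and resolution share units (0.01, as in the HTML report).
def cell_of(lon, lat, res=0.01):
    """Index of the grid cell containing (lon, lat) at resolution `res`."""
    return (floor(lon / res), floor(lat / res))

def per_cell_counts(points, res=0.01):
    """Map each cell to (number of records, number of distinct taxa).

    `points` is an iterable of (lon, lat, taxon) tuples (hypothetical format).
    """
    records = defaultdict(int)
    taxa = defaultdict(set)
    for lon, lat, taxon in points:
        cell = cell_of(lon, lat, res)
        records[cell] += 1
        taxa[cell].add(taxon)
    return {cell: (records[cell], len(taxa[cell])) for cell in records}

def sites_meeting_threshold(counts, min_taxa):
    """Keep cells (sites) with at least `min_taxa` distinct taxa (item 2)."""
    return [cell for cell, (_, n_taxa) in counts.items() if n_taxa >= min_taxa]

def default_site_pairs(total_pairs, cap=1_000_000):
    """Suggested default (item 4): about 1 million, or all pairs if fewer."""
    return min(cap, total_pairs)

def percent_of_pairs(chosen, total_pairs):
    """Chosen site-pairs as a percentage of all available pairs (item 3)."""
    return 100.0 * chosen / total_pairs
```

With the figures from item 4, `percent_of_pairs(40960, 4096000)` gives 1.0, matching the observed 1% default, whereas `default_site_pairs(4096000)` gives 1000000 under the suggested rule. Keeping records and taxa as separate counts makes the "records per cell" versus "taxa per cell" distinction in item 1 explicit.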
Apr 19, 2012 #24 ajay.ranipeta
Kristen (April 19): Metadata info
Comments on HTML report:
Under “your options” at the top of the HTML file, we need to include additional
information about the parameters used in the analysis. Please include:
- Assemblage: (e.g. Corymbia) (plus include the records for the
assemblage in the download; add the assemblage to the map in the spatial
portal) I was able to create my own download but have no idea what the
taxonomic unit in the GDM analysis is. Nor do I know what the list of species
aggregated by grid cell is?
- Number of unique taxa: # (include a list of taxa that can be related
to the “code” in the species_points.csv file – we’ve talked through
this before, the need to be able to identify the taxa used in the analysis –
exactly what were these.)
- Taxa resolution: subtaxa (i.e., is the cut-point of the taxa set at
matched species or are all levels of taxa included?)
- Grid-cell resolution: 0.01
- Minimum number of taxa per grid cell: # (e.g. 8) (This is described
as the cut point in GDM_parameters.txt)
- Number of grid cells with taxa included in the model: #
- Total number of site-pairs: # (e.g., 4096000)
- Number of site-pairs used in the analysis: # (e.g. 40960)
- Number of predictors used in the analysis: # (e.g. 5)
- Number of I-splines per predictor: 3 (this is a default that is hard
wired into this version of GDM)
Create a new section: “Model Summary” (this can be drawn from the file
gdm_parameters.txt)
- Intercept=0.612189
- Null Deviance=70125.728738
- GDM Deviance=54905.896804
- Deviance Explained=21.703635
- All Coefficients Summed=12.193876
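The Model Summary figures are consistent with the usual definition of deviance explained as a percentage: 100 × (null deviance − GDM deviance) / null deviance. A minimal Python check, assuming that definition (the function name is illustrative, not part of GDM):

```python
def deviance_explained(null_deviance, model_deviance):
    """Percentage of the null deviance accounted for by the fitted model.

    Assumes the standard definition; the name is illustrative, not GDM's API.
    """
    return 100.0 * (null_deviance - model_deviance) / null_deviance

# Values copied from the Model Summary above.
null_dev = 70125.728738
gdm_dev = 54905.896804
print(f"{deviance_explained(null_dev, gdm_dev):.6f}")
```

This evaluates to about 21.7036, in line with the reported Deviance Explained=21.703635, so the HTML report could show the formula next to the value.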
Charts and text:
- The cut-point.csv table could be presented first along with
explanatory text so that users understand how to apply this parameter.
- Observed versus predicted compositional dissimilarity (raw data
plot): x-axis should not be labeled after a value of 1.0 (predicted values do
not extend this far) and red line should end at a value of 1.0
- “site pairs” or “site-pairs” inconsistent – I would prefer
we used hyphenated “site-pairs”
- Observed compositional dissimilarity vs predicted ecological
distance (link function applied to the raw data plot): the caption reads “The line
represents the perfect 1:1 fit.” Note that it is a red curve; change the caption to
“The red curve represents the perfect 1:1 fit.”
- Instead of “The scatter of points signifies noise in the
relationship between the response and predictor variables.” Say “The
scatter of points signifies residual variation and noise in the relationship
between the response and predictor variables.” Not all of the scatter will be
noise, some may be systematically correlated with variables not included in the
model.
- The plots of each predictor variable are not correctly linked into
the HTML file… and links to the full plots as well as thumbnails would be
handy, the same as for Maxent; from memory, I think you can open the full plots
by clicking on the thumbnails.
- Data list – need to include a list at the end of the HTML file
describing each of the datasets provided in the zip file and what they mean. If
Ajay could start with a list, I could draft the commentary and Glenn could
check this is right (see attached spreadsheet for starters). I guess this is
the objective of the “readme.html” which is presently blank.
- Need lookup table relating “species code” to the actual taxon
name used in the GDM analysis that would enable these to be matched to an ALA
identifier in the data downloaded for an assemblage
- Need a lookup table relating “EnvGrid#” to the layer name. This
can be inferred from the gdm_params.txt
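One way to guarantee that the species-code lookup table matches the codes used in the analysis is to generate both in the same pass. This Python sketch assumes species_points.csv starts with the header Longitude, Latitude, Species_name (as described in comment #6) and that codes are assigned sequentially from 1; the function names and the output columns species_code, scientific_name are hypothetical, not the portal's actual file format.

```python
import csv

def build_species_index(points_file):
    """Assign a sequential integer code to each distinct species name.

    Returns (index, coded_rows): index maps name -> code, and coded_rows
    holds (longitude, latitude, code) tuples ready for the GDM input.
    """
    # skipinitialspace tolerates headers/values written as "a, b, c".
    reader = csv.DictReader(points_file, skipinitialspace=True)
    index = {}
    coded_rows = []
    for row in reader:
        name = row["Species_name"].strip()
        code = index.setdefault(name, len(index) + 1)
        coded_rows.append((float(row["Longitude"]), float(row["Latitude"]), code))
    return index, coded_rows

def write_lookup(index, out_file):
    """Write the code -> scientific name lookup table as CSV, ordered by code."""
    writer = csv.writer(out_file)
    writer.writerow(["species_code", "scientific_name"])
    for name, code in sorted(index.items(), key=lambda item: item[1]):
        writer.writerow([code, name])
```

Because the lookup is written from the same dictionary that produced the codes, the table cannot drift out of step with species_points.csv; an LSID column could be added the same way if the names are matched to ALA identifiers first.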
May 27, 2012 Project Member #25 leebel...@gmail.com
Kristen: I think it would be worth saving step 4 (GDM Options, which produces the
summary of species available) under an Analysis ID. If the analysis fails (which
it almost always does for me because I pick too large an extent or too many
site pairs), one has to start from scratch. If the entries could be saved at Step
4, one could make a small change to the number of site pairs and try again.
Better still, run the GDM as a background job… so that most analyses produce a
result.
Lee: Ajay is currently working on background processing (#722).
Status: Started
Aug 21, 2012 Project Member #26 moyesyside
Lee - to my knowledge this is all done and in production. Is this correct?
Owner: leebel...@gmail.com Cc: -adam_col...@tpg.com.au -leebel...@gmail.com
Aug 21, 2012 Project Member #27 leebel...@gmail.com
No, GDM isn't complete. Kristen and I still have to discuss what may be
required to at least tidy GDM up. Kristen has her head down on NPEI and
probably a heap of other work so I'm leaving it till she pops her head up.
Needless to say, I'm busier than I'd like to be at this stage as well.
Cc: Kristen....@csiro.au moyesyside Labels: -Priority-Critical Priority-High
Jul 8, 2013 Project Member #28 leebel...@gmail.com
Lowering (Medium) until we can get some of Kristen's time. Issues:
1. GDM help review/edit
2. GDM case study
Updates to GDM will need to wait for higher priority issues to be addressed.
Owner: Kristen....@csiro.au Cc: -Kristen....@csiro.au leebel...@gmail.com Labels: -Type-Enhancement -Priority-High Type-Task Priority-Medium
Original issue reported on code.google.com by moyesyside on 8 Aug 2013 at 12:07