DistanceDevelopment / distance-bugs

A place to keep bugs in Distance
http://distancesampling.org/Distance

STRATIFICATION command documentation (MCDS documentation) #160

Open dill opened 8 years ago

dill commented 8 years ago

I can't find documentation for the STRATIFICATION command in the Distance 7.0A manual. This command appears in the OPTIONS section of the MCDS input file.

erex commented 8 years ago

No luck in the D6.2R1 manual either. The chapter on the MCDS Engine reference shows this table as the list of valid commands in the Options section:

capture

erex commented 8 years ago

Here is the MCDS command file for plain old geographic stratification applied to the minke whale dataset. It makes no use of a Stratification option.

-- Start of Analyis Engine Log File -- 
 This is mcds.exe version 6.2.0     
 Options;                                                                      
 Type=Line;                                                                    
 Length /Measure='Nautical mile';                                              
 Distance=Perp /Measure='Nautical mile';                                       
 Area /Units='Square nautical mile';                                           
 Object=Single;                                                                
 SF=1;                                                                         
 Selection=Specify;                                                            
 Confidence=95;                                                                
 Print=Selection;                                                              
 End;                                                                          
 Data /Structure=Flat;                                                         
 Fields=STR_LABEL, STR_AREA, SMP_LABEL, SMP_EFFORT, DISTANCE;                  
 Infile=C:\Users\erexstad\AppData\Local\Temp\dst762A.tmp /NoEcho;              
 Data will be input from file - [...]APPDATA\LOCAL\TEMP\DST762A.TMP
 End;                                                                          
 Dataset has been stored.
 Estimate;                                                                     
 Distance /Nclass=7 /Width=1.5 /Left=0;                                        
 Density=All;                                                                  
 Density=Stratum /Design=Strata /Weight=Area;                                  
 Encounter=Stratum;                                                            
 Detection=Stratum;                                                            
 Size=Stratum;                                                                 
 Estimator /Key=HA /Adjust=CO /NAP=0;                                          
 Monotone=Strict;                                                              
 Pick=AIC;                                                                     
 GOF;                                                                          
 Cluster /Bias=GXLOG;                                                          
 VarN=Empirical;                                                               
 End;      
dill commented 8 years ago

That's all I could find in the manual too...

For example, I see the following for Post-stratified E(s)_strat f(0)_regr analysis in D70Cluster solutions:

> model_definitions[[3]]
 [1] "Engine=CDS;"
 [2] "Options;"
 [3] "Stratification=Post-stratify /LayerType=30 /FieldName=Cluster strat;"
 [4] "Sample /LayerType=20;"
 [5] "Selection=Specify;"
 [6] "Confidence=95;"
 [7] "Print=Selection;"
 [8] "End;"
 [9] "Data /Structure=Flat;"
[10] "End;"
[11] "Estimate;"
[12] "Distance;"
[13] "Density by All;"
[14] "Density by Stratum /Design=Strata /Weight=None;"
[15] "Encounter by Stratum;"
[16] "Detection by Stratum;"
[17] "Size by Stratum;"
[18] "Estimator /Key=HA /Adjust=CO /NAP=0;"
[19] "Monotone=Strict;"
[20] "Pick=AIC;"
[21] "GOF;"
[22] "Cluster /Bias=GXLOG;"
[23] "VarN=Empirical;"
[24] "End;"

So I'm not sure how this interacts with the rest of DISTANCE or what the possible options for this command are...

LHMarshall commented 8 years ago

I will look into these commands. I can easily generate all the options from a tester project.

LHMarshall commented 8 years ago

Hi @dill, here are a few screenshots of the MCDS Distance interface which might make things a little clearer. This is a tester project, which means that the model definition stored in the Distance database is displayed on the right.

screen shot 2015-12-18 at 13 47 27

screen shot 2015-12-18 at 13 40 43

screen shot 2015-12-18 at 13 39 15

screen shot 2015-12-18 at 13 37 08

LHMarshall commented 8 years ago

@dill I think we were chatting about how the estimates were combined the other day too... there are options on this as well:

screen shot 2015-12-18 at 13 49 32 screen shot 2015-12-18 at 13 51 17 screen shot 2015-12-18 at 13 52 55

LHMarshall commented 8 years ago

Digging around in the NEngineInterface (the part of Distance that takes the model definition and turns it into the code that actually does the analysis) I found the following:

'\ Commands not to write out
Case "Engine", "Stratification", "Sample", _
     "DataSelection", "CovariateData"
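To make the effect of that Case statement concrete, here is a minimal Python sketch (not the Distance source) of what dropping those commands from a model definition might look like. The command names come from the snippet above and the sample lines from the model definition quoted earlier in this thread; the helper names `command_name` and `lines_for_engine`, and the exact-match comparison, are assumptions made for illustration.

```python
# Command names copied from the Case statement quoted above.
SETUP_ONLY_COMMANDS = {
    "Engine", "Stratification", "Sample", "DataSelection", "CovariateData",
}


def command_name(line):
    """Return the leading keyword of a model-definition line (hypothetical helper)."""
    body = line.strip().rstrip(";").strip()
    # The keyword ends at the first '=', space, or '/' switch marker.
    for i, ch in enumerate(body):
        if ch in "= /":
            return body[:i]
    return body


def lines_for_engine(model_definition):
    """Keep only lines whose command is not in the 'not to write out' set."""
    return [ln for ln in model_definition
            if command_name(ln) not in SETUP_ONLY_COMMANDS]


# Options section of the post-stratified model definition shown earlier.
model_definition = [
    "Engine=CDS;",
    "Options;",
    "Stratification=Post-stratify /LayerType=30 /FieldName=Cluster strat;",
    "Sample /LayerType=20;",
    "Selection=Specify;",
    "Confidence=95;",
    "Print=Selection;",
    "End;",
]

kept = lines_for_engine(model_definition)
print("\n".join(kept))
```

Under these assumptions only `Options;`, `Selection=Specify;`, `Confidence=95;`, `Print=Selection;` and `End;` survive, which is consistent with the geographic stratification log above containing no Stratification line.
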

These commands all appear to be used in setting up the data rather than in mcds.exe. This is where I find a mention of post-stratification:

Private Function MakeDataQuery( _
    udtCovariateList As D5TypeCovariateList, _
    ByVal strObsLayerName As String) As Boolean
'Purpose: Makes a query of the form "qry" & mlngID & "_Data", which contains
'         the data for this analysis. Returns true if successful, false
'         otherwise.
'First thing to do is to do the data selection, yielding queries on the
' data tables that are used in all subsequent operations
'Then, for almost all cases, making the recordset involves making the SQL string from
' its clauses by calling the functions GetSQL...Clause, and then simply
' opening it up.
'The complication comes with Post-stratification, where the post-stratum field is
' in the observation layer. This is complex because there we want the dataset
' to contain at least one sample at each level of the post-stratum, even when there
' are no observations for that sample.
...
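The requirement in that last comment (every sample present at every post-stratum level, even with no observations) can be illustrated with a small Python sketch. This is not how Distance builds its query, since the rest of the comment is elided; it is only one way to get the stated property. The field names `SMP_LABEL`, `SMP_EFFORT` and `DISTANCE` come from the Fields line in the log above, while `POST_LEVEL`, the example data and the function name are hypothetical.

```python
from itertools import product


def post_stratified_rows(samples, observations, post_field):
    """Return (level, SMP_LABEL, SMP_EFFORT, DISTANCE) rows.

    Every sample appears at every post-stratum level; where a sample has no
    observation at a level, a single row with DISTANCE=None is emitted.
    """
    levels = sorted({obs[post_field] for obs in observations})
    rows = []
    for level, smp in product(levels, samples):
        matched = [obs for obs in observations
                   if obs["SMP_LABEL"] == smp["SMP_LABEL"]
                   and obs[post_field] == level]
        if matched:
            for obs in matched:
                rows.append((level, smp["SMP_LABEL"], smp["SMP_EFFORT"],
                             obs["DISTANCE"]))
        else:
            # Placeholder row: the sample's effort still counts at this level.
            rows.append((level, smp["SMP_LABEL"], smp["SMP_EFFORT"], None))
    return rows


# Made-up example data; POST_LEVEL is a hypothetical observation-layer field.
samples = [
    {"SMP_LABEL": "L1", "SMP_EFFORT": 10.0},
    {"SMP_LABEL": "L2", "SMP_EFFORT": 5.0},
]
observations = [
    {"SMP_LABEL": "L1", "DISTANCE": 0.2, "POST_LEVEL": "small"},
    {"SMP_LABEL": "L1", "DISTANCE": 0.5, "POST_LEVEL": "large"},
    {"SMP_LABEL": "L2", "DISTANCE": 0.1, "POST_LEVEL": "small"},
]

rows = post_stratified_rows(samples, observations, "POST_LEVEL")
for row in rows:
    print(row)
```

In this example sample L2 has no "large" observations, yet it still appears once at that level with an empty distance, so its effort is not lost from the "large" post-stratum.
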
These options are used in the following way to construct the dataset:

'Purpose: Makes up the qry[ID]_Str stratum query used by the analysis.
' This query is an inner join of all stratum layers (inner because
' data selection may have excluded some records and you don't want these
' included).
' The only fields in the query are:
' STR_LINK - the IDs of the records in the lowest stratum or substratum layer
' STR_LABEL (in most cases, see below):
'   (1) If udtStratType = None then just "1."
'   (2) If udtStratType = Stratify then contains a combination of the
'       ID field and the label field from the udtStratLayerType layer.
'   (3) If udtStratType = Post-stratify then
'       (a) if the post stratify layer is a stratum layer this is set to the
'           appropriate field in that layer
'       (b) if the post stratify field is in another layer (e.g., sample or
'           observation), this is not saved
' STR_AREA -
'   (1) If udtStratType = None then is total stratum area
'   (2) If udtStratType = Stratify then contains the area of the corresponding
'       STR_LABEL stratum. How this is calculated depends on what's in the
'       mstrSurveyDefinition string's Area command. If the Area's /LayerType
'       is the same as the udtStratLayerType it's pretty simple. If, however
'       the /LayerType is lower, you need to sum up the areas using a
'       separate query before returning them.
'   (3) If udtStratType = Post-stratify and the udtStratLayerType is one
'       of the stratum layer types, this contains the area of each of these
'       post strata. Calculation, like that of (2) depends on the Area
'       command. Otherwise, just contains the total, like (1)
' Covariates - If mblnIdMCDS and udtCovariateList.ListCount > 0 then
'   looks for covariates in each new stratum layer -- if it doesn't find them
'   then exits with an error.
'Note: you can assume obslayername exists.
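The STR_LABEL and STR_AREA rules in that comment can be summarised in a short Python sketch. It is a simplified reading of the comment, not the Distance implementation: it assumes the Area command's /LayerType matches the stratification layer (so no summing of sub-stratum areas is needed), and the dictionary layout, the `"<id>. <label>"` label format and the function name are all assumptions, since the comment only says the label is "a combination" of the ID and label fields.

```python
def stratum_label_and_area(strat_type, strata, post_field_in_stratum_layer=False):
    """Return a list of (STR_LABEL, STR_AREA) pairs for the given strat type.

    strata: list of dicts with 'id', 'label' and 'area' keys (hypothetical
    layout standing in for the records of the stratum layer).
    """
    total_area = sum(s["area"] for s in strata)

    if strat_type == "None":
        # Rule (1): a single pooled stratum labelled "1." with the total area.
        return [("1.", total_area)]

    if strat_type == "Stratify":
        # Rule (2): label combines ID and label; area is the stratum's own.
        # The "<id>. <label>" format is an assumption.
        return [(f"{s['id']}. {s['label']}", s["area"]) for s in strata]

    if strat_type == "Post-stratify":
        if post_field_in_stratum_layer:
            # Rule (3a): label from the stratum-layer field, area per post-stratum.
            return [(s["label"], s["area"]) for s in strata]
        # Rule (3b): label not saved here; area falls back to the total.
        return [(None, total_area)]

    raise ValueError(f"unknown stratification type: {strat_type!r}")


# Made-up strata for illustration.
strata = [
    {"id": 1, "label": "North", "area": 100.0},
    {"id": 2, "label": "South", "area": 50.0},
]

pooled = stratum_label_and_area("None", strata)
stratified = stratum_label_and_area("Stratify", strata)
post_in_layer = stratum_label_and_area("Post-stratify", strata, True)
post_elsewhere = stratum_label_and_area("Post-stratify", strata, False)
print(pooled, stratified, post_in_layer, post_elsewhere, sep="\n")
```

The case that matters for the model definition quoted earlier in the thread is the last one: if the post-stratify field lives outside the stratum layers, the stratum query carries only the total area and no label.
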