lsetiawan commented 7 years ago


The format is waterml cv as key then list of ODM2 cv that matches as values. I am only mapping the ODM2 CV terms that don't match with WaterML 1.1 terms.

Censorcode

## Censorcode [(WaterML 1.1 CV)](http://his.cuahsi.org/mastercvreg/edit_cv11.aspx?tbl=CensorCodeCV&id=773577794)

```yaml
censorcode:
  gt:
    - Greater than
  lt:
    - Less than
  nc:
    - Not censored
  nd:
    - Non-detect
  pnq:
    - Present but not quantified
```

Datatype

## Datatype [(WaterML 1.1 CV)](http://his.cuahsi.org/mastercvreg/edit_cv11.aspx?tbl=DataTypeCV&id=789577851)

```yaml
datatype:
  Best Easy Systematic Estimator:
    - Best easy systematic estimator
  Constant Over Interval:
    - Constant over interval
  StandardDeviation:
    - Standard deviation
```

Samplemedium

## Samplemedium [(WaterML 1.1 CV)](http://his.cuahsi.org/mastercvreg/edit_cv11.aspx?tbl=SampleMediumCV&id=821577965)

```yaml
samplemedium:
  Not Relevant:
    - Not applicable
  Other:
    - Rock
    - Regolith
    - Mineral
    - Ice
    - Habitat
  Surface water:
    - Liquid aqueous
    - Liquid organic
  Suspended particulate matter:
    - Particulate
  Tissue:
    - Organism
  Tree:
    - Vegetation
  Wellhead Gas:
    - Gas
```
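Because the mappings above are keyed by WaterML 1.1 term, translating an ODM2 term needs a reverse lookup. Here is a minimal Python sketch of that idea, assuming the mappings are already loaded as plain dicts. The names `invert_cvmap` and `to_waterml` are hypothetical (not WOFpy's API), and only an excerpt of the terms is shown:

```python
# Excerpt of the mappings above, keyed by WaterML 1.1 term.
CVMAP = {
    "censorcode": {
        "gt": ["Greater than"],
        "lt": ["Less than"],
        "nc": ["Not censored"],
    },
    "samplemedium": {
        "Not Relevant": ["Not applicable"],
        "Other": ["Rock", "Regolith", "Mineral", "Ice", "Habitat"],
    },
}


def invert_cvmap(cvmap):
    """Build {cv_name: {odm2_term: waterml_term}} from the WaterML-keyed map."""
    inverted = {}
    for cv_name, wml_to_odm2 in cvmap.items():
        lookup = {}
        for wml_term, odm2_terms in wml_to_odm2.items():
            for odm2_term in odm2_terms:
                lookup[odm2_term] = wml_term
        inverted[cv_name] = lookup
    return inverted


ODM2_TO_WML = invert_cvmap(CVMAP)


def to_waterml(cv_name, odm2_term):
    """Return the mapped WaterML 1.1 term, or the ODM2 term unchanged if unmapped."""
    return ODM2_TO_WML.get(cv_name, {}).get(odm2_term, odm2_term)
```

With this excerpt, `to_waterml("samplemedium", "Ice")` resolves to `"Other"`, while a term or CV that has no entry is returned unchanged.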

Unitstype

## Unitstype [(WaterML 1.1 CV)](http://his.cuahsi.org/mastercvreg/edit_cv11.aspx?tbl=Units&id=1125579048)

Generalcategory

## Generalcategory [(WaterML 1.1 CV)](http://his.cuahsi.org/mastercvreg/edit_cv11.aspx?tbl=GeneralCategoryCV&id=805577908)

Valuetype

## Valuetype [(WaterML 1.1 CV)](http://his.cuahsi.org/mastercvreg/edit_cv11.aspx?tbl=ValueTypeCV&id=1141579105)

lsetiawan commented 7 years ago

(Updated 8/29/2017 for clarity and compactness)

I am currently using the mapping in https://github.com/ODM2/WOFpy/issues/152#issuecomment-313770769.

Here's a waterml:odm2 CV mapping dictionary, with items sorted by keys in the order used in the previous comment (above). Another resource I found is here.

wmlodm2_cvmap = {
    'censorcode' : 'censorcode',
    'datatype' : 'aggregationstatistic',
    'samplemedium' : 'medium',
    'unitstype' : 'unitstype',
    'generalcategory' : 'variabletype',
    'valuetype' : 'actiontype'
}

Inventory of ODM2 CV terms that don't currently match WaterML 1.1 CV terms, listed by ODM2 CV. The set of CV's listed here is the same as in the preceding comment. The numbers following the ODM2 CV name are as follows: (Number of terms:Number of terms not matching with WaterML 1.1)

censorcode (6:6)

['Unknown', 'Present but not quantified', 'Not censored', 'Non-detect', 'Less than', 'Greater than']

aggregationstatistic (18:6)

['Standard error of the mean', 'Standard error of mean', 'Standard deviation', 'Constant over interval', 'Confidence Interval', 'Best easy systematic estimator']

medium (19:12)

['Vegetation', 'Rock', 'Regolith', 'Particulate', 'Organism', 'Not applicable', 'Mineral', 'Liquid organic', 'Liquid aqueous', 'Ice', 'Habitat', 'Gas']

actiontype (23:23)

['Cruise', 'Data retrieval', 'Derivation', 'Equipment deployment', 'Equipment maintenance', 'Equipment programming', 'Equipment retrieval', 'Estimation', 'Expedition', 'Field activity', 'Generic non-observation', 'Instrument calibration', 'Instrument deployment', 'Instrument retrieval', 'Observation', 'Simulation', 'Site visit', 'Specimen analysis', 'Specimen collection', 'Specimen fractionation', 'Specimen preparation', 'Specimen preservation', 'Submersible launch']

unitstype (179:161)

['Absorbed dose', 'Absorbed dose rate', 'Amount of Information', 'Angular acceleration', 'Angular mass', 'Angular momentum', 'Angular velocity or frequency', 'Area angle', 'Area per length', 'Area temperature', 'Area thermal expansion', 'Area time', 'Area time temperature', 'Biological activity', 'Catalytic activity', 'Concentration count per count', 'Concentration count per mass', 'Concentration count per volume', 'Concentration or density mass per volume', 'Concentration percent saturation', 'Concentration volume per volume', 'Count', 'Count per area', 'Count per length', 'Currency', 'Data rate', 'Diffusivity', 'Dose equivalent', 'Dynamic viscosity', 'Electrical capacitance', 'Electrical charge', 'Electrical charge line density', 'Electrical charge per count', 'Electrical charge per mass', 'Electrical charge volume density', 'Electrical conductance', 'Electrical conductivity', 'Electrical current', 'Electrical current density', 'Electrical current per angle', 'Electrical current per energy', 'Electrical dipole moment', 'Electrical field strength', 'Electrical flux', 'Electrical flux density', 'Electrical permittivity', 'Electrical quadrupole moment', 'Electrical resistance', 'Electrical resistivity', 'Electromotive force', 'Energy density', 'Energy flux', 'Energy per area', 'Energy per area electrical charge', 'Energy per square magnetic flux density', 'Fluid permeance', 'Fluid resistance', 'Fluidity', 'Force per length', 'Gravitational attraction', 'Heat capacity', 'Heat transfer Coefficient', 'Hyperpolarizability', 'Inductance', 'Inverse count', 'Inverse energy', 'Inverse length', 'Inverse length temperature', 'Inverse magnetic flux', 'Inverse permittivity', 'Inverse square energy', 'Inverse time temperature', 'Inverse volume', 'Jerk', 'Length energy', 'Length fraction', 'Length integrated mass concentration', 'Length mass', 'Length molar energy', 'Length per magnetic flux', 'Length temperature', 'Length temperature time', 'Level', 'Linear acceleration', 
'Linear energy transfer', 'Linear momentum', 'Linear thermal expansion', 'Linear velocity', 'Luminance', 'Luminous efficacy', 'Luminous Energy', 'Luminous flux', 'Luminous intensity', 'Magnetic dipole moment', 'Magnetic field strength', 'Magnetic flux', 'Magnetic flux density', 'Magnetic flux per length', 'Magnetic permeability', 'Magnetomotive force', 'Mass count', 'Mass count temperature', 'Mass flux', 'Mass fraction', 'Mass normalized particle loading', 'Mass per area', 'Mass per electrical charge', 'Mass per length', 'Mass per time', 'Mass temperature', 'Molar angular momentum', 'Molar conductivity', 'Molar energy', 'Molar heat capacity', 'Molar mass', 'Molar volume', 'Other', 'Particle flux', 'Particle loading', 'pH', 'Polarizability', 'Potential vorticity', 'Power area', 'Power area per solid angle', 'Power per area', 'Power per area quartic temperature', 'Power per electrical charge', 'Pressure or stress', 'Pressure or stress rate', 'Quartic electrical dipole moment per cubic energy', 'Radiance', 'Radiant Intensity', 'Radioactivity per volume', 'Satellite resolution', 'Snap', 'Solid angle', 'Specific energy', 'Specific heat capacity', 'Specific heat pressure', 'Specific heat volume', 'Specific radioactivity', 'Specific surface area', 'Specific volume', 'Stable isotope delta', 'Standard gravitational parameter', 'Temperature count', 'Temperature per magnetic flux density', 'Temperature per time', 'Thermal conductivity', 'Thermal insulance', 'Thermal resistance', 'Thermal resistivity', 'Thrust to mass ratio', 'Time squared', 'Torque', 'Volume thermal expansion', 'Volumetric flow rate', 'Volumetric flux', 'Volumetric heat capacity', 'Volumetric productivity', 'Yank']

variabletype (23:15)

['Water quality', 'Volatile', 'Uranium series', 'Trace element', 'Stable isotopes', 'Speciation ratio', 'Rock mode', 'Ratio', 'Rare earth element', 'Radiogenic isotopes', 'Noble gas', 'Model data', 'Major oxide or element', 'End-Member', 'Age']
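The (number of terms : number not matching) counts above amount to a set difference per CV. A rough sketch follows, assuming an exact, case-sensitive string comparison; that assumption is consistent with 'Constant over interval' being listed as non-matching against WaterML's 'Constant Over Interval'. The helper names are hypothetical and the term lists are short illustrative excerpts, not the full vocabularies:

```python
def unmatched_terms(odm2_terms, wml_terms):
    """Return the ODM2 terms with no exact (case-sensitive) WaterML 1.1 match."""
    return sorted(set(odm2_terms) - set(wml_terms))


def inventory_label(cv_name, odm2_terms, wml_terms):
    """Format a heading like 'censorcode (6:6)': total terms, then unmatched terms."""
    missing = unmatched_terms(odm2_terms, wml_terms)
    return "{} ({}:{})".format(cv_name, len(set(odm2_terms)), len(missing))


# Illustrative excerpts only.
odm2_aggregationstatistic = ["Average", "Constant over interval", "Standard deviation"]
wml_datatype = ["Average", "Constant Over Interval", "StandardDeviation"]

missing = unmatched_terms(odm2_aggregationstatistic, wml_datatype)
label = inventory_label("aggregationstatistic", odm2_aggregationstatistic, wml_datatype)
```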

emiliom commented 7 years ago

@lsetiawan, thanks for the additional work/research, and for starting a new, narrower issue (and for being diligent and deleting the last comment you posted in #152, to put it here instead). This looks like good progress towards mapping non-matching ODM 2 terms.

I'll be seeing Anthony and Jeff tomorrow, so we can discuss these CV 1.1 vs 2 issues and mappings in person!

emiliom commented 7 years ago

@lsetiawan can you remind me where we are with this CV mapping work? I see that we merged PR #163, which lists method, source and qualitycontrollevel in its title. I also see PR #159 (merged), which was broader; and for my own reference, I'll paste here the comments I made there:

These are great steps in the right direction. I really like the caching of the latest, relevant ODM/WaterML 1.1 at the time the wofpy server is started.

What your PR doesn't address yet is the need to "curate" a mapping between ODM 2 and ODM 1.1 vocabulary terms in a way that's not just 1:1. I'm not sure how we reconcile your dynamic, automatic approach (ie, just pull in the latest vocabularies) with the need to develop mappings that are manually maintained.

Also, personally I think I would put all this vocabulary related code in a more focused module (eg, "vocabularies.py") rather than the new generic "util.py" you've created.

And of course, this discussion started at issue #152

lsetiawan commented 7 years ago

We need to work together to fill out the dropdowns on https://github.com/ODM2/WOFpy/issues/160#issue-245239137 above to match ODM2 CV to WATERML CV.

emiliom commented 7 years ago

Some comments:

- The mapping proposed here (mapping ODM2 CV terms to the closest term for the corresponding WaterML 1.1 CV) has to be specific to ODM2 DAO's. It's not a generic mapping that would apply across all DAO's in WOFpy. So, it should probably be implemented at the DAO level.
- A useful distinction should be made between obvious, unambiguous mappings and mappings that are not at all obvious or where there is really no appropriate correspondence. Don's explicit mappings for censorcode and datatype in the first comment on this issue are clear examples of the former. When the differences for a CV are drastic (the second case), it may be best to just let the ODM2 terms "pass through" w/o evaluation ...
- Don's new CV handling module: https://github.com/ODM2/WOFpy/blob/master/wof/utils.py (as I suggested earlier, I think it should be renamed to, say, vocabularies.py)

lsetiawan commented 7 years ago

It's not a generic mapping that would apply across all DAO's in WOFpy. So, it should probably be implemented at the DAO level.

Yea this makes sense. Though your comment:

ODM2 terms "pass through" w/o evaluation ...

This is a little difficult at the DAO level. In core_1_1.py, checks are performed like the one at L641. My thought is to add another argument to the check functions that specifies the data model somehow, or to just let any CV term pass through by default if it is not matched in the DAO.
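One way to picture the two options is a check function that takes an explicit pass-through flag. This is only a sketch: `check_cv_term`, its parameters, and the "Unknown" fallback are assumptions for illustration and do not reproduce the actual checks in core_1_1.py.

```python
def check_cv_term(term, valid_terms, passthrough=False, default="Unknown"):
    """Validate a CV term against a WaterML 1.1 vocabulary.

    A term found in valid_terms is returned unchanged. Otherwise the term
    is returned as-is when passthrough is True, or replaced by the default
    fallback when passthrough is False.
    """
    if term in valid_terms:
        return term
    return term if passthrough else default


# Hypothetical excerpt of a WaterML 1.1 vocabulary.
wml_censorcodes = {"gt", "lt", "nc", "nd", "pnq"}

strict_result = check_cv_term("Cruise", wml_censorcodes)
lenient_result = check_cv_term("Cruise", wml_censorcodes, passthrough=True)
```

A DAO for a data model whose vocabularies diverge strongly from WaterML 1.1 could then call the check with `passthrough=True`, while other DAO's keep the strict behavior.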

emiliom commented 7 years ago

Let's discuss this in an hour or so, when I'm in. FYI I've done some refactoring of vocabularies.py

emiliom commented 7 years ago

@lsetiawan, will you be able to submit a PR with the remaining changes for CV handling, before you leave? We're so close!

lsetiawan commented 7 years ago

@emiliom The latest changes are now up on WOFpy dev server. Thanks.

emiliom commented 7 years ago

Whoa, that was very fast! I've started looking at it. FYI, many of the valueType values look odd (in postgresql/EnviroDIY test DB), but don't worry about it. I'll look into it.

ocefpaf commented 7 years ago

I'll be up for a few more hours. If you issue the release I can prepare the packages today.

emiliom commented 7 years ago

Thanks, @ocefpaf. But I have some questions for @lsetiawan first, and it looks like I may need one more PR.

@lsetiawan (after testing REST 1.1 dev endpoints):

The CV values are coming through as expected (for the most part) in GetVariable* and GetValues* requests, but GetSite* responses have glitches:
- GetSites has an empty seriesCatalog element. I don't remember if it's supposed to be there at all, or if it should be populated.
- GetSiteInfo responses have "Unknown" values for valueType, dataType and sampleMedium, in both databases/endpoints.
- If you point me in the right direction in the code, I can try to address this myself tonight or early tomorrow morning.
On a much less important note, I thought you had implemented parsing of the address text to extract and populate state. Did you? I'm not seeing any responses that include the state. Not a problem either way, but I need to know so that the release notes are accurate.

emiliom commented 7 years ago

GetSiteInfo responses have "Unknown" values for valueType, dataType and sampleMedium, in both databases/endpoints.

Looks to me like the Variables instantiation at odm2/timeseries/sqlalch_odm2_models.py#L111 needs to look more like the one at odm2/timeseries/odm2_timeseries_dao.py#L205 -- see the extra arguments that are passed.

If you can confirm that that looks right, I'll tackle it.

GetSites has an empty seriesCatalog element. I don't remember if it's supposed to be there at all, or if it should be populated.

I don't know about this one yet. Any thoughts?

emiliom commented 7 years ago

It also looks like a datatype match (self.get_match('datatype' ...) needs to be added in the get_series_by_sitecode* functions in odm2_timeseries_dao.py, just as it is in get_variables_from_results

emiliom commented 7 years ago

GetSites has an empty seriesCatalog element. I don't remember if it's supposed to be there at all, or if it should be populated.

This empty seriesCatalog element was already present before we made all the recent changes. It's present in the "stable" (non dev) AWS endpoints.

emiliom commented 7 years ago

Final DAO configuration for the upcoming release:

- CV's for which an ODM2-WaterML1.1 mapping (examples/flask/odm2/cvmap_wml_1_1.yml) was created for non-matching terms: censorcode, datatype, samplemedium
- CV's deemed to have too large a divergence between ODM2 and WaterML1.1 to either validate terms or create a mapping; ODM2 terms are used as is: unitstype, generalcategory, valuetype

emiliom commented 7 years ago

New release is out. Closing.

ODM2 / WOFpy

ODM2 to WaterML 1.1 Controlled Vocabularies Mapping. #160

#152 is getting long. Opening this issue for the mapping of CV. Following up from https://github.com/ODM2/WOFpy/issues/160#issuecomment-317583449. Please edit as you see fit.