ODM2 / CZ-Manager

CZ Manager (formerly ODM2 Admin) is an application for site level data management of environmental observations using Observation Data Model 2 (ODM2) for documentation and a detailed walkthrough see:
http://odm2.github.io/CZ-Manager
MIT License
9 stars 12 forks source link

Comments & questions on Luquillo CZO database used by ODM2-Admin #15

Open emiliom opened 8 years ago

emiliom commented 8 years ago

@miguelcleon, I'm dumping a long list of my comments and questions in this single issue. I had already typed this in a local markdown document, so it's easiest for me to just dump it all in one place. We can split it off later. I'm pinging @lsetiawan (Don), so he's in the discussion.

When trying to restore the database backup file (using PgAdmin) on my laptop (Ubuntu 14.04, Postgresql 9.3, PostGIS 2.1), the database was installed with a couple of severe gaps. The problem boiled down to PostGIS failing to install, which in turn prevented the samplingfeatures table from installing.

PostGIS installation

actiontypecv count
Data retrieval 1
Equipment maintenance 1
Estimation 1
Instrument calibration 1
Observation 12
Specimen analysis 13
lsetiawan commented 8 years ago

@emiliom One thing I noticed about the topology and tiger extensions is that they are part of the blank ODM2 PostgreSQL

emiliom commented 8 years ago

One thing I noticed about the topology and tiger extensions is that they are part of the blank ODM2 PostgreSQL

That's an important observation, @lsetiawan. I need to look into that and work on eliminating it there.

miguelcleon commented 8 years ago

yup, I have my time series data in the measurement results and measurement results values tables instead of the time series results and times series result values tables. That is a problem I'll need to address and is systematic issue with ODM2 Admin.

miguelcleon commented 8 years ago

sampling features 767,768, 769, 648,649,650 are 'part of' 656. They are all separate soil pits dug at a single larger site named TABOX-12.

Sounds like maybe your suggesting using the sites table for something like this?

miguelcleon commented 8 years ago

I've fixed the two records with the swapped lat-long, thank you.

emiliom commented 8 years ago

sampling features 767,768, 769, 648,649,650 are 'part of' 656. They are all separate soil pits dug at a single larger site named TABOX-12.

Thanks for that clarification. After exploring that particular relationship, I have one observation: relatedfeatures has many duplicated records. For example, the "Is part of" relationship between sampling features 767 and relatedfeatureid 656 occurs 6 times. The records are identical, except of course for the sequential, unique primary key, relationid. I'm guessing that's a messiness that was introduced inadvertently and should be cleaned up.

emiliom commented 8 years ago

This isn't a comment or question. I'm putting here this summary of samplingfeaturetypecv usage in the samplingfeatures table for future reference.

samplingfeaturetypecv count
Excavation 324
Field area 13
Landscape classification 8
Observation well 3
Site 119
Stream gage 2
Transect 5

The table was generated using this query:

select samplingfeaturetypecv, count(*)
from odm2.samplingfeatures
group by samplingfeaturetypecv order by samplingfeaturetypecv

Update: I guess I do have one comment. I've compared these samplingfeaturetypecv occurrences to the corresponding ODM2 controlled vocabulary entries. The only that doesn't match exactly is "Landscape classification"; the official ODM2 entry is "Ecological land classification". I can see why your choice seems more natural. And it's possible (I don't know) that the ODM2 CV actually was edited, and the "Ecological land classification" entry was previously called "Landscape classification". But in general it's helpful to stick to strict compliance with ODM CV's, unless you have strong reasons not to. Also, the use of this type of Sampling Features could probably be enhanced, and we (ODM2 gang) could benefit from a discussion about a common implementation need like yours.

For reference, these are the Landscape Classification entries in use in the database: Colorado Forest, Palm Forest, Tabonuco Forest, Volcaniclastic, Quartz-diorite, Ridge -Topo Postion, Slope -Topo Postion, Valley -Topo Postion.

emiliom commented 8 years ago

Another comment about this:

sampling features 767,768, 769, 648,649,650 are 'part of' 656. They are all separate soil pits dug at a single larger site named TABOX-12.

All those sampling features are classified as "Excavation". But TABOX-12 (656) is in fact something conceptually different. It's not an Excavation per se, but rather a "Site" or "Field area" where all those individual excavations (soil pits) are found. At least that's what I'm assuming.

miguelcleon commented 8 years ago

Yeah, I haven't stayed consistent with classifying the sampling features and I'll need to double check them. Thanks for all your great feedback Emilio, this is really helpful! I'll plan to update you with a new db dump in probably about 2 weeks time does that sound good?

I'll also work on some application updates also particularly for the needed measurement result to time series result conversion that is needed.

emiliom commented 8 years ago

Thanks for all your great feedback Emilio, this is really helpful!

I'm very glad it's helpful to you.

I'll plan to update you with a new db dump in probably about 2 weeks time does that sound good?

That time frame sounds good. We still have tons to learn, both about your ODM2 implementation and ODM2-Admin per se.

miguelcleon commented 8 years ago

I thought I'd address one more of the issues you brought up for now. the table odm2.Measurementresultvaluefile can be dropped I had initially added these extra tables in odm2 schema and knowing that wasn't the best way to do things, I since moved them into odm2extra schema. I'm only using the odm2extra.Measurementresultvaluefile table now.

This should also probably be done for the django generated tables you mentioned, this may or may not be doable though as they are core aspects of the functioning of django and using multiple schema's the way I have with odm2extra doesn't seem well supported, so I have been hesitant to try to change those tables.

I should also move the custom function MeasurementResultValsToResultsCountvalue.

miguelcleon commented 8 years ago

All those sampling features are classified as "Excavation". But TABOX-12 (656) is in fact something conceptually different. It's not an Excavation per se, but rather a "Site" or "Field area" where all those individual excavations (soil pits) are found. At least that's what I'm assuming.

I changed sampling features with samplingfeatureid's 654 to 677 sampling feature type to 'sites'

emiliom commented 8 years ago

Thanks for the info on odm2.Measurementresultvaluefile vs odm2extra. I'm glad it was just a leftover table, no longer used.

Good to know about the django tables, too. If you'd like, maybe in a couple of weeks (when it's a good time for you) we can bring in Jeff Horsburgh and his team to see if they can help with Django insight. They're using Django heavily in web applications for ODM2, and they probably use credentials too. Maybe they've dealt with this schema customization issue.

lsetiawan commented 8 years ago

@miguelcleon One question about the tables in odm2extra schema. Are they created by django after defining them in models? I tried doing syncdb with django=1.6.5 and the tables get created, but then using migrate with django=1.9.x doesn't create those tables, have you run into those issues? or did you create the tables separately with SQL statements? Thanks.

miguelcleon commented 8 years ago

Yeah that is the iffy support for multiple schemas right there. I just used some extra side SQL to create the tables for django 1.9

lsetiawan commented 8 years ago

Thanks @miguelcleon. While learning django. I've gained more understanding about the models and settings for ODM2-Admin. I've tweaked the model.py to hard coded odm2 schema for each db_table variables in class Meta similar to what you did for the odm2extra.

I then created another schema called admin. I also set admin schema as the default for django. When I ran python manage.py migrate it seems like the django core tables are created under the admin schema, leaving the odm2 and odm2extra schemas alone.

So from this, I think that in order to integrate multiple schemas, one have to designate the schema in db_table variables. I tried running django 1.9.x and it seems to work, though, this method doesn't seem to work with Django 1.6.5. Thanks.

miguelcleon commented 8 years ago

Cool! Can you do a pull request into the Django1.9.7Support branch?

lsetiawan commented 8 years ago

@miguelcleon let me clean up my repo git stuff. I didn't notice the second branch. Please ignore the current pull request. Thanks.

emiliom commented 8 years ago

Regarding the schema issues:Thanks, @lsetiawan! That's great.

@miguelcleon, on your README page it says "support tested for django 1.6.5 and 1.9.x". I have no direct experience with django development. But given that so far you're the only user of ODM2-Admin, wouldn't it be easier if you limited backwards compatibility and supported only Django 1.9.x? It sounds like that would minimize development headaches. Just my 1 cent.

miguelcleon commented 8 years ago

Yeah I'm hoping to drop support for 1.6.x soon, can't upgrade a server but we are transitioning to a different one which will be on django 1.9.x

miguelcleon commented 7 years ago

Hi @emiliom the last commit I pushed today should address the issue

result types

based on record counts in results tables, only two result types are being used: >measurement and profile. But a count of results.resulttypecv entries yields two different result types: >"Measurement" (12,192) and "Time series coverage" (199).

results table record counts: measurementresult (205) and profileresult (12,113). No >records in timeseriesresult These inconsistencies **should** create problems, in principle

I converted my measurement result, and measurement result values which were really time series into time series result and time series result values. I did this for both the live LCZO and TRACE database. I'm creating back ups now and then will apply these changes on the CUAHSI instances of ODM2 Admin.

emiliom commented 7 years ago

Thanks for the heads-up, @miguelcleon. I'm still trying to digest the email exchange about time-series-like measurements, and think about the pros and cons of measurements vs time series result types in this context. I'll try to have some input tomorrow.

Regardless, consistency is good, and your change does bring about consistency.