cccs-web / soc-maps

Web mapping application in support of social analysis.

determining appropriate means for data management #11

Closed cccs-ip closed 9 years ago

cccs-ip commented 9 years ago

The following discussion links to a number of previous issues; most recently, this discussion of file management in S3.

It's beginning to appear to me that there are two different issues going on in terms of the storage and use of the shapefile data:

1) Storage and retrieval of shapefiles for file sharing (and collaborative editing of shapefile data)

Kartoza has developed a script to load data for use in QGIS. This script currently calls up shapefiles located within /abadi/map-app/source_materials/shapefiles/ , as stored (and organized) within the map application website. The SQL script that they are using loads files into a PostgreSQL database, which is referenced by our mapping application.
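For context, loading a single shapefile into PostGIS is typically done with shp2pgsql piped into psql. This is a hedged sketch, not the actual Kartoza script: the table name (`districts`), SRID, database name, user, and port are illustrative assumptions.

```shell
# Hypothetical sketch: load one shapefile into PostGIS.
# SRID, table, database, user, and port are assumptions, not the Kartoza script's values.
shp2pgsql -s 4326 -I \
  /abadi/map-app/source_materials/shapefiles/districts.shp \
  public.districts \
  | psql -h localhost -p 6001 -U docker gis
```

The `-I` flag builds a spatial index on the geometry column after loading, which QGIS benefits from when rendering.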

Using the Kartoza script relies on users having access to data that is organized and stored in a particular way. It was my understanding that you had recommended S3 for storage of shapefiles for this purpose (such as opening a 'public' data repository that people could access to load data into an initial local deployment of the web map). It is beginning to sound as if this route of file-sharing management may not work. Alternative methods proposed by Kartoza are btsync and GeoGig. Of the two, I am better disposed to GeoGig.

2) Storage and retrieval of data for use with mapping applications

The Kartoza SQL script loads shapefile data into PostgreSQL. After that point, it appears that all shapefile data is stored (and modified?) in Postgres. It is unclear to me how one could or should send changes to particular shapefiles from Postgres back to our 'data shares'. It is beginning to look as if the solution is for us to export the shapefiles we have modified in Postgres periodically, and to manually keep the 'sharing repository' up to date. This approach would only be necessary, however, in cases where we plan to regularly share data (such as sending files to clients or sharing with the public). I am not entirely clear what procedure we would use to keep track of when or why files have changed with S3 (which is why it is beginning to look like S3 might be the wrong choice for managing shapefile data); in any case, we would want to record the dates changes were made, along with notes on what those changes were, so that we can present edited data back to clients.
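The periodic export described above could be sketched roughly as follows. This is only an illustration of the idea, with an assumed table name, connection details, and a plain-text change log standing in for whatever tracking procedure we actually settle on:

```shell
# Hypothetical sketch: export a modified table from PostGIS back to a shapefile
# for the sharing repository, recording the export date so changes are traceable.
# Table name, connection settings, and directory layout are all assumptions.
EXPORT_DIR="exports/$(date +%Y-%m-%d)"
mkdir -p "$EXPORT_DIR"
pgsql2shp -f "$EXPORT_DIR/districts.shp" \
  -h localhost -p 6001 -u docker gis public.districts
echo "exported public.districts on $(date): <notes on edits here>" >> "$EXPORT_DIR/CHANGES.txt"
```

A dated directory plus a notes file is the minimal version of "keep track of when and why files changed"; GeoGig would make this bookkeeping automatic.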

NyakudyaA commented 9 years ago

The loading script should ideally be run once to load the data into the database; once the data is in the database, the loading script will be of no further use. The data in Postgres will be version controlled by GeoGig once we have it running. People should edit data directly in the PostGIS store and push to GeoGig, which version controls the data. GeoGig manages conflicts well, in the same way that Git does. Any new data that comes along can be loaded directly into either PostGIS or GeoGig and then moved to the other. We have also shared two database dumps, private and public, using btsync for people who do not have access to the data.
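The GeoGig workflow described here might look roughly like the following. Exact flags vary between GeoGig versions, and the repository name, table, and connection details are assumptions for illustration only:

```shell
# Hedged sketch of the edit-then-version-control loop: import edited data
# from PostGIS into a GeoGig repository and commit it.
# Repo name, table, and connection details are illustrative assumptions.
geogig init soc-maps-data && cd soc-maps-data
geogig pg import --host localhost --port 6001 \
  --database gis --user docker --table districts
geogig add
geogig commit -m "Import districts edits from PostGIS"
```

As with Git, each commit records who changed what and when, which addresses the change-tracking concern raised above.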

cccs-ip commented 9 years ago

Thanks for this, Admire.

Paul and I were looking at S3 as a low-cost, fast-response storage option. It appears, however, that S3 may not be the best means for storing shapefiles that would be used by QGIS.

I have asked Paul to review your work in establishing the Postgres database from our current shapefile source materials, to ensure that there are no conflicts with our current data naming schema. I haven't yet figured out how to get into the Docker container to export a psql dump, but I believe Paul is making headway.

We can close this issue soon. It'll be great to get the GeoGig database set up. Thanks for helping us with that.

NyakudyaA commented 9 years ago

Doing a database dump from a Docker container works the same way as on a normal install. For instance, for the public database: `pg_dump -Fc gis -U docker -p 6001 -h localhost > database.dump`. Once we have GeoGig we can move data between GeoGig and Docker. I will document how to get a database dump for either private or public on the wiki.
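The dump and restore round trip could be sketched as below. The restore target and the container name are assumptions added for illustration; only the first command comes from the comment above.

```shell
# Dump the public database in custom format (from Admire's comment):
pg_dump -Fc gis -U docker -p 6001 -h localhost > database.dump

# Restore it into another PostgreSQL instance (target details assumed):
pg_restore -d gis -U docker -h localhost -p 5432 database.dump

# If the PostgreSQL client tools are not installed on the host, the same
# dump can be taken inside the container (container name is an assumption):
docker exec postgis-container pg_dump -Fc gis -U docker > database.dump
```

The `-Fc` custom format is compressed and lets pg_restore selectively restore individual tables, which plain SQL dumps do not.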

cccs-ip commented 9 years ago

Excellent, Admire - this is very helpful. Thank you.

cccs-ip commented 9 years ago

Following our conversation, we've agreed on a data management architecture.

A discussion of our data management concept can be found on the wiki: https://github.com/cccs-web/soc-maps/wiki/data-management-concept-and-context