datameet-pune / datameet-pune.github.io

Common repo and documentation space for DataMeet Pune chapter
https://sites.google.com/view/datameetpune/home
GNU General Public License v3.0

Gathering tasks for COEP hackathon, 2 Feb 2019 #20

Status: Open. answerquest opened this issue 5 years ago.

answerquest commented 5 years ago

Main participant audience: third-year Computer Engineering and IT students of COEP. However, attendance will be optional, and the event will be open to others as well.

answerquest commented 5 years ago

Measuring Pune's tree canopy using satellite data

Ref: https://www.citylab.com/environment/2018/12/urban-tree-canopy-maps-artificial-intelligence-descartes-labs/578701/ Data sources still need to be found. Start working on this before 2 Feb if you want to make any headway.
Use ward maps of Pune to rank wards by canopy coverage, etc.
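A minimal sketch of the ranking step, assuming the imagery has already been clipped to each ward boundary and reduced to per-pixel (near-infrared, red) reflectance pairs. It swaps in a plain NDVI threshold, NDVI = (NIR - Red) / (NIR + Red), rather than the AI-based approach described in the CityLab article; the 0.4 cutoff, the helper names and the sample reflectances are hypothetical and would need calibrating against ground truth.

```python
def ndvi(nir, red):
    """Normalised Difference Vegetation Index for one pixel."""
    total = nir + red
    return 0.0 if total == 0 else (nir - red) / total


def canopy_fraction(pixels, threshold=0.4):
    """Share of (nir, red) pixels whose NDVI exceeds the (assumed) threshold."""
    if not pixels:
        return 0.0
    vegetated = sum(1 for nir, red in pixels if ndvi(nir, red) > threshold)
    return vegetated / len(pixels)


def rank_wards(ward_pixels, threshold=0.4):
    """Return (ward, canopy_fraction) pairs, greenest ward first."""
    scores = {ward: canopy_fraction(px, threshold) for ward, px in ward_pixels.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical reflectance values for two wards (not real data).
wards = {
    "Ward A": [(0.50, 0.08), (0.45, 0.10), (0.20, 0.18)],
    "Ward B": [(0.22, 0.20), (0.48, 0.09), (0.18, 0.17)],
}
print(rank_wards(wards))
```

Ranking by fraction rather than raw canopy area keeps large wards from dominating; multiplying the fraction by the ward's area gives absolute canopy cover if that is wanted instead.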

answerquest commented 5 years ago

Public transport data: animation, visualization, analysis

For visualization: RunParticles: http://renderfast.com/RunParticles/
Render various transport datasets into video using this tool.

Analysis: find traffic choke-points, busy corridors, less-frequented stops, etc.
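Once the static GTFS is in hand, a first pass at busy corridors and less-frequented stops can be as simple as counting trips. The sketch below reads the standard GTFS `stop_times.txt` columns (`trip_id`, `stop_id`, `stop_sequence`) using only the standard library; the function name and the tiny inline sample are illustrative, not taken from the Pune feed.

```python
import csv
import io
from collections import Counter


def stop_and_segment_counts(stop_times_file):
    """Count trips serving each stop and each consecutive stop pair (segment)."""
    trips = {}
    for row in csv.DictReader(stop_times_file):
        trips.setdefault(row["trip_id"], []).append(
            (int(row["stop_sequence"]), row["stop_id"]))
    stop_counts = Counter()
    segment_counts = Counter()
    for stops in trips.values():
        stops.sort()  # order each trip's stops by stop_sequence
        ids = [stop_id for _, stop_id in stops]
        stop_counts.update(set(ids))  # a stop counts once per trip
        segment_counts.update(zip(ids, ids[1:]))
    return stop_counts, segment_counts


# Made-up two-trip sample in GTFS stop_times.txt layout.
sample = io.StringIO(
    "trip_id,arrival_time,departure_time,stop_id,stop_sequence\n"
    "t1,08:00:00,08:00:00,A,1\n"
    "t1,08:05:00,08:05:00,B,2\n"
    "t1,08:10:00,08:10:00,C,3\n"
    "t2,09:00:00,09:00:00,A,1\n"
    "t2,09:05:00,09:05:00,B,2\n"
)
stops, segments = stop_and_segment_counts(sample)
print(stops.most_common())
print(segments.most_common())
```

Frequently served stop pairs hint at busy corridors, and stops with low counts are candidates for the less-frequented list; actual choke-points would additionally need travel times from the realtime GPS logs.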

Possible datasets: Pune bus static GTFS and realtime logs;
Hyderabad metro and Kochi metro GTFS.

We will have a large dataset of GPS logs for selected bus routes in Pune, which will be released a few days before the event.

Links:

answerquest commented 5 years ago

PMJDY data analysis

See this discussion thread: https://groups.google.com/d/msg/datameet/ErNY82gA7dw/TOmnF7dLFQAJ I've forked it and am updating the data to the latest available: https://github.com/datameet-pune/pmjdy

Here's an example of an analysis done on it two years ago; we can take it further and make animated / interactive visualizations, etc.: https://zenodo.org/record/263919#.XCWWT99fjZs

Use this zenodo page for citations: https://zenodo.org/record/1410405#.XCWYEN9fjZs

answerquest commented 5 years ago

Pune Tree census data analysis, comparison

Get the datasets from here: http://nikhilvj.co.in/files/trees/
Gathered from: http://treecensus.punecorporation.org/
Disclaimer on that site:

For the first time, Pune Municipal Corporation (PMC) has undertaken Geo-enabled Tree Census using GIS & GPS Technology for the Pune city. So far 3300000 trees have been censused by using this technology.

Tree census data for PMC is hereby uploaded as a draft version, for few wards on PMC website. After receiving suggestions / objections, the data will be finalised by PMC. It is hereby requested to one and all that comments/Suggestion may please be given within next 30 days. This can be done by sending the email at treecensus [at] punecorporation.org

This integration of botanical information with I.T. applications will be useful to all the residents of Pune in addition to researchers as well as public authorities. It is also hoped that this experiment will go a long way in increasing the green cover of Pune.

The uploaded data is raw data, hence suggestion will be highly appreciated and valid suggestion after approval of authorities will be inculcated in the system.

This is draft data from PMC, and they have requested feedback on it, so the dataset should be analysed for anomalies, etc.
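A possible starting point for the anomaly check, assuming the census rows have been loaded as dicts with hypothetical `tree_id`, `lat`, `lon` and `girth_cm` fields (the real column names may differ). It flags duplicate IDs, points outside a bounding box, and numeric outliers via the common 1.5 × IQR rule. The bounding box is a rough, assumed box around Pune, not an official boundary, and the sample rows are invented.

```python
import statistics

# Rough, assumed (min_lat, min_lon, max_lat, max_lon) box around Pune.
PUNE_BBOX = (18.40, 73.70, 18.70, 74.05)


def iqr_outliers(values, k=1.5):
    """Indexes of values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    if len(values) < 4:
        return []  # too few points for meaningful quartiles
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [i for i, v in enumerate(values) if v < low or v > high]


def flag_anomalies(rows, bbox, numeric_field):
    """Return (tree_id, issue) pairs for duplicates, stray points and outliers."""
    issues, seen = [], set()
    min_lat, min_lon, max_lat, max_lon = bbox
    for row in rows:
        tid = row["tree_id"]
        if tid in seen:
            issues.append((tid, "duplicate tree_id"))
        seen.add(tid)
        if not (min_lat <= row["lat"] <= max_lat and min_lon <= row["lon"] <= max_lon):
            issues.append((tid, "location outside bounding box"))
    values = [row[numeric_field] for row in rows]
    for i in iqr_outliers(values):
        issues.append((rows[i]["tree_id"], f"outlier {numeric_field}"))
    return issues


rows = [  # invented sample rows
    {"tree_id": "T1", "lat": 18.52, "lon": 73.85, "girth_cm": 120},
    {"tree_id": "T2", "lat": 18.53, "lon": 73.86, "girth_cm": 95},
    {"tree_id": "T3", "lat": 18.50, "lon": 73.80, "girth_cm": 110},
    {"tree_id": "T3", "lat": 18.51, "lon": 73.81, "girth_cm": 105},
    {"tree_id": "T5", "lat": 28.61, "lon": 77.20, "girth_cm": 100},  # far away
    {"tree_id": "T6", "lat": 18.55, "lon": 73.90, "girth_cm": 2500},
]
for issue in flag_anomalies(rows, PUNE_BBOX, "girth_cm"):
    print(issue)
```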

answerquest commented 5 years ago

OpenStreetMap mapping: rural roads in Maharashtra

https://tasks.teachosm.org/contribute?difficulty=ALL&organisation=datameet

answerquest commented 5 years ago

Scrape data from the PMC STP (sewage treatment plant) app

App: https://play.google.com/store/apps/details?id=com.ionicframework.pmcstp846325&rdid=com.ionicframework.pmcstp846325

The department doesn't collect the raw data at its end; the app-based system was set up by a vendor who is no longer around. They have asked the Open Data Portal team to extract the data from the app itself. The app fetches data dynamically, so Android developers can run it in an emulator, archive the data packets, and convert them to CSV so that the Open Data Portal can publish the archived data. The archived data can be of great value to researchers and environmental groups for analysing how much sewage is treated, how much is left untreated, how it affects water bodies, etc.
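Once the app's responses have been captured, converting them to CSV is mostly a flattening job. This sketch assumes each captured response body is a JSON array of (possibly nested) records; the field names in the sample (`plant`, `date`, `flow`, ...) are hypothetical, since the app's actual payload format isn't documented here. Nested objects become dotted column names, and fields missing from a record are left blank.

```python
import csv
import io
import json


def flatten(record, prefix=""):
    """Flatten nested dicts into dotted column names."""
    flat = {}
    for key, value in record.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, name + "."))
        else:
            flat[name] = value
    return flat


def responses_to_csv(responses, out):
    """Write records from captured JSON responses to CSV; return the row count."""
    rows = [flatten(rec) for text in responses for rec in json.loads(text)]
    columns = sorted({col for row in rows for col in row})
    writer = csv.DictWriter(out, fieldnames=columns)  # missing keys -> ""
    writer.writeheader()
    writer.writerows(rows)
    return len(rows)


# Hypothetical captured payloads (not real app output).
captured = [
    '[{"plant": "STP-1", "date": "2019-01-20", "flow": {"inlet_mld": 40, "treated_mld": 35}}]',
    '[{"plant": "STP-2", "date": "2019-01-20", "flow": {"inlet_mld": 20}}]',
]
buf = io.StringIO()
count = responses_to_csv(captured, buf)
print(buf.getvalue())
```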

answerquest commented 5 years ago

Data Cleaning tasks for data hosted on Pune Open Data Portal

Tabular Listing:
https://drive.google.com/open?id=10DQBIXHcC5LvRD6z-kpFbC20Hv7xen9yYzO6AWPU6HE (sorted by categories, as of 21 Jan 2019)

In some cases the Excel container has garbled dates, interpreting dd/mm/yyyy as mm/dd/yyyy. Also, multi-row headers, merged cells, etc. make some of the data unsuitable for programmatic reading. Possible things that can be done:

  • Fix dates
  • Make CSVs with a single header row, no gaps, etc.
  • Create an accompanying document / cover letter that details what each column stands for, etc.
  • Make unpivoted ('narrow') versions of pivoted ('wide') data
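Two of these tasks sketched with the Python standard library. `to_iso` re-parses date strings with an explicit dd/mm/yyyy format, so no day/month guessing is involved; it assumes the dates are still stored as text, and cells Excel has already converted to dates would need separate checking. `unpivot` turns a wide table into the narrow form (pandas users can get the same result with `pandas.melt`). Both helper names and the ward/year sample are made up for illustration.

```python
import csv
import io
from datetime import datetime


def to_iso(date_text, fmt="%d/%m/%Y"):
    """Parse a dd/mm/yyyy string explicitly and return ISO 8601 (yyyy-mm-dd)."""
    return datetime.strptime(date_text.strip(), fmt).date().isoformat()


def unpivot(rows, id_columns, var_name="variable", value_name="value"):
    """Turn wide rows (one column per variable) into narrow rows."""
    narrow = []
    for row in rows:
        for column, value in row.items():
            if column in id_columns:
                continue
            record = {col: row[col] for col in id_columns}
            record[var_name] = column
            record[value_name] = value
            narrow.append(record)
    return narrow


# Invented wide table: one column per year.
wide = "ward,2016,2017\nKothrud,10,12\nAundh,7,9\n"
narrow = unpivot(csv.DictReader(io.StringIO(wide)), ["ward"],
                 var_name="year", value_name="count")
print(to_iso("05/01/2019"))
print(narrow)
```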

answerquest commented 5 years ago

MH taluka PDFs and shapefile comparison

Taluka PDF maps from MRSAC: http://www.mrsac.gov.in/en/taluka-maps (pages go up to http://www.mrsac.gov.in/en/taluka-maps?page=0,12)

Each PDF has the outlines and names of the villages in that taluka.

MH villages shapefile: https://drive.google.com/open?id=0B3gxOiUzXTR-RVdZNXh4X1huUG8

What we have to do is

Possible discrepancies

Discrepancies can be logged in this tracking sheet (request the organisers for access), or compiled separately if there are more details. As far as possible, we should standardise them into tables rather than keep them verbose.
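To keep the discrepancy log tabular, each taluka's comparison can emit one row per mismatched village. This sketch assumes the village names have already been extracted from the PDF and from the shapefile's attribute table into plain lists; it normalises case and whitespace and uses `difflib.get_close_matches` to suggest likely spelling variants. The 0.85 cutoff and the toy lists below are illustrative only, not taken from the MRSAC PDFs or the DataMeet shapefile.

```python
import difflib


def compare_villages(taluka, pdf_names, shapefile_names, cutoff=0.85):
    """Return (taluka, village, issue, closest_match) rows for names that differ."""
    def norm(name):
        return " ".join(name.lower().split())

    pdf = {norm(n): n for n in pdf_names}
    shp = {norm(n): n for n in shapefile_names}
    rows = []
    for key, name in sorted(pdf.items()):
        if key not in shp:
            close = difflib.get_close_matches(key, list(shp), n=1, cutoff=cutoff)
            rows.append((taluka, name, "missing in shapefile", shp[close[0]] if close else ""))
    for key, name in sorted(shp.items()):
        if key not in pdf:
            close = difflib.get_close_matches(key, list(pdf), n=1, cutoff=cutoff)
            rows.append((taluka, name, "missing in PDF", pdf[close[0]] if close else ""))
    return rows


# Toy inputs, not real extracted data.
rows = compare_villages(
    "Haveli",
    pdf_names=["Wagholi", "Lonikand", "Kesnand"],
    shapefile_names=["Wagholi", "Lonikhand", "Perne"],
)
for row in rows:
    print(row)
```

Rows with a suggested match are probably spelling variants; rows without one are candidates for genuinely missing or extra villages.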

Larger aim of the exercise

To document discrepancies between the official PDFs and the villages shapefile that DataMeet has.

Why

answerquest commented 5 years ago

Finding ward number geospatially

Given a dataset of entities in Pune with lat-long locations, use QGIS or other geospatial tools to determine the ward each data point falls in, and add a column to the dataset with the ward number.

Supporting data: Pune ward maps, latest as well as previous.

Data this exercise can be done on: http://opendata.punecorporation.org/Citizen/CitizenDatasets/Index?categoryId=37
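In QGIS, the "Join attributes by location" tool covers this. For a scripted version, here is a minimal ray-casting point-in-polygon sketch, assuming each ward has been loaded as a single ring of (lon, lat) vertices with no holes; real ward files with multipolygons are better handled by a library such as shapely or geopandas (`geopandas.sjoin`). The square wards and the points are made up.

```python
def point_in_polygon(lon, lat, polygon):
    """Ray-casting test: count crossings of a ray cast eastwards from the point."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        if (y1 > lat) != (y2 > lat):  # edge straddles the point's latitude
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:
                inside = not inside
    return inside


def assign_wards(points, wards):
    """Add a 'ward' key to each point dict; None if no ward polygon contains it."""
    for point in points:
        point["ward"] = next(
            (name for name, poly in wards.items()
             if point_in_polygon(point["lon"], point["lat"], poly)),
            None)
    return points


# Hypothetical square wards, vertices as (lon, lat).
wards = {
    "Ward 1": [(73.80, 18.50), (73.85, 18.50), (73.85, 18.55), (73.80, 18.55)],
    "Ward 2": [(73.85, 18.50), (73.90, 18.50), (73.90, 18.55), (73.85, 18.55)],
}
points = [{"lon": 73.82, "lat": 18.52}, {"lon": 73.87, "lat": 18.53},
          {"lon": 74.10, "lat": 18.60}]
for p in assign_wards(points, wards):
    print(p)
```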

answerquest commented 5 years ago

Linguistic / NLP analysis on Grievances / Feedback datasets

Linguistic / NLP analysis on Grievances / Feedback datasets hosted on Pune Open Data Portal.
Bring data-driven insights into what people are saying and what they want most.
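A very rough first pass is plain term frequency. This sketch assumes the grievance texts are English strings; Marathi text would need a proper tokenizer and stopword list, and the short `STOPWORDS` set here is a hypothetical placeholder. The sample complaints are invented.

```python
import re
from collections import Counter

# Placeholder stopword list; a real analysis would use a fuller one.
STOPWORDS = {"the", "not", "near", "for", "on", "and", "a", "of", "to", "in", "is"}


def top_terms(texts, n=10, stopwords=STOPWORDS):
    """Most frequent lowercase English words across all texts."""
    counts = Counter()
    for text in texts:
        words = re.findall(r"[a-z]+", text.lower())
        counts.update(w for w in words if w not in stopwords and len(w) > 2)
    return counts.most_common(n)


complaints = [
    "Garbage not collected near the market",
    "Garbage lying on the road for days",
    "Street light not working on the road",
]
print(top_terms(complaints, n=3))
```

Counting bigrams (for example "street light") or grouping by the dataset's category column would be natural next steps.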

answerquest commented 5 years ago

GIS : Make road routes from stop locations

Pune's bus route data is in the form of sequences of stops, which translate to a series of lat-long points, like in the screenshot below.
[Screenshot: bus-route-lines]

  1. Write a program that takes in this array of lat-long points and generates an on-road route, with the stop sequence properly maintained.

    • You can use any routing API, such as Google, OpenStreetMap, TomTom, GraphHopper, etc. Some require payment for more advanced services, so try to use only the free versions.
    • One approach could be to break up the route into A>B, B>C, C>D..., make a road route for each leg, and then merge them into one contiguous route.
  2. Run this program on Pune's bus routes data.

    • The existing dataset will need to be processed to get the desired input of lat-long points.
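The A>B, B>C, ... leg-by-leg approach can be sketched as follows. `route_leg` is a hypothetical hook: the default just draws a straight line, and a real version would call one of the routing APIs mentioned above and return the leg geometry as (lat, lon) points (note that GeoJSON and several routing APIs use (lon, lat) order instead). When merging, the point shared by consecutive legs is dropped so the stitched route has no duplicate vertices.

```python
from typing import Callable, List, Tuple

Point = Tuple[float, float]  # (lat, lon)


def straight_leg(a: Point, b: Point) -> List[Point]:
    """Placeholder leg: a straight line. Swap in a real routing API call."""
    return [a, b]


def stitch_route(stops: List[Point],
                 route_leg: Callable[[Point, Point], List[Point]] = straight_leg
                 ) -> List[Point]:
    """Route each consecutive stop pair and merge the legs in stop order."""
    if len(stops) < 2:
        return list(stops)
    route: List[Point] = []
    for a, b in zip(stops, stops[1:]):
        leg = route_leg(a, b)
        if route and leg and route[-1] == leg[0]:
            leg = leg[1:]  # drop the junction point already in the route
        route.extend(leg)
    return route


# Made-up stop coordinates.
stops = [(18.50, 73.80), (18.51, 73.81), (18.52, 73.83)]
print(stitch_route(stops))
```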