SBuxton-IDM / EquityMilestone

A repository for the MADS Milestone II project related to digital equity
0 stars 1 forks source link

Merging of extracted datasets on Location ID #7

Closed jasonkronemeyer closed 9 months ago

jasonkronemeyer commented 10 months ago

Easy link to sharepoint: File

The files in \DATA\FCC\Fabric_BSL_07122023_rel_3_2 which have the geo coords.

need to merge \DATA\FCC\state26_unserved_unfunded_20230831 file based on locationID in the in the fabric files. and drop unnecessary columns.

jasonkronemeyer commented 9 months ago

@SBuxton-IDM - I made some progress on the merges - still working on them right now.

jasonkronemeyer commented 9 months ago

@SBuxton-IDM - I think that the join process framework roughly built - I have created a notebook called GeoPandas - Data - Spatial Join Notebook in the root of the sharepoint.

SBuxton-IDM commented 9 months ago

There are three xlsx files in the sharepoint Fabric_BSL_07122023_rel_3_2 directory which have a primary key locationID. There is another sharepoint file FCC\state26_unserved_unfunded_20230831 that needs to be reduced, and then merged into the above 3 files based on the primary key locationID. We perhaps want to merge these three files together as well but I'll leave that separately since it's not mentioned in the issue. My process would be to:

Write a script preprocess_data.py that does the following:

jasonkronemeyer commented 9 months ago

@SBuxton-IDM - ideas on a storage with a public URL that we can start pushing the data to?

jasonkronemeyer commented 9 months ago

Built two queries to extract upload, download and min round trip time from Message Labs project in BigQuery. 10 months of data in DATA/MLABS.

Challenge was extracting a reduced data set based on a geographic polygon for the area of interest.

Here are the two queries that have been run to get sample data:

DOWNLOAD Tests

SELECT id, a.TestTime, client.Geo.Latitude, client.Geo.Longitude, ST_GEOGPOINT(client.Geo.Longitude, client.Geo.Latitude) AS geometry, a.MeanThroughputMbps AS DownMeanMbps, a.minRTT AS DownminRTT

FROM measurement-lab.ndt.unified_downloads

WHERE ST_Within(ST_GEOGPOINT(client.Geo.Longitude, client.Geo.Latitude), ST_GEOGFROMTEXT("POLYGON((-86 48, -83 48, -83 45, -86 45, -86 48))")) AND DATE BETWEEN "2023-01-01" AND "2023-10-01";

UPLOAD Tests

SELECT a.UUID, a.MeanThroughputMbps AS UpMeanMbps, a.minRTT AS UpminRTT, ST_GEOGPOINT(client.Geo.Longitude, client.Geo.Latitude) AS geometry, client.Geo.Subdivision1Name

FROM measurement-lab.ndt.unified_uploads

WHERE ST_Within(ST_GEOGPOINT(client.Geo.Longitude, client.Geo.Latitude), ST_GEOGFROMTEXT("POLYGON((-86 48, -83 48, -83 45, -86 45, -86 48))")) AND

DATE BETWEEN "2023-01-01" AND "2023-10-01";

SBuxton-IDM commented 9 months ago

I'll close this out for above query^ @jasonkronemeyer if you could include a README somewhere for how to access the data using this query it could speed up our setup