Closed jasonkronemeyer closed 9 months ago
@SBuxton-IDM - I made some progress on the merges - still working on them right now.
@SBuxton-IDM - I think that the join process framework roughly built - I have created a notebook called GeoPandas - Data - Spatial Join Notebook in the root of the sharepoint.
There are three xlsx files in the sharepoint Fabric_BSL_07122023_rel_3_2
directory which have a primary key locationID
. There is another sharepoint file FCC\state26_unserved_unfunded_20230831
that needs to be reduced, and then merged into the above 3 files based on the primary key locationID
. We perhaps want to merge these three files together as well but I'll leave that separately since it's not mentioned in the issue. My process would be to:
data
directory with a README.md specifying the names of the files we are going to use and push to the github repo.
Alternatively if there is a public url where I can find these files then I could curl them instead of interfacing with sharepoint. @jasonkronemeyer are these files hosted somewhere at a public url?Write a script preprocess_data.py
that does the following:
/data
locationID
primary keyresults/
directory@SBuxton-IDM - ideas on a storage with a public URL that we can start pushing the data to?
Built two queries to extract upload, download and min round trip time from Message Labs project in BigQuery. 10 months of data in DATA/MLABS.
Challenge was extracting a reduced data set based on a geographic polygon for the area of interest.
Here are the two queries that have been run to get sample data:
SELECT id, a.TestTime, client.Geo.Latitude, client.Geo.Longitude, ST_GEOGPOINT(client.Geo.Longitude, client.Geo.Latitude) AS geometry, a.MeanThroughputMbps AS DownMeanMbps, a.minRTT AS DownminRTT
FROM measurement-lab.ndt.unified_downloads
WHERE ST_Within(ST_GEOGPOINT(client.Geo.Longitude, client.Geo.Latitude), ST_GEOGFROMTEXT("POLYGON((-86 48, -83 48, -83 45, -86 45, -86 48))")) AND DATE BETWEEN "2023-01-01" AND "2023-10-01";
SELECT a.UUID, a.MeanThroughputMbps AS UpMeanMbps, a.minRTT AS UpminRTT, ST_GEOGPOINT(client.Geo.Longitude, client.Geo.Latitude) AS geometry, client.Geo.Subdivision1Name
FROM measurement-lab.ndt.unified_uploads
WHERE ST_Within(ST_GEOGPOINT(client.Geo.Longitude, client.Geo.Latitude), ST_GEOGFROMTEXT("POLYGON((-86 48, -83 48, -83 45, -86 45, -86 48))")) AND
DATE BETWEEN "2023-01-01" AND "2023-10-01";
I'll close this out for above query^ @jasonkronemeyer if you could include a README somewhere for how to access the data using this query it could speed up our setup
Easy link to sharepoint: File
The files in \DATA\FCC\Fabric_BSL_07122023_rel_3_2 which have the geo coords.
need to merge \DATA\FCC\state26_unserved_unfunded_20230831 file based on locationID in the in the fabric files. and drop unnecessary columns.