gboeing / osmnx-examples

Gallery of OSMnx tutorials, usage examples, and feature demonstrations.
https://osmnx.readthedocs.io
MIT License

Different lengths of basic and extended stats dictionary columns #3

Closed RockyCD closed 6 years ago

RockyCD commented 6 years ago

Possible error, or misunderstanding of how "Calculate basic street network measures (topological and metric)" works

[screenshot: basic stats dataframe output from the notebook]

This is from the notebook "06-example-osmnx-networkx.ipynb" ("Use OSMnx to analyze a NetworkX street network, including routing").

In the example, the dataframe has 30 columns.

  1. Is that the standard?
  2. Should other places, if they exist as a polygon, produce the same number of columns when I use the same arguments?

I have run multiple tests on different cities, and I get either 26 or 28 columns.

The only two arguments I pass are G and area, per the basic stats docs.

I am trying to see where the error is: if I pass the same arguments for every place, and I know the shapes exist, why would the results have different numbers of columns?

    stats = ox.basic_stats(G, area=area)
    for k, count in stats['streets_per_node_counts'].items():
        stats['int_{}_count'.format(k)] = count
    for k, proportion in stats['streets_per_node_proportion'].items():
        stats['int_{}_prop'.format(k)] = proportion
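This unpacking is where the column counts can diverge: `streets_per_node_counts` has one key per distinct street count observed in that city, so two cities can yield different numbers of flat keys. A minimal sketch with made-up toy dicts (hypothetical values, not real OSMnx output):

```python
# Toy stats dicts (hypothetical values, not real OSMnx output). City B has a
# 5-way intersection, so its counts sub-dict carries one extra key.
city_a = {'n': 100, 'streets_per_node_counts': {1: 10, 3: 60, 4: 30}}
city_b = {'n': 100, 'streets_per_node_counts': {1: 5, 3: 50, 4: 40, 5: 5}}

def flatten(stats):
    # Unpack the counts sub-dict into flat int_{k}_count keys, as in the
    # snippet above, then drop the sub-dict itself.
    flat = dict(stats)
    for k, count in flat.pop('streets_per_node_counts').items():
        flat['int_{}_count'.format(k)] = count
    return flat

print(len(flatten(city_a)))  # 4 flat keys
print(len(flatten(city_b)))  # 5 flat keys -> one extra CSV column
```

The key sets differ only because the observed intersection types differ, not because anything went wrong in the call.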

The only other explanation I can think of for the different column counts is how I am writing to CSV:

        que = pd.DataFrame(pd.Series(stats)).T
        with open(file_name + '.csv', 'w') as f:
            que.to_csv(f, header=False, index=False, encoding='utf-8')

Full code, for clarity:

for value in df.Name:
    place = value
    gdf = ox.gdf_from_place(place)
    area = ox.project_gdf(gdf).unary_union.area
    G = ox.graph_from_place(place, network_type='drive_service')
    stats = ox.basic_stats(G, area=area)
    for k, count in stats['streets_per_node_counts'].items():
        stats['int_{}_count'.format(k)] = count
    for k, proportion in stats['streets_per_node_proportion'].items():
        stats['int_{}_prop'.format(k)] = proportion
    # delete the no longer needed dict elements
    del stats['streets_per_node_counts']
    del stats['streets_per_node_proportion']
    file_name = str(counter)
    que = pd.DataFrame(pd.Series(stats)).T
    with open(file_name + '.csv', 'w') as f:
        que.to_csv(f, header=False, index=False, encoding='utf-8')
    del stats
    del place
    del area
    del gdf
    time.sleep(1000)
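One way to avoid ragged per-place files (a suggestion of mine, not from the thread): collect one flat dict per place and let pandas align the differing keys into a single DataFrame, filling the gaps, then write one CSV with headers. A sketch with toy rows:

```python
import pandas as pd

# Toy flattened stats rows (hypothetical values); the second place has an
# extra 'int_5_count' key that the first lacks.
rows = [
    {'n': 100, 'int_3_count': 60, 'int_4_count': 30},
    {'n': 100, 'int_3_count': 50, 'int_4_count': 40, 'int_5_count': 5},
]

# The DataFrame constructor takes the union of keys as columns; missing
# cells become NaN, which we fill with 0 so every row has the same shape.
df_all = pd.DataFrame(rows).fillna(0)
csv_text = df_all.to_csv(index=False)  # one file, headers included
print(df_all.shape)  # (2, 4)
```

With headers in a single file, it is also immediately visible which columns a given place lacked.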
gboeing commented 6 years ago

Which columns are missing when you get fewer than the example?

RockyCD commented 6 years ago

Which columns are missing when you get fewer than the example?

Unfortunately, I did not write headers to the file. I thought that the columns would be uniform.

If you like, I have a list of cities and can give you the respective column lengths, if that helps? Otherwise, I can run it again and write headers for each file.

gboeing commented 6 years ago

Could you provide a full working example that I can run from scratch to reproduce? List of missing column names would be helpful too.

RockyCD commented 6 years ago

UPDATE: the file is attached. To run it, all you would have to do is this:

file_ = pd.read_csv(FILE_PATH_OF_WHEREVER_SAVED_IT)
df = pd.DataFrame(file_, columns=['Name'])

for value in df.Name:
    place = value
    gdf = ox.gdf_from_place(place)
    area = ox.project_gdf(gdf).unary_union.area
    G = ox.graph_from_place(place, network_type='drive_service')
    stats = ox.basic_stats(G, area=area)
    for k, count in stats['streets_per_node_counts'].items():
        stats['int_{}_count'.format(k)] = count
    for k, proportion in stats['streets_per_node_proportion'].items():
        stats['int_{}_prop'.format(k)] = proportion
    # delete the no longer needed dict elements
    del stats['streets_per_node_counts']
    del stats['streets_per_node_proportion']
    file_name = str(counter)
    que = pd.DataFrame(pd.Series(stats)).T
    with open(file_name + '.csv', 'w') as f:
        que.to_csv(f, header=False, index=False, encoding='utf-8')
    del stats
    del place
    del area
    del gdf
    time.sleep(1000)

Then you could just open those files and press Cmd/Ctrl+Right Arrow to get the column count, or you can:

path = 'FILE_PATH_OF_WHEREVER_SAVED_IT'
all_files_2 = glob.glob(os.path.join(path, "*.csv"))
df_from_each_file_2 = (pd.read_csv(f, delim_whitespace=True, header=None, names=['Name']) for f in all_files_2)
concatenated_df_2 = pd.concat(df_from_each_file_2, ignore_index=True)

Name = []
for i in all_files_2:
    Name.append(i.rsplit('/', 1)[1])

concatenated_df_2['Filename'] = pd.DataFrame(Name)
my = []
for i in concatenated_df_2['Name']:
    my.append(i.count(','))
concatenated_df_2['Length'] = pd.DataFrame(my)
concatenated_df_2.head()
CName2 = []
for i in concatenated_df_2.Filename.astype(str):
    CName2.append(i.rsplit('.', 1)[0])
concatenated_df_2['Clean_Fname'] = pd.DataFrame(CName2)
concatenated_df_2['Length'].value_counts()

Should give you something like this

[screenshot: value counts of column lengths]


Forgive me, I may have misworded what I meant by "missing columns".

I thought that after running through 188 places, getting different column lengths was odd. I thought there would be placeholder values for each column name inside the dict, but I could be wrong.

Column_Length    # of occurrences
26               86
28               78
30               20
32                4
Name: Column_Length, dtype: int64

The fix for me (I hope) is to print out the headers of one file from each respective group.

Could you provide a full working example that I can run from scratch to reproduce? List of missing column names would be helpful too.

I thought I already did above.

for value in df.Name:
        place = value

value is the name of a city or place that has a shapefile. Sure, I can give you a list of a few places that produce different column counts when I write them to CSV.

Updated: See attached.

To run this, you would do this:

file_ = pd.read_csv

[geoff.txt](https://github.com/gboeing/osmnx-examples/files/1220628/geoff.txt)
RockyCD commented 6 years ago

geoff.txt

RockyCD commented 6 years ago

I am closing this as a non-issue. From what I can tell, it relates to the size of each place.

[screenshot: stats comparison between places]

gboeing commented 6 years ago

@RockyCD yes. The stats functions return the exact number of elements as specified in the documentation. However, their sub-dicts that contain counts of intersection types and proportions of intersection types (for instance) can contain different numbers of keys. In the example above, one city has an intersection with 5 streets connected to it. The other city does not. Hence the different keys. You got different numbers of columns because you unpack the dicts into flat structures in the code snippet you ran.
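For readers hitting the same thing, the explanation above is easy to see by diffing the key sets of two flattened stats dicts (toy values, hypothetical, not real OSMnx output):

```python
# Flattened stats keys for two toy cities (hypothetical values). The second
# city has a 5-way intersection, so it gains an extra int_5_count key.
a = {'int_1_count': 10, 'int_3_count': 60, 'int_4_count': 30}
b = {'int_1_count': 5, 'int_3_count': 50, 'int_4_count': 40, 'int_5_count': 5}

extra = sorted(set(b) - set(a))  # keys present in b but not in a
print(extra)  # ['int_5_count']
```

The same set difference applied to real output names exactly which "missing columns" a smaller place lacks.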

RockyCD commented 6 years ago

@gboeing That is what I thought. Thank you for your time! What you have built is extremely impressive (at least to me).