UDST / urbanaccess

A tool for GTFS transit and OSM pedestrian network accessibility analysis by UrbanSim
GNU Affero General Public License v3.0

Unicode decode error when trying to load gtfs of buses in Buenos Aires #59

Closed. AtelierLibre closed this issue 4 years ago

AtelierLibre commented 4 years ago

Description of the bug

Hi, thanks for making this software available.

I am trying to load the GTFS feed for buses in Buenos Aires from either of:

But I run into the encoding error below.

I made some limited progress by changing line 128 in urbanaccess > gtfs > load.py to:

with open(os.path.join(csv_rootpath, folder, textfile), encoding='latin-1') as f:

but it still seems to load only a fraction of the data in the GTFS feed. Any advice you have would be great.
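For illustration, here is a minimal sketch of the same header clean-up with an explicit encoding instead of the platform default. It is not urbanaccess code: `read_lines_with_fallback` and `strip_header_whitespace` are hypothetical helpers, and the order of encodings tried (UTF-8 first, then latin-1) is an assumption about the feed rather than something confirmed here. Because latin-1 maps every byte to a character, it never raises, so it only makes sense as a last resort and can silently mis-decode a file that is actually UTF-8.

```python
import os
import re


def read_lines_with_fallback(path, encodings=("utf-8", "latin-1")):
    """Hypothetical helper: try each encoding in order.

    Returns (lines, encoding_used). latin-1 accepts any byte sequence,
    so it should come last in the list.
    """
    last_error = None
    for encoding in encodings:
        try:
            # newline="" keeps the original line endings untouched
            with open(path, encoding=encoding, newline="") as f:
                return f.readlines(), encoding
        except UnicodeDecodeError as exc:
            last_error = exc
    raise last_error


def strip_header_whitespace(path):
    """Hypothetical helper: remove all whitespace from the header row only.

    The file is written back with the same encoding it was read with,
    so the data rows keep their original bytes.
    """
    lines, encoding = read_lines_with_fallback(path)
    lines[0] = re.sub(r"\s+", "", lines[0]) + "\n"
    with open(path, "w", encoding=encoding, newline="") as f:
        f.writelines(lines)
    return encoding
```

Writing the file back with the encoding that was used for reading avoids changing the data rows; only the header line is rewritten.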

Many thanks,

Nick Bristow

GTFS feed or OSM data (optional)


direct link to zip:



Paste the code that reproduces the issue here:

import os # added for dummy calendar.txt file
import pandas as pd
import urbanaccess as ua

# download and extract feeds
ua.gtfsfeeds.feeds.add_feed(add_dict={'colectivos': 'https://openmobilitydata-data.s3-us-west-1.amazonaws.com/public/feeds/colectivos-buenos-aires/1037/20190810/gtfs.zip'})

# Colectivos feed lacks a calendar.txt - creating dummy following Issue #56 
script_path = os.path.dirname(os.path.abspath(''))
root_path = os.path.join(script_path, 'mwe', 'data')
dummy_txt_file = os.path.join(root_path,
data = {'service_id': -99, 'monday': 0, 'tuesday': 0, 'wednesday': 0,
        'thursday': 0, 'friday': 0, 'saturday': 0, 'sunday': 0}
index = range(1)
pd.DataFrame(data, index).to_csv(dummy_txt_file, index=False)

loaded_feeds = ua.gtfs.load.gtfsfeed_to_df(gtfsfeed_path=None,

Paste the error message (if applicable):

UnicodeDecodeError                        Traceback (most recent call last)
<timed exec> in <module>

~\Miniconda3\envs\caf_urban_access\lib\site-packages\urbanaccess\gtfs\load.py in gtfsfeed_to_df(gtfsfeed_path, validation, verbose, bbox, remove_stops_outsidebbox, append_definitions)
    220                 'must be specified for validation.')
--> 222     _standardize_txt(csv_rootpath=gtfsfeed_path)
    224     folderlist = [foldername for foldername in os.listdir(gtfsfeed_path) if

~\Miniconda3\envs\caf_urban_access\lib\site-packages\urbanaccess\gtfs\load.py in _standardize_txt(csv_rootpath)
     35     if six.PY2:
     36         _txt_encoder_check(gtfsfiles_to_use, csv_rootpath)
---> 37     _txt_header_whitespace_check(gtfsfiles_to_use, csv_rootpath)

~\Miniconda3\envs\caf_urban_access\lib\site-packages\urbanaccess\gtfs\load.py in _txt_header_whitespace_check(gtfsfiles_to_use, csv_rootpath)
    127                 # Read from file
    128                 with open(os.path.join(csv_rootpath, folder, textfile)) as f:
--> 129                     lines = f.readlines()
    130                 lines[0] = re.sub(r'\s+', '', lines[0]) + '\n'
    131                 # Write to file

~\Miniconda3\envs\caf_urban_access\lib\encodings\cp1252.py in decode(self, input, final)
     21 class IncrementalDecoder(codecs.IncrementalDecoder):
     22     def decode(self, input, final=False):
---> 23         return codecs.charmap_decode(input,self.errors,decoding_table)[0]
     25 class StreamWriter(Codec,codecs.StreamWriter):

UnicodeDecodeError: 'charmap' codec can't decode byte 0x8d in position 4618: character maps to <undefined>
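
One plausible reading of this error, assuming the feed's text files are UTF-8 encoded (not verified here): the traceback shows `open()` falling back to cp1252 because no `encoding` was passed, and byte 0x8d has no character assigned in Python's cp1252 codec. In UTF-8, 0x8d appears for example as the second byte of "Í". The sketch below uses a made-up sample word, not data from the feed, to show how the three codecs treat the same bytes.

```python
# Illustrative sample only; this string is not taken from the feed.
raw = "Índice".encode("utf-8")  # the bytes are b'\xc3\x8dndice'

# cp1252 has no mapping for 0x8d, so decoding raises UnicodeDecodeError.
try:
    raw.decode("cp1252")
    cp1252_error = None
except UnicodeDecodeError as exc:
    cp1252_error = str(exc)

# latin-1 maps every byte to a code point, so it never fails,
# but it turns UTF-8 multi-byte sequences into the wrong characters.
latin1_text = raw.decode("latin-1")

# Decoding with the encoding the bytes were written in recovers the text.
utf8_text = raw.decode("utf-8")

print(cp1252_error)
print(repr(latin1_text))
print(utf8_text)
```

If that assumption holds, it would also explain why `encoding='latin-1'` gets past the error without producing correct text.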

AtelierLibre commented 4 years ago

I am going to close this. I assumed from the v0.2.0 release that I would be okay with Python 3.7, but I have since found other issues referring to Python 3.6 compatibility, as well as comments in the code related to Python 3 and Unicode. I have also had some success using Python 2.7.

It would be interesting to know whether the issues with Python 3 are something you are working on, but otherwise I'll close this for now. Thanks again.