UDST / urbanaccess

A tool for GTFS transit and OSM pedestrian network accessibility analysis by UrbanSim
https://udst.github.io/urbanaccess/index.html
GNU Affero General Public License v3.0

Unicode decode error when trying to load gtfs of buses in Buenos Aires #59

Closed AtelierLibre closed 4 years ago

AtelierLibre commented 4 years ago

Description of the bug

Hi, thanks for making this software available.

I am trying to load the GTFS feed for buses in Buenos Aires from either of the links given under "GTFS feed or OSM data" below, but I run into the encoding error shown in the traceback further down.

I seemed to make some limited progress by changing line 128 of urbanaccess/gtfs/load.py to:

with open(os.path.join(csv_rootpath, folder, textfile), encoding='latin-1') as f:

However, it still seems to load only a fraction of the data in the GTFS feed. Any advice would be great.
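For readers following the workaround above, here is a minimal sketch of a header whitespace cleanup that passes an explicit encoding on both the read and the write. It mirrors the logic visible in the traceback below (lines 127-131 of load.py) but is not the urbanaccess implementation: `strip_header_whitespace` is a hypothetical helper, and the choice of UTF-8 is an assumption about the feed that this thread does not confirm.

```python
import os
import re
import tempfile


def strip_header_whitespace(path, encoding="utf-8"):
    """Remove all whitespace from the first line of a text file, in place.

    Hypothetical helper, not part of urbanaccess. The same explicit encoding
    is used for reading and writing so that non-ASCII characters survive.
    The default of UTF-8 is an assumption about the feed.
    """
    # newline="" leaves the line endings of the data rows untouched
    with open(path, encoding=encoding, newline="") as f:
        lines = f.readlines()
    if not lines:
        return
    # Same substitution as in the traceback: drop every whitespace character
    # from the header row, then terminate it with a single newline
    lines[0] = re.sub(r"\s+", "", lines[0]) + "\n"
    with open(path, "w", encoding=encoding, newline="") as f:
        f.writelines(lines)


# Demo on a throwaway file with a padded header and a non-ASCII stop name
demo_path = os.path.join(tempfile.mkdtemp(), "stops.txt")
with open(demo_path, "w", encoding="utf-8", newline="") as f:
    f.write("stop_id, stop_name \n1,AVENIDA C\u00d3RDOBA\n")

strip_header_whitespace(demo_path)

with open(demo_path, encoding="utf-8", newline="") as f:
    cleaned = f.read()
```

Passing `encoding` explicitly avoids depending on the platform default, which on Windows is typically a locale code page such as cp1252, the codec named in the traceback.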

Many thanks,

Nick Bristow

GTFS feed or OSM data (optional)

https://transitfeeds.com/p/colectivos-buenos-aires/1037

direct link to zip:

https://openmobilitydata-data.s3-us-west-1.amazonaws.com/public/feeds/colectivos-buenos-aires/1037/20190810/gtfs.zip

Environment

Paste the code that reproduces the issue here:

import os # added for dummy calendar.txt file
import pandas as pd
import urbanaccess as ua

# download and extract feeds
ua.gtfsfeeds.feeds.add_feed(add_dict={'colectivos': 'https://openmobilitydata-data.s3-us-west-1.amazonaws.com/public/feeds/colectivos-buenos-aires/1037/20190810/gtfs.zip'})
ua.gtfsfeeds.download()

# Colectivos feed lacks a calendar.txt - creating dummy following Issue #56 
script_path = os.path.dirname(os.path.abspath(''))
root_path = os.path.join(script_path, 'mwe', 'data')
dummy_txt_file = os.path.join(root_path,
                              'gtfsfeed_text',
                              'colectivos',
                              'calendar.txt')
data = {'service_id': -99, 'monday': 0, 'tuesday': 0, 'wednesday': 0,
        'thursday': 0, 'friday': 0, 'saturday': 0, 'sunday': 0}
index = range(1)
pd.DataFrame(data, index).to_csv(dummy_txt_file, index=False)

# %%time is a Jupyter cell magic; the call below ran in its own notebook cell
%%time
loaded_feeds = ua.gtfs.load.gtfsfeed_to_df(gtfsfeed_path=None,
                                           validation=True,
                                           verbose=True,
                                           bbox=(-59.8,-35.6,-57.2,-33.6),
                                           remove_stops_outsidebbox=False,
                                           append_definitions=True)

Paste the error message (if applicable):

---------------------------------------------------------------------------
UnicodeDecodeError                        Traceback (most recent call last)
<timed exec> in <module>

~\Miniconda3\envs\caf_urban_access\lib\site-packages\urbanaccess\gtfs\load.py in gtfsfeed_to_df(gtfsfeed_path, validation, verbose, bbox, remove_stops_outsidebbox, append_definitions)
    220                 'must be specified for validation.')
    221 
--> 222     _standardize_txt(csv_rootpath=gtfsfeed_path)
    223 
    224     folderlist = [foldername for foldername in os.listdir(gtfsfeed_path) if

~\Miniconda3\envs\caf_urban_access\lib\site-packages\urbanaccess\gtfs\load.py in _standardize_txt(csv_rootpath)
     35     if six.PY2:
     36         _txt_encoder_check(gtfsfiles_to_use, csv_rootpath)
---> 37     _txt_header_whitespace_check(gtfsfiles_to_use, csv_rootpath)
     38 
     39 

~\Miniconda3\envs\caf_urban_access\lib\site-packages\urbanaccess\gtfs\load.py in _txt_header_whitespace_check(gtfsfiles_to_use, csv_rootpath)
    127                 # Read from file
    128                 with open(os.path.join(csv_rootpath, folder, textfile)) as f:
--> 129                     lines = f.readlines()
    130                 lines[0] = re.sub(r'\s+', '', lines[0]) + '\n'
    131                 # Write to file

~\Miniconda3\envs\caf_urban_access\lib\encodings\cp1252.py in decode(self, input, final)
     21 class IncrementalDecoder(codecs.IncrementalDecoder):
     22     def decode(self, input, final=False):
---> 23         return codecs.charmap_decode(input,self.errors,decoding_table)[0]
     24 
     25 class StreamWriter(Codec,codecs.StreamWriter):

UnicodeDecodeError: 'charmap' codec can't decode byte 0x8d in position 4618: character maps to <undefined>
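
The error above can be reproduced in isolation. The sketch below shows why byte 0x8d fails under cp1252 while latin-1 accepts it silently. Treating the byte as part of a UTF-8 sequence is an assumption: the thread does not state the feed's actual encoding, and the character "Í" is chosen only because its UTF-8 encoding happens to contain 0x8d.

```python
# "Í" (U+00CD) encodes to two bytes in UTF-8; the second one is 0x8d
raw = "\u00cd".encode("utf-8")

# cp1252 has no character assigned to 0x8d, so decoding raises
try:
    raw.decode("cp1252")
    cp1252_ok = True
except UnicodeDecodeError:
    cp1252_ok = False

# latin-1 maps every byte to a code point, so it never raises,
# but it yields two wrong characters instead of one correct one
latin1_text = raw.decode("latin-1")

# Decoding with the encoding the bytes were written in recovers the text
utf8_text = raw.decode("utf-8")
```

If the feed is in fact UTF-8, this would mean `encoding='latin-1'` suppresses the exception without decoding the text correctly.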

AtelierLibre commented 4 years ago

I am going to close this. I assumed from the v0.2.0 release that I would be okay with Python 3.7, but I have since found other issues referring to Python 3.6 compatibility, as well as comments in the code related to Python 3 and Unicode. I have also had some success using Python 2.7.

It would be interesting to know whether the Python 3 issues are something you are working on, but otherwise I'll close this for now. Thanks again.