Sadless74 / googletransitdatafeed

Automatically exported from code.google.com/p/googletransitdatafeed

Mac lineends crash validator #107

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
test/data/good_feed has Unix line endings (LF).
Convert them to old-fashioned Mac line endings (CR) with
perl -ibak -p -e 's/\n/\r/g' *.txt

Then run the validator and get:

transitfeed version 1.1.7

File "./feedvalidator.py", line 285, in main
                                   memory_db=options.memory_db)
 -->   schedule = loader.Load()

    feed = test/data/good_feed
    manual_entry = False
    args = ['test/data/good_feed']
    parser = <optparse.OptionParser instance at 0xf7f2976c>
    problems = <__main__.HTMLCountingProblemReporter instance at 0xf7e1b78c>
    loader = <transitfeed.Loader instance at 0xf7e1bb8c>
    options = {'performance': None, 'manual_entry': False, 'output': 'validation-results.html', 'memory_db': False}

File "python/transitfeed.py", line 3705, in Load

 -->     self._LoadAgencies()
         self._LoadStops()
    self = <transitfeed.Loader instance at 0xf7e1bb8c>

File "python/transitfeed.py", line 3437, in _LoadAgencies
                                                   Agency._FIELD_NAMES,
 -->                                               Agency._REQUIRED_FIELD_NAMES):
           self._problems.SetFileContext('agency.txt', row_num, row, header)
    self = <transitfeed.Loader instance at 0xf7e1bb8c>

File "python/transitfeed.py", line 3274, in _ReadCsvDict

 -->     raw_header = reader.next()
         header = []
    file_name = agency.txt
    self = <transitfeed.Loader instance at 0xf7e1bb8c>
    required = ['agency_name', 'agency_url', 'agency_timezone']
    all_cols = ['agency_name', 'agency_url', 'agency_timezone', 'agency_id', 'agency_lang', 'agency_phone']
    table_name = agency
    reader = <_csv.reader object at 0xf7e1f17c>
    eol_checker = <transitfeed.EndOfLineChecker instance at 0xf7e1bd2c>
DTA,Autorité de passage de démonstration,http://google.com,America/Los_Angeles,123 12314

Error: newline inside string

The solution is to do our own line splitting in EndOfLineChecker instead of
using cStringIO.next(). This will make it slower, so I'll need to do some
benchmarking :-/
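
A minimal sketch of what "our own line splitting" could look like, written for Python 3 rather than the Python 2 that transitfeed 1.1.7 targets. The name `split_lines` is hypothetical and is not part of transitfeed or EndOfLineChecker. It treats CR+LF, lone CR, and lone LF each as one terminator and reports which one ended each line, so a checker could warn about mixed or unexpected endings. It does not look at CSV quoting, so a newline inside a quoted field would still be split.

```python
import re

# CRLF is listed first so "\r\n" is matched as one terminator, not two.
_EOL = re.compile(r"\r\n|\r|\n")


def split_lines(text):
    """Yield (line, terminator) pairs for text with CR, LF or CRLF endings.

    Hypothetical helper, not transitfeed code. The terminator is '' for a
    final line that has no line ending.
    """
    pos = 0
    for match in _EOL.finditer(text):
        yield text[pos:match.start()], match.group()
        pos = match.end()
    if pos < len(text):
        # Trailing data after the last terminator.
        yield text[pos:], ""
```

For example, `[line for line, _ in split_lines("agency_id,agency_name\rDTA,Demo\r")]` gives `['agency_id,agency_name', 'DTA,Demo']`.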

Original issue reported on code.google.com by tom.brow...@gmail.com on 5 Nov 2008 at 11:45

GoogleCodeExporter commented 9 years ago
I triggered an "Error: newline inside string" on Linux with the following test:
  def testHeaderQuoteNotClosed(self):
    self.zip.writestr("test.txt", "\"test_id\",\"test_name\n")
    results = list(self.loader._ReadCsvDict("test.txt",
                                            ["test_id", "test_name"], []))
    self.assertEquals([], results)
    e = self.problems.PopException("CsvSyntax")
    self.problems.AssertNoMoreExceptions()

According to a code search,
http://google.com/codesearch?q="newline+inside+string"&hl=en&btnG=Search+Code
that error message is only found in old versions of _csv.c, so it may have
been fixed in the csv library.

Original comment by tom.brow...@gmail.com on 21 May 2009 at 9:47

GoogleCodeExporter commented 9 years ago

Original comment by tom.brow...@gmail.com on 28 Oct 2009 at 10:56

GoogleCodeExporter commented 9 years ago
Issue 238 has been merged into this issue.

Original comment by tom.brow...@gmail.com on 8 Jun 2010 at 6:19

GoogleCodeExporter commented 9 years ago
Issue 240 has been merged into this issue.

Original comment by tom.brow...@gmail.com on 16 Jun 2010 at 8:52

GoogleCodeExporter commented 9 years ago
If you have a file that causes this problem, try loading it in a text editor
and saving it again. Use an editor that lets you select the file format, and
choose something like MS-DOS (CR+LF line endings). Look for lines that have a
funny character at the end, or two lines that have been joined into one.
You could also try importing the file into a spreadsheet, such as Excel, and
then saving it as CSV.
If you are comfortable with the command line, try
perl -ibak -p -e 's/\r/\n/g' filename.txt

Original comment by the...@google.com on 17 Jun 2010 at 4:17

GoogleCodeExporter commented 9 years ago
I added this file a while back and have not received a response to the issue. 
Please advise. We would really like to get this up and running as soon as 
possible. Thanks in advance.

Original comment by galen.be...@culvercity.org on 29 Jun 2010 at 2:07

Attachments:

GoogleCodeExporter commented 9 years ago
Galen, your stop_times file uses CR-only end-of-line markers everywhere except
the last line, which ends with CR+LF. You should be able to fix this by loading
it in a text editor and saving it in DOS format. Or run
perl -ibak -p -e 's/\r/\n/g' stop_times.txt
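
To check which line endings a file actually contains before and after converting it, a short Python 3 sketch like the following can help. The function name `count_line_endings` is made up for this illustration and is not part of transitfeed.

```python
import re
from collections import Counter

# Human-readable names for the three terminators we look for.
_NAMES = {b"\r\n": "CRLF", b"\r": "CR", b"\n": "LF"}


def count_line_endings(data):
    """Count CRLF, lone CR and lone LF terminators in a bytes object.

    Hypothetical helper. CRLF is tried first so it is counted once
    instead of as one CR plus one LF.
    """
    return Counter(_NAMES[eol] for eol in re.findall(rb"\r\n|\r|\n", data))


# Usage, assuming stop_times.txt is in the current directory:
#   with open("stop_times.txt", "rb") as f:
#       print(count_line_endings(f.read()))
```

A file like the one described above would show a large CR count and a CRLF count of one.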

Original comment by tom.brow...@gmail.com on 30 Jun 2010 at 4:16

GoogleCodeExporter commented 9 years ago
Marissa is having the same problem with the files at
http://cvtdbus.org/google/google_transit.zip
after saving them in various formats using a text editor on a Mac. Looking at
the files with hexdump, I see that routes.txt and calendar.txt have a mix of
\r\n and \r, stops.txt has a mix of \r\n and \n, trips.txt has \r\n, \n and \r,
etc.
Independent of the fact that this crashes the validator, can someone suggest an
easy way to clean these up? My best idea is to load each file in a spreadsheet
and export it as CSV again.
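
One possible cleanup for files with mixed endings, sketched in Python 3. The function names are hypothetical and this is not a transitfeed tool. It rewrites every CR+LF, lone CR, and lone LF to a single chosen terminator and keeps a `.bak` copy of the original. It works on raw bytes and ignores CSV quoting, so a line break inside a quoted field would be normalized as well.

```python
import re
import shutil


def normalize_newlines(data, newline=b"\n"):
    """Return data with every CRLF, lone CR and lone LF replaced by newline."""
    # A function is used as the replacement so the bytes are inserted
    # literally, without re.sub interpreting any backslash escapes.
    return re.sub(rb"\r\n|\r|\n", lambda match: newline, data)


def normalize_file(path, newline=b"\n"):
    """Rewrite the file at path in place, saving the original as path + '.bak'."""
    shutil.copyfile(path, path + ".bak")
    with open(path, "rb") as f:
        data = f.read()
    with open(path, "wb") as f:
        f.write(normalize_newlines(data, newline))
```

Calling `normalize_file("trips.txt")` converts the file to LF endings; passing `newline=b"\r\n"` produces DOS-style endings instead.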

Original comment by tom.brow...@gmail.com on 12 Jul 2010 at 5:46

GoogleCodeExporter commented 9 years ago
Issue 244 has been merged into this issue.

Original comment by a...@google.com on 30 Aug 2010 at 8:28

GoogleCodeExporter commented 9 years ago
Issue 238 has been merged into this issue.

Original comment by bdfer...@google.com on 26 Sep 2014 at 4:42

GoogleCodeExporter commented 9 years ago
Issue 244 has been merged into this issue.

Original comment by bdfer...@google.com on 26 Sep 2014 at 4:42

GoogleCodeExporter commented 9 years ago
Moved to https://github.com/google/transitfeed/issues/107

Original comment by bdfer...@google.com on 7 Oct 2014 at 7:59