Automatically exported from code.google.com/p/googletransitdatafeed

Feed validator & schedule viewer & merge tool (v. 1.2.6) throw Memory Error on Windows XP when processing a big feed #273

Closed GoogleCodeExporter closed 8 years ago

GoogleCodeExporter commented 8 years ago
What steps will reproduce the problem?
1. launch any tool (validator, viewer, merger) on a Win XP box with a large feed
2. wait for a couple of hours

What is the expected output? What do you see instead?
I would expect a validation results HTML page.
Instead, the tool crashes and produces a stack trace stating it suffered a
MemoryError.
Note that the Windows PC has sufficient RAM and is up-to-date

What version of the product are you using? On what operating system?
1.2.6 on Windows XP

Please provide any additional information below.
This happens to many partners in the Google Transit ecosystem. It causes 
significant overhead in partner support. 

---------------------------------
validating C:\temp\gtfs\google_transit.zip
Yikes, the program threw an unexpected exception!

Hopefully a complete report has been saved to transitfeedcrash.txt,
though if you are seeing this message we've already disappointed you once
today. Please include the report in a new issue at
http://code.google.com/p/googletransitdatafeed/issues/entry
or an email to the public group googletransitdatafeed@googlegroups.com. Sorry!

------------------------------------------------------------
transitfeed version 1.2.6

File "feedvalidator.py", line 688, in main
   feed = C:\temp\gtfs\google_transit.zip
   usage = %prog [options] [<input GTFS.zip>]

Validates GTFS file (or directory) <input GTFS.zip> and writes an HTML
report of the results to validation-results.html.

If <input GTFS.zip> is omitted the filename is read from the console. Dragging
a file into the console may enter the filename.

For more information see
http://code.google.com/p/googletransitdatafeed/wiki/FeedValidator

   args = ['C:\\temp\\gtfs\\google_transit.zip']
   parser = <transitfeed.util.OptionParserLongError instance at 0x00C47EB8>
   options = {'check_duplicate_trips': False, 'manual_entry': True, 'extension': None, 'memory_db': False, 'service_gap_interval': 13, 'latest_version': '', 'limit_per_type': 5, 'performance': None, 'output': 'validation-results.html'}

File "feedvalidator.py", line 477, in RunValidationOutputFromOptions
   feed = C:\temp\gtfs\google_transit.zip
   options = {'check_duplicate_trips': False, 'manual_entry': True, 'extension': None, 'memory_db': False, 'service_gap_interval': 13, 'latest_version': '', 'limit_per_type': 5, 'performance': None, 'output': 'validation-results.html'}

File "feedvalidator.py", line 484, in RunValidationOutputToFilename
   feed = C:\temp\gtfs\google_transit.zip
   output_file = <open file 'validation-results.html', mode 'w' at 0x00C06530>
   options = {'check_duplicate_trips': False, 'manual_entry': True, 'extension': None, 'memory_db': False, 'service_gap_interval': 13, 'latest_version': '', 'limit_per_type': 5, 'performance': None, 'output': 'validation-results.html'}
   output_filename = validation-results.html

File "feedvalidator.py", line 502, in RunValidationOutputToFile
   feed = C:\temp\gtfs\google_transit.zip
   accumulator = <__main__.HTMLCountingProblemAccumulator object at 0x00C4DB10>

   output_file = <open file 'validation-results.html', mode 'w' at 0x00C06530>
   problems = <transitfeed.problems.ProblemReporter object at 0x00C4D710>
   options = {'check_duplicate_trips': False, 'manual_entry': True, 'extension': None, 'memory_db': False, 'service_gap_interval': 13, 'latest_version': '', 'limit_per_type': 5, 'performance': None, 'output': 'validation-results.html'}

File "feedvalidator.py", line 558, in RunValidation
   feed = C:\temp\gtfs\google_transit.zip
   problems = <transitfeed.problems.ProblemReporter object at 0x00C4D710>
   loader = <transitfeed.loader.Loader instance at 0x00C61E68>
   extension_module = <module 'transitfeed' from 'C:\apps\gtfs\transitfeed-windows-binary-1.2.6\library.zip\transitfeed\__init__.pyc'>
   options = {'check_duplicate_trips': False, 'manual_entry': True, 'extension': None, 'memory_db': False, 'service_gap_interval': 13, 'latest_version': '', 'limit_per_type': 5, 'performance': None, 'output': 'validation-results.html'}
   other_problems_string = The server couldn't fulfill the request. Error code: 407.
   gtfs_factory = <transitfeed.gtfsfactory.GtfsFactory object at 0x00C53370>

File "transitfeed\loader.pyc", line 557, in Load
   self = <transitfeed.loader.Loader instance at 0x00C61E68>

File "transitfeed\loader.pyc", line 501, in _LoadStopTimes
   self = <transitfeed.loader.Loader instance at 0x00C61E68>

File "transitfeed\loader.pyc", line 275, in _ReadCSV
   file_name = stop_times.txt
   self = <transitfeed.loader.Loader instance at 0x00C61E68>
   required = ['trip_id', 'arrival_time', 'departure_time', 'stop_id', 'stop_sequence']
   cols = ['trip_id', 'arrival_time', 'departure_time', 'stop_id', 'stop_sequence', 'stop_headsign', 'pickup_type', 'drop_off_type', 'shape_dist_traveled']

File "transitfeed\loader.pyc", line 119, in _GetUtf8Contents
   file_name = stop_times.txt
   self = <transitfeed.loader.Loader instance at 0x00C61E68>

File "transitfeed\loader.pyc", line 369, in _FileContents
   file_name = stop_times.txt
   self = <transitfeed.loader.Loader instance at 0x00C61E68>
   results = None

File "zipfile.pyc", line 501, in read
   filepos = 37125224
   name = stop_times.txt
   self = <zipfile.ZipFile instance at 0x00C65530>
   bytes = <raw zlib-compressed data from stop_times.txt; unprintable binary elided>
   dc = <zlib.Decompress object at 0x1F269980>
   fheader = ('PK\x03\x04', 20, 0, 0, 8, 37799, 15925, -2073395241, 31979661, 361995830, 14, 0)
   zinfo = <zipfile.ZipInfo object at 0x00C2BDB0>
   fname = stop_times.txt

MemoryError

------------------------------------------------------------

Yikes, the program threw an unexpected exception!

Hopefully a complete report has been saved to transitfeedcrash.txt,
though if you are seeing this message we've already disappointed you once
today. Please include the report in a new issue at
http://code.google.com/p/googletransitdatafeed/issues/entry
or an email to the public group googletransitdatafeed@googlegroups.com. Sorry!

Press enter to continue...

Original issue reported on code.google.com by thomasr...@google.com on 8 Feb 2011 at 10:22
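
The trace bottoms out in zipfile's read(), which decompresses all of stop_times.txt into a single string before the loader parses a row; on a 32-bit process that one allocation can exhaust the address space. Below is a minimal sketch of the lower-memory pattern, assuming Python 2.7 or later, where ZipFile.open() returns a file-like object that decompresses on demand. This is not the transitfeed code; iter_member_lines is a hypothetical helper.

    import zipfile

    def iter_member_lines(zip_path, member):
        # Stream-decompress `member` line by line; peak memory stays near
        # one line plus zlib's window, not the whole uncompressed file.
        archive = zipfile.ZipFile(zip_path)
        try:
            stream = archive.open(member)  # ZipExtFile decompresses lazily
            try:
                for line in stream:
                    yield line.rstrip(b'\r\n')
            finally:
                stream.close()
        finally:
            archive.close()

    # Hypothetical usage against the feed from the report:
    # for row in iter_member_lines(r'C:\temp\gtfs\google_transit.zip',
    #                              'stop_times.txt'):
    #     ...  # hand each row to the CSV-parsing step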

GoogleCodeExporter commented 8 years ago
If you are getting a MemoryError your machine does not have sufficient RAM to
process the data. :-/ We know it takes as much as 4 GB for some feeds. I wish I
knew an easy way to reduce the memory requirements. How big is your feed in
terms of uncompressed bytes, and how many lines are in the big files?
The feedvalidator option -o CONSOLE uses less RAM when there are a large number
of errors, because it prints them to the console instead of building an HTML
file. See
http://code.google.com/p/googletransitdatafeed/wiki/FeedValidator

Original comment by tom.brow...@gmail.com on 17 Feb 2011 at 7:09
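
The size question above can be answered without extracting the feed: the uncompressed byte count is recorded in the zip's central directory, and a streaming pass gives the line count. A rough sketch, again assuming Python 2.7 or later; member_stats is a hypothetical helper, not part of the tools.

    import zipfile

    def member_stats(zip_path, member):
        # Uncompressed size comes from zip metadata; the line count from a
        # streaming pass, so the member is never held in memory at once.
        archive = zipfile.ZipFile(zip_path)
        try:
            size = archive.getinfo(member).file_size
            stream = archive.open(member)
            try:
                lines = sum(1 for _ in stream)
            finally:
                stream.close()
            return size, lines
        finally:
            archive.close()

    size, lines = member_stats(r'C:\temp\gtfs\google_transit.zip',
                               'stop_times.txt')
    print('stop_times.txt: %d bytes uncompressed, %d lines' % (size, lines))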

GoogleCodeExporter commented 8 years ago
The support team uses Windows XP with 4 GB of RAM. The transit tools work fine
on Macs & Linux with less than 4 GB of RAM. I think the bug should be
addressed, because the support team loses a lot of time applying workarounds
for our partners. As you can see from this page
(http://msdn.microsoft.com/en-us/library/aa366778(v=vs.85).aspx), processes in
user space on 32-bit Windows XP have a hard limit of 2 GB of memory. Adding
RAM to a Windows box is not going to make any difference. I think this bug
cannot be classified as "Won't fix" because 1) it is fixable and 2) it is a
major bug that gets reported all the time by the technical support team and by
the sales managers who visit our partners.

Original comment by thomasr...@google.com on 17 Feb 2011 at 11:21
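
A quick way to confirm that the 2 GB address-space ceiling, not installed RAM, is the binding constraint is to check the pointer width of the Python process running the tools; a 32-bit interpreter is capped regardless of physical memory. A small check (a sketch, not part of transitfeed):

    import struct

    bits = struct.calcsize('P') * 8  # pointer size of this Python process
    print('%d-bit Python process' % bits)
    if bits == 32:
        # User-mode address space tops out at 2 GB on stock 32-bit Windows
        # XP, matching the MSDN limit cited above; adding RAM cannot raise it.
        print('user address space capped at ~2 GB regardless of installed RAM')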

GoogleCodeExporter commented 8 years ago

Original comment by bdfer...@google.com on 26 Sep 2014 at 4:47