OpenTransitTools / gtfsdb

GTFS ORM using SQLAlchemy
Mozilla Public License 2.0

Segmentation fault during Block.post_process #13

Closed: mileserickson closed this issue 8 years ago

mileserickson commented 8 years ago

On a t2.small EC2 instance with 2GB of RAM, bin/gtfsdb-load http://developer.onebusaway.org/tmp/sound/gtfs/modified/1_gtfs.zip crashed with a segmentation fault during Block.post_process.

(The same load process completed successfully when the instance type was changed temporarily to an m4.10xlarge instance with 160GB of RAM.)
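When a load like this dies, a small wrapper can tell a segmentation fault apart from an ordinary error exit. The sketch below uses hypothetical helpers (`describe_exit`, `run_and_report`) that are not part of gtfsdb. It relies on the POSIX convention that Python's `subprocess` reports a child killed by a signal as a negative return code, so a segfault shows up as `-signal.SIGSEGV`.

```python
import signal
import subprocess
import sys


def describe_exit(returncode: int) -> str:
    """Turn a subprocess return code into a human-readable description."""
    if returncode == 0:
        return "completed successfully"
    if returncode < 0:
        # On POSIX, subprocess reports death-by-signal as -signum.
        try:
            sig = signal.Signals(-returncode)
        except ValueError:
            return f"killed by signal {-returncode}"
        if sig == signal.SIGSEGV:
            return "crashed with a segmentation fault (SIGSEGV)"
        return f"killed by signal {sig.name}"
    return f"exited with status {returncode}"


def run_and_report(cmd: list[str]) -> str:
    """Run a command (e.g. a gtfsdb load) and describe how it ended."""
    result = subprocess.run(cmd)
    return describe_exit(result.returncode)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: pass the load command on the command line, e.g.
    #   python check_load.py bin/gtfsdb-load <feed-url>
    print(run_and_report(sys.argv[1:]))
```

A return code of -11 on Linux therefore means the process received SIGSEGV, which is what the report above describes.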

fpurcell commented 8 years ago

Hey Miles,

Thanks for the report. I just tried loading the GTFS feed in question (on both a MacBook Air with 8G and a CentOS server with 32G of RAM), and I haven't reproduced the seg-fault. I've also tried setting ulimit -Sv 500000 and ulimit -lv 500000 to ask the OS / shell to limit memory usage on the MacBook. Memory use tops out at 1.7G at points, but it hasn't produced a seg-fault.
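To approximate a small instance from a script rather than an interactive shell, the soft address-space limit that `ulimit -Sv` sets can be applied to a single child process with Python's Unix-only `resource` module. This is a minimal sketch with hypothetical helper names; note that `ulimit -v` counts in 1024-byte units while `setrlimit` takes bytes, and that exceeding `RLIMIT_AS` typically makes allocations fail (a `MemoryError` in Python code) rather than guaranteeing a segfault.

```python
import resource
import subprocess
import sys


def kib_to_bytes(kib: int) -> int:
    """`ulimit -v` takes 1024-byte units; setrlimit wants bytes."""
    return kib * 1024


def limit_address_space(soft_kib: int):
    """Return a preexec_fn that lowers the child's soft RLIMIT_AS."""
    def _apply() -> None:
        _soft, hard = resource.getrlimit(resource.RLIMIT_AS)
        new_soft = kib_to_bytes(soft_kib)
        # The soft limit may never exceed the hard limit.
        if hard != resource.RLIM_INFINITY:
            new_soft = min(new_soft, hard)
        resource.setrlimit(resource.RLIMIT_AS, (new_soft, hard))
    return _apply


def run_limited(cmd: list[str], soft_kib: int) -> int:
    """Run cmd with a capped address space and return its exit code."""
    result = subprocess.run(cmd, preexec_fn=limit_address_space(soft_kib))
    return result.returncode
```

For example, `run_limited(["bin/gtfsdb-load", feed_url], 500000)` would mirror `ulimit -Sv 500000` for just that load (`feed_url` is a placeholder). `preexec_fn` is not safe in multi-threaded parent processes, so this is intended for a simple standalone test script.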

Not sure when (or if) I'll test on something else, like EC2 or a VM server. If this is a show-stopper, I could add a cmd-line parameter to skip the block post-processing. Let me know. (BTW, the block table is new: it builds a cache of ordered trips that service a given block. It's not required by gtfsdb, but it does help stop schedules determine which stop times might be arrival-only.)
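The block cache Frank describes can be illustrated with a loose, hypothetical sketch. It is not gtfsdb's `Block` model, and the arrival-only rule is only one plausible reading. Trips sharing a GTFS `block_id` are grouped and sorted by start time. A trip's final stop is treated as arrival-only unless the next trip in the same block starts at that same stop, meaning the vehicle continues in service. The `Trip` fields here are assumptions chosen for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Trip:
    trip_id: str
    block_id: str
    start_secs: int      # first departure, seconds after midnight
    first_stop_id: str
    last_stop_id: str


def build_block_cache(trips: list[Trip]) -> dict[str, list[Trip]]:
    """Group trips by block_id and order each group by start time."""
    blocks: dict[str, list[Trip]] = defaultdict(list)
    for trip in trips:
        if trip.block_id:  # block_id is optional in trips.txt
            blocks[trip.block_id].append(trip)
    for block_trips in blocks.values():
        block_trips.sort(key=lambda t: t.start_secs)
    return dict(blocks)


def arrival_only_last_stops(blocks: dict[str, list[Trip]]) -> set[str]:
    """IDs of trips whose final stop is arrival-only: the vehicle does
    not continue as the next trip in its block from that stop."""
    result = set()
    for block_trips in blocks.values():
        for i, trip in enumerate(block_trips):
            nxt = block_trips[i + 1] if i + 1 < len(block_trips) else None
            if nxt is None or nxt.first_stop_id != trip.last_stop_id:
                result.add(trip.trip_id)
    return result
```

Holding every trip of a large feed in memory at once, as this sketch does, is one reason a step like this can be memory-hungry; the sketch also only considers trips that carry a `block_id`.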

Take care, Frank

mileserickson commented 8 years ago

Greetings Frank,

Adding a command-line parameter to skip the block post-processing sounds like a reasonable move. I can try to reproduce the error on an identical EC2 instance if you'd like.

Cheers, Miles

fpurcell commented 8 years ago

-nb or --ignore_blocks will skip the population of the blocks table.
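The way such a flag gates one post-processing step can be sketched with `argparse`. This is a hypothetical parser, not gtfsdb's actual command-line code; the step names come from this thread and their order is illustrative only.

```python
import argparse

# Hypothetical post-processing steps; order is illustrative only.
POST_PROCESS_STEPS = ["Route", "Block"]


def parse_args(argv=None) -> argparse.Namespace:
    parser = argparse.ArgumentParser(description="Load a GTFS feed (sketch).")
    parser.add_argument("file", help="path or URL of the GTFS zip")
    parser.add_argument(
        "-nb", "--ignore_blocks",
        action="store_true",
        help="skip populating the blocks table",
    )
    return parser.parse_args(argv)


def steps_to_run(args: argparse.Namespace) -> list[str]:
    """Drop the Block step when --ignore_blocks is given."""
    return [s for s in POST_PROCESS_STEPS
            if not (s == "Block" and args.ignore_blocks)]
```

Because `--ignore_blocks` is the first long option, `argparse` stores it as `args.ignore_blocks`, and the single-dash `-nb` is matched exactly as an alias.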

XtremeCurling commented 6 years ago

Just writing down my own experience in case anyone else lands here with a similar problem:

I too am having memory issues, in my case on a t2.micro EC2 instance (1GB of RAM). The load failed first during Block.post_process, and then, after I reran it with --ignore_blocks, during Route.post_process. I've created a swapfile and am retrying with it; I'll let you know whether that gives me enough memory to complete successfully.
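On Linux, a rough pre-flight check of how much RAM plus swap is available before starting a load can be read from `/proc/meminfo`. The helpers below are hypothetical and Linux-specific. They compare only against what the kernel reports, and the right threshold depends on the feed: the 1.7G peak mentioned earlier was for the feed in the original report, and other feeds will differ.

```python
def parse_meminfo(text: str) -> dict[str, int]:
    """Parse /proc/meminfo-style text into {field: value}, usually in kB."""
    info = {}
    for line in text.splitlines():
        name, _, rest = line.partition(":")
        fields = rest.split()
        if fields and fields[0].isdigit():
            info[name.strip()] = int(fields[0])
    return info


def headroom_kib(info: dict[str, int]) -> int:
    """Rough headroom: available RAM plus free swap, in kB."""
    ram = info.get("MemAvailable", info.get("MemFree", 0))
    return ram + info.get("SwapFree", 0)


def read_headroom_kib(path: str = "/proc/meminfo") -> int:
    """Read the live values (Linux only)."""
    with open(path) as f:
        return headroom_kib(parse_meminfo(f.read()))
```

Running `read_headroom_kib()` before and after enabling a swapfile shows whether the extra swap is actually being counted.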

FYI, I'm running this on Ubuntu, loading to a postgres database, with the latest SFMTA gtfs (https://transitfeeds.com/p/sfmta/60/latest/download).