BlinkTagInc / node-gtfs

Import GTFS transit data into SQLite and query routes, stops, times, fares and more.
MIT License

Out of memory when importing GTFS #172

Closed. Jouca closed this issue 2 days ago

Jouca commented 3 days ago

Hello, I'm trying to import a very large GTFS feed containing over 9 million stop_times records. The import stops after a few moments with a JavaScript heap out of memory error, which did not happen with the previous release of this package.

Starting GTFS import for 1 file using SQLite database at ./idfm-gtfs.db
Importing GTFS from ./IDFM-gtfs.zip
Importing - agency.txt - 61 lines imported
Importing - areas.txt - No file found
Importing - attributions.txt - No file found
Importing - booking_rules.txt - No file found
Importing - calendar.txt - 1265 lines imported
Importing - calendar_dates.txt - 2992 lines imported
Importing - fare_attributes.txt - No file found
Importing - fare_leg_rules.txt - No file found
Importing - fare_media.txt - No file found
Importing - fare_products.txt - No file found
Importing - fare_rules.txt - No file found
Importing - fare_transfer_rules.txt - No file found
Importing - feed_info.txt - No file found
Importing - frequencies.txt - No file found
Importing - levels.txt - No file found
Importing - location_group_stops.txt - No file found
Importing - location_groups.txt - No file found
Importing - locations.geojson - No file found
Importing - networks.txt - No file found
Importing - pathways.txt - 5065 lines imported
Importing - route_networks.txt - No file found
Importing - routes.txt - 1988 lines imported
Importing - shapes.txt - No file found
Importing - stop_areas.txt - No file found
Importing - stop_times.txt
<--- Last few GCs --->

[29152:000001CCF02E6D40]   103302 ms: Mark-Compact (reduce) 2047.0 (2084.8) -> 2046.1 (2083.1) MB, 234.28 / 0.02 ms  (+ 132.2 ms in 28 steps since start of marking, biggest step 10.2 ms, walltime since start of marking 393 ms) (average mu = 0.309, current
[29152:000001CCF02E6D40]   103761 ms: Mark-Compact (reduce) 2047.1 (2083.1) -> 2045.9 (2083.1) MB, 314.83 / 0.02 ms  (+ 52.9 ms in 16 steps since start of marking, biggest step 4.4 ms, walltime since start of marking 382 ms) (average mu = 0.260, current m

<--- JS stacktrace --->

FATAL ERROR: Ineffective mark-compacts near heap limit Allocation failed - JavaScript heap out of memory
----- Native stack trace -----

 1: 00007FF75C8671AB node::SetCppgcReference+16075
 2: 00007FF75C7DDCC6 v8::base::CPU::num_virtual_address_bits+79190
 3: 00007FF75C7DFED5 v8::base::CPU::num_virtual_address_bits+87909
 4: 00007FF75D24F061 v8::Isolate::ReportExternalAllocationLimitReached+65
 5: 00007FF75D2387F8 v8::Function::Experimental_IsNopFunction+1336
 6: 00007FF75D09A120 v8::Platform::SystemClockTimeMillis+659328
 7: 00007FF75D0A63A3 v8::Platform::SystemClockTimeMillis+709123
 8: 00007FF75D0A3D04 v8::Platform::SystemClockTimeMillis+699236
 9: 00007FF75D096E40 v8::Platform::SystemClockTimeMillis+646304
10: 00007FF75D0AC4BA v8::Platform::SystemClockTimeMillis+733978
11: 00007FF75D0ACD37 v8::Platform::SystemClockTimeMillis+736151
12: 00007FF75D0B596E v8::Platform::SystemClockTimeMillis+772046
13: 00007FF75D0CA1AA v8::Platform::SystemClockTimeMillis+856074
14: 00007FF75D0CA493 v8::Platform::SystemClockTimeMillis+856819
15: 00007FF75CE5F6BD v8::base::Thread::StartSynchronously+491133
16: 00007FF75CE5FAD6 v8::base::Thread::StartSynchronously+492182
17: 00007FF75D0BF3DF v8::Platform::SystemClockTimeMillis+811583
18: 00007FF75CEE6873 v8::base::Thread::StartSynchronously+1044531
19: 00007FF75CEE7881 v8::base::Thread::StartSynchronously+1048641
20: 00007FF75CE8E4C6 v8::base::Thread::StartSynchronously+683142
21: 00007FF75CE65CCC v8::base::Thread::StartSynchronously+517260
22: 00007FF75CE7A854 v8::base::Thread::StartSynchronously+602132
23: 00007FF75CD76871 v8::CodeEvent::GetFunctionName+95409
24: 00007FF75CD75710 v8::CodeEvent::GetFunctionName+90960
25: 00007FF75D30062E v8::PropertyDescriptor::writable+677134
26: 00007FF6DD5556EC

I'm fairly sure this is related to the new import method, which uses much more RAM than the previous one did.

brendannee commented 2 days ago

Thanks for reporting this issue.

I published an updated version, https://github.com/BlinkTagInc/node-gtfs/releases/tag/4.15.1, which breaks the import into chunks to avoid running out of memory. I tested it with the IDFM GTFS feed.

Try it out and let me know if this solves the issue.
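
For readers wondering what "breaking the import into chunks" can look like in general, here is a minimal sketch. It is not the code from the 4.15.1 release; `chunked`, `importInChunks`, `insertBatch` and `fakeStopTimes` are hypothetical names used only for illustration. The idea is to consume rows lazily and hand them to the database in fixed-size batches, so that only one batch is held in memory at a time instead of all 9 million rows.

```javascript
// Hypothetical helper: group any iterable of rows into arrays of at most `size` rows.
function* chunked(rows, size) {
  if (!Number.isInteger(size) || size < 1) {
    throw new RangeError('size must be a positive integer');
  }
  let batch = [];
  for (const row of rows) {
    batch.push(row);
    if (batch.length === size) {
      yield batch;
      batch = []; // drop the reference so the previous batch can be garbage collected
    }
  }
  if (batch.length > 0) {
    yield batch; // final, possibly smaller, batch
  }
}

// Hypothetical importer: `insertBatch` stands in for one database transaction.
function importInChunks(rows, size, insertBatch) {
  let total = 0;
  for (const batch of chunked(rows, size)) {
    insertBatch(batch);
    total += batch.length;
  }
  return total;
}

// Lazy row source standing in for a streamed stop_times.txt (assumption for the demo).
function* fakeStopTimes(count) {
  for (let i = 0; i < count; i++) {
    yield { trip_id: `trip-${i}`, stop_sequence: i };
  }
}

const batchSizes = [];
const imported = importInChunks(fakeStopTimes(25), 10, (batch) => {
  batchSizes.push(batch.length);
});
console.log(imported, batchSizes); // 25 rows in three batches of 10, 10 and 5
```

The batch size is a trade-off: larger batches mean fewer transactions but a higher peak memory use, so a real importer would probably make it configurable.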

Jouca commented 2 days ago

Hello, yep, it fixed the issue, thanks! 👍