derhuerst / berlin-gtfs-rt-server

Expose Berlin & Brandenburg transit data as a GTFS-RT feed.
https://v0.berlin-gtfs-rt.transport.rest/

allow importing into a custom DB schema #9

Open dancesWithCycles opened 9 months ago

dancesWithCycles commented 9 months ago

Edit @derhuerst: The gist of this thread is summarised in https://github.com/derhuerst/berlin-gtfs-rt-server/issues/9#issuecomment-1942259721: gtfs-via-postgres's ability to import into a custom schema should be exposed, and match-gtfs-rt-to-gtfs should be adapted to support this too. Then, a schema switch can be added here.

depends on https://github.com/derhuerst/match-gtfs-rt-to-gtfs/issues/8
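
For illustration, a rough sketch of how such a switch might eventually be threaded through the import step. This is not the shipped build.sh, and the GTFS_POSTGRES_SCHEMA variable name is hypothetical; gtfs-to-sql's --schema option, however, is real and appears in the logs below:

#!/usr/bin/env bash
# Hypothetical excerpt, not the shipped build.sh: forward an optional schema
# name (default: public) to gtfs-to-sql's existing --schema option.
set -euo pipefail

schema="${GTFS_POSTGRES_SCHEMA:-public}"  # hypothetical env var

node_modules/.bin/gtfs-to-sql -d \
	--schema "$schema" \
	--trips-without-shape-id --routes-without-agency-id \
	-- gtfs/*.csv \
	| sponge | psql -b

match-gtfs-rt-to-gtfs (and therefore hafas-gtfs-rt-feed's build-gtfs-match-index) would then have to be taught to look in the same schema, which is what the issue linked above is about.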


Hi folks, I am wondering whether anyone has already successfully set up this repository using the --schema switch.

My attempt fails with the following output:

$ time npm run build

> berlin-gtfs-rt-server@4.0.1 build
> ./build.sh && npm run docs

+ wget --compression auto -r --no-parent --no-directories -R .csv.gz -P gtfs -N https://vbb-gtfs.jannisr.de/latest/
--2024-02-12 13:08:57--  https://vbb-gtfs.jannisr.de/latest/
Resolving vbb-gtfs.jannisr.de (vbb-gtfs.jannisr.de)... 2001:41d0:701:1100::26d, 54.37.75.136
Connecting to vbb-gtfs.jannisr.de (vbb-gtfs.jannisr.de)|2001:41d0:701:1100::26d|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [text/html]
Saving to: ‘gtfs/index.html’

index.html                                 [ <=>                                                                         ]     624  --.-KB/s    in 0s      

Last-modified header missing -- time-stamps turned off.
2024-02-12 13:08:57 (6.35 MB/s) - ‘gtfs/index.html’ saved [3222]

Loading robots.txt; please ignore errors.
--2024-02-12 13:08:57--  https://vbb-gtfs.jannisr.de/robots.txt
...

--2024-02-12 13:08:57--  https://vbb-gtfs.jannisr.de/latest/stops.csv
Reusing existing connection to [vbb-gtfs.jannisr.de]:443.
HTTP request sent, awaiting response... 304 Not Modified
File ‘gtfs/stops.csv’ not modified on server. Omitting download.

...

FINISHED --2024-02-12 13:08:57--
Total wall clock time: 0.4s
Downloaded: 1 files, 624 in 0s (6.35 MB/s)
+ env
+ grep '^PG'
...
+ NODE_ENV=production
+ node_modules/.bin/gtfs-to-sql -d --schema vbb --trips-without-shape-id --routes-without-agency-id -- gtfs/agency.csv gtfs/calendar.csv gtfs/calendar_dates.csv gtfs/frequencies.csv gtfs/routes.csv gtfs/stop_times.csv gtfs/stops.csv gtfs/transfers.csv gtfs/trips.csv
+ psql -b
+ sponge
is_valid_lang_code
is_timezone
calendar
  processed 2892 rows
calendar_dates
  processed 97563 rows
service_days
routes
  processed 1271 rows
trips
  processed 241801 rows
frequencies
  processed 0 rows
agency
  processed 33 rows
stops
  processed 41413 rows
stop_times
  processed 5748473 rows
transfers
  processed 53855 rows
NOTICE:  extension "postgis" already exists, skipping
CREATE EXTENSION
CREATE SCHEMA
BEGIN
CREATE FUNCTION
DO
CREATE FUNCTION
CREATE FUNCTION
CREATE FUNCTION
CREATE FUNCTION
CREATE TYPE
CREATE TABLE
COPY 2892
CREATE TYPE
CREATE TABLE
COPY 97563
CREATE INDEX
CREATE INDEX
SELECT 269243
CREATE INDEX
CREATE INDEX
CREATE INDEX
CREATE INDEX
CREATE TYPE
CREATE TABLE
COPY 1271
CREATE INDEX
CREATE TYPE
CREATE TYPE
CREATE TABLE
COPY 241801
CREATE TYPE
CREATE TABLE
COPY 0
CREATE INDEX
CREATE INDEX
CREATE TABLE
COPY 33
CREATE TYPE
CREATE TYPE
CREATE TABLE
COPY 41413
ALTER TABLE
CREATE INDEX
CREATE TYPE
CREATE TYPE
CREATE TABLE
COPY 5748473
CREATE INDEX
CREATE INDEX
UPDATE 5748473
CREATE INDEX
CREATE INDEX
CREATE INDEX
CREATE INDEX
CREATE FUNCTION
CREATE FUNCTION
CREATE FUNCTION
CREATE FUNCTION
CREATE VIEW
CREATE FUNCTION
CREATE VIEW
CREATE FUNCTION
CREATE TYPE
CREATE TABLE
ALTER TABLE
COPY 53855
COMMIT
+++ realpath ./build.sh
++ dirname /home/pacs/sib00/users/gtfsr_srv_vbb/git/berlin-gtfs-rt-server/build.sh
+ lib=/home/pacs/sib00/users/gtfsr_srv_vbb/git/berlin-gtfs-rt-server/lib
+ NODE_ENV=production
+ node_modules/.bin/build-gtfs-match-index /home/pacs/sib00/users/gtfsr_srv_vbb/git/berlin-gtfs-rt-server/lib/hafas-info.js /home/pacs/sib00/users/gtfsr_srv_vbb/git/berlin-gtfs-rt-server/lib/gtfs-info.js
+ psql -b
+ sponge
stops
error: relation "stops" does not exist
    at Parser.parseErrorMessage (/home/pacs/sib00/users/gtfsr_srv_vbb/git/berlin-gtfs-rt-server/node_modules/pg-protocol/dist/parser.js:287:98)
    at Parser.handlePacket (/home/pacs/sib00/users/gtfsr_srv_vbb/git/berlin-gtfs-rt-server/node_modules/pg-protocol/dist/parser.js:126:29)
    at Parser.parse (/home/pacs/sib00/users/gtfsr_srv_vbb/git/berlin-gtfs-rt-server/node_modules/pg-protocol/dist/parser.js:39:38)
    at Socket.<anonymous> (/home/pacs/sib00/users/gtfsr_srv_vbb/git/berlin-gtfs-rt-server/node_modules/pg-protocol/dist/index.js:11:42)
    at Socket.emit (node:events:517:28)
    at addChunk (node:internal/streams/readable:368:12)
    at readableAddChunk (node:internal/streams/readable:341:9)
    at Readable.push (node:internal/streams/readable:278:10)
    at TCP.onStreamRead (node:internal/stream_base_commons:190:23) {
  length: 105,
  severity: 'ERROR',
  code: '42P01',
  detail: undefined,
  hint: undefined,
  position: '297',
  internalPosition: undefined,
  internalQuery: undefined,
  where: undefined,
  schema: undefined,
  table: undefined,
  column: undefined,
  dataType: undefined,
  constraint: undefined,
  file: 'parse_relation.c',
  line: '1392',
  routine: 'parserOpenTable'
}
unrecognized value "on;" for "ON_ERROR_STOP": Boolean expected
BEGIN
ERROR:  relation "stops" does not exist
STATEMENT:  CREATE TABLE stops_stable_ids (
    stop_id TEXT NOT NULL,
    FOREIGN KEY (stop_id) REFERENCES stops,
    stable_id TEXT NOT NULL,
    specificity INTEGER NOT NULL
);
ERROR:  current transaction is aborted, commands ignored until end of transaction block
STATEMENT:  COPY stops_stable_ids FROM STDIN csv;

I do not observe this behavior when I omit the --schema switch.

Cheers!
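
As a side note for anyone debugging a similar setup: one way to check where the imported tables actually ended up is to list them per schema. A minimal sketch, assuming psql can reach the same database and using the schema name vbb from the log above:

# Sketch: show which schema the imported GTFS tables live in.
psql -b -c "
  SELECT table_schema, table_name
  FROM information_schema.tables
  WHERE table_name IN ('stops', 'trips', 'stop_times')
  ORDER BY table_schema, table_name;
"

If stops only shows up under vbb, anything that queries the unqualified stops relation will fail exactly as above.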

dancesWithCycles commented 9 months ago

This is odd. When I run the import directly like this

npm i
./cli.js --schema <schema> --trips-without-shape-id --routes-without-agency-id -- gtfs/*.csv | sponge | psql -b -h <host> -p <port> -U <user> -d <database> > log.txt 2>&1

the schema switch works perfectly fine.

dancesWithCycles commented 9 months ago

I might have found the cause of my struggle: the GTFS Schedule feed is imported successfully, as you can see from this log file.

less log-gtfs-to-sql.txt:

NOTICE:  extension "postgis" already exists, skipping
CREATE EXTENSION
CREATE SCHEMA
BEGIN
CREATE FUNCTION
DO
CREATE FUNCTION
CREATE FUNCTION
CREATE FUNCTION
CREATE FUNCTION
CREATE TYPE
CREATE TABLE
COPY 2892
CREATE TYPE
CREATE TABLE
COPY 97563
CREATE INDEX
CREATE INDEX
SELECT 269243
CREATE INDEX
CREATE INDEX
CREATE INDEX
CREATE INDEX
CREATE TYPE
CREATE TABLE
COPY 1271
CREATE INDEX
CREATE TYPE
CREATE TYPE
CREATE TABLE
COPY 241801
CREATE TYPE
CREATE TABLE
COPY 0
CREATE INDEX
CREATE INDEX
CREATE TABLE
COPY 33
CREATE TABLE
COPY 201
CREATE TYPE
CREATE TYPE
CREATE TABLE
COPY 41413
ALTER TABLE
CREATE INDEX
CREATE TYPE
CREATE TABLE
COPY 119239
CREATE TABLE
COPY 5282696
CREATE INDEX
CREATE INDEX
CREATE VIEW
CREATE TYPE
CREATE TYPE
CREATE TABLE
COPY 5748473
CREATE INDEX
CREATE INDEX
UPDATE 5748473
CREATE INDEX
CREATE INDEX
CREATE INDEX
CREATE INDEX
CREATE FUNCTION
CREATE FUNCTION
CREATE FUNCTION
CREATE FUNCTION
CREATE VIEW
CREATE FUNCTION
CREATE VIEW
CREATE FUNCTION
CREATE TYPE
CREATE TABLE
ALTER TABLE
COPY 53855
COMMIT

However, the node_modules/.bin/build-gtfs-match-index does not find the imported feed as it is not looking in the respective schema.

less log-build-gtfs-match-index.txt:

unrecognized value "on;" for "ON_ERROR_STOP": Boolean expected
BEGIN
ERROR:  relation "stops" does not exist
STATEMENT:  CREATE TABLE stops_stable_ids (
        stop_id TEXT NOT NULL,
        FOREIGN KEY (stop_id) REFERENCES stops,
        stable_id TEXT NOT NULL,
        specificity INTEGER NOT NULL
);
ERROR:  current transaction is aborted, commands ignored until end of transaction block
STATEMENT:  COPY stops_stable_ids FROM STDIN csv;

Am I right or am I right? ;-)

derhuerst commented 9 months ago

The GTFS Schedule feed is successfully imported […]. […] However, the node_modules/.bin/build-gtfs-match-index does not find the imported feed as it is not looking in the respective schema.

Yes, I think that's correct!

build-gtfs-match-index comes from hafas-gtfs-rt-feed, which just calls match-gtfs-rt-to-gtfs's build-index, which currently doesn't support configuring a schema.
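
Until then, a possible stopgap (only a sketch, untested with these tools) is to set the importing role's default search_path so that unqualified names like stops resolve to the custom schema; note that tables created by build-index (such as stops_stable_ids above) would then also end up in the first schema on that path:

# Sketch only: vbb is the schema name from the logs above; the role and
# database names are placeholders. Takes effect for new connections.
psql -b -c "ALTER ROLE gtfs_user IN DATABASE gtfs_db SET search_path TO vbb, public;"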

derhuerst commented 9 months ago

I'm wondering though: What's your use case for using a schema other than public? berlin-gtfs-rt-server (or rather hafas-gtfs-rt-feed, because berlin-gtfs-rt-server is just a thin shell around it) currently assumes that the PostgreSQL DB is exclusively used by it.

dancesWithCycles commented 9 months ago

My hosting is my use case!

I am using a managed server, where I do not want to drop and re-create a database every time I update the GTFS feed. I would rather drop and re-create the respective schema.

An alternative approach would be a script that cleans up an existing database without dropping it, so that each update happens on a clean database.

At the end of the day, I need to prepare a fresh environment for the GTFS feed import into PostgreSQL without dropping the database. How would you do it?
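
To make the two options concrete, a minimal sketch of both, assuming psql access and the schema name vbb from the logs above; CASCADE removes everything inside the schema:

# Option 1 (sketch): re-create a dedicated schema before each import.
psql -b -c "DROP SCHEMA IF EXISTS vbb CASCADE; CREATE SCHEMA vbb;"

# Option 2 (sketch): reset the default schema instead of dropping the database.
# Careful: this removes *all* objects in public, including extensions such as
# postgis, which would have to be re-created before the next import.
psql -b -c "DROP SCHEMA IF EXISTS public CASCADE; CREATE SCHEMA public;"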

derhuerst commented 9 months ago

At the end of the day, I need to prepare a fresh environment for the GTFS feed import into PostgreSQL without dropping the database. How would you do it?

I have created https://github.com/public-transport/gtfs-via-postgres/issues/57 to discuss this further, because this topic is not specific to berlin-gtfs-rt-server, and because there have been similar discussions in the gtfs-via-postgres Issues before.

derhuerst commented 9 months ago

FYI: I'll rename this Issue to be a feature request.