ObjectVision / NetworkModel_EU

Repository for NetworkModel EU
GNU General Public License v3.0
1 stars 0 forks source link

streets gpkg updaten met congested reistijden per link #2

Closed jipclaassens closed 1 year ago

cjacobscrisioni commented 1 year ago

Congested reistijden obv speedprofiles in tom tom database om 8:30 op een normale werkdag. Nu aan het uitzoeken hoe de zgn streets_dailyprofiles te koppelen zijn aan het netwerk. In streets_dailyprofiles zit een network_id, maar dat lijkt hele andere nummers te bevatten dan de ids van /load_fgdb/Streets/id of /load_themes_fgdb/nw/id

cjacobscrisioni commented 1 year ago

Update: de ranges in /load_themes_fgdb/hsnp/NETWORK_ID komen overeen met die in /load_themes_fgdb/nw/id, en de ranges in /load_themes_fgdb/nw/f_jnctid komen ook overeen met de ranges in de junction id's /load_themes_fgdb/jc/id. Overigens ook met /load_fgdb/Streets/f_jnctid. Ik ga een export doen van nw uit de load_themes database. Daarbij sla ik ook meteen een andere element op, nl speeds obv freeflow. Die lijken af te wijken van de gangbare snelheden. Voor modelleren kunnen we wss beter freeflow speeds gebruiken.

cjacobscrisioni commented 1 year ago

koppelen van /load_themes_fgdb/hsnp/NETWORK_ID aan /load_themes_fgdb/nw/id levert 13% null values op. Die vang ik op met een makedefined (rel factor, 1f). Nu bezig de netwerk fss opnieuw te genereren. Nieuwe attributen heten minutes_congested_0830 (reistijden incl congestie om 8:30) en minutes_freeflow_0830 (reistijden freeflow om 8:30). De bron van het netwerk is wel veranderd, we zullen moeten controleren wat het effect is op netwerkgeneratie en versimpeling.

cjacobscrisioni commented 1 year ago

Eerste check: de geometrieen van de nieuwe database lijken nog steeds in een loop te zitten (dus cutten op 1 + n / 2 nog steeds van belang)

cjacobscrisioni commented 1 year ago

Ik ben nu al meer dan een dag bezig met het converteren van de data in de geodatabase naar een gpkg. Maar dat loopt niet lekker. Ik definieer de code op deze manier:

container export_geopackage:  
    StorageName     = "%sourcedataprojdir%/out_fss/tomtom_export_231005.gpkg"
    ,  StorageType     = "gdalwrite.vect"
    ,  StorageReadOnly = "False" {
    unit<uint32> streets:= load_themes_fgdb/nw {
        attribute<WGS84> Geometry (arc):= load_themes_fgdb/nw/Geometry;
        attribute<float64>   f_jnctid:= load_themes_fgdb/nw/f_jnctid;
        attribute<float64>   t_jnctid:= load_themes_fgdb/nw/t_jnctid;
        attribute<float64>   meters:=   load_themes_fgdb/nw/meters;
        //attribute<uint16>  kph:=      load_themes_fgdb/nw/kph;
        attribute<float32>   minutes:=  load_themes_fgdb/nw/minutes;
        attribute<float32>   minutes_congested_0830:= load_themes_fgdb/nw/minutes_congested_0830;
        attribute<float32>   minutes_freeflow_0830:= load_themes_fgdb/nw/minutes_freeflow_0830;
        attribute<directions>   Direction:      expr = "makeDefined(rlookup(Oneway, directions/code), 0[directions])";
    }
    unit<uint32> junctions:= /load_themes_fgdb/jc {
        attribute<WGS84> Geometry:= /load_themes_fgdb/jc/Geometry;
        attribute<float64> nodeid:= /load_themes_fgdb/jc/id;
        attribute<uint16> zelev:= /load_themes_fgdb/jc/elev;
    }
}

Het berekenen van de achterliggende velden is wat traag (~140M rows dus hey) maar niet problematisch. Op het moment dat ik de export_geopackage/streets unit dubbelklik in de GUI loopt het fout.

Alle velden behalve directions zijn na een tijdje blauw / klaar volgens GeoDMS (v12.0.0), maar ogenschijnlijk blijft er data gestuurd worden naar de file. Volgens het bestandssysteem wordt de file ook steeds gewijzigd. Maar het proces loopt nu (10:40 op vrijdag) al sinds gisterenmiddag rond 16:00. Is wat veel.

eoudejans commented 1 year ago

Issue reproducible. Possible cause is the transaction mechanism, which allows to recover from faults at the user side when updating the GPKG (sqlite) database. As writing with geodms to gpkg using gdalwrite.vect the file is opened exclusively, the transaction mechanism may be overkill and hampers performance.

eoudejans commented 1 year ago

Another observation is that reading sufficiently large GDB files is slow. This is in part caused by the file format, but also in part caused by the column read order used in geodms which prompts for each layer a full read of the whole column. The more natural way to read a GDB file (and other row oriented vector formats) is per row.

eoudejans commented 1 year ago

An experimental solution route is to change the configuration options of gdal:

See gpkg config options: CPLSetConfigOption("OGR_SQLITE_CACHE", "1064"); CPLSetConfigOption("OGR_SQLITE_JOURNAL", "OFF"); CPLSetConfigOption("OGR_SQLITE_SYNCHRONOUS", "OFF");

Which may cause corrupt gpkg files in case of for instance power outages, but the data can be recomputed in geodms from calculation rules anyways before writing, so that is likely not problematic.

eoudejans commented 1 year ago

Note that currently there is no entrypoint where to set the gdal config options, but this can be added in a few lines of code should this experimental solution prove fruitful.

eoudejans commented 1 year ago

Minor speedup obtained from experimentally changing of gdal configuration options. However, the majority of time seems to be spend in file gdb type operations.

cjacobscrisioni commented 1 year ago

Thanks Erik! Quick q - did you manage to export the full gpkg file? My PC is still storing the gpkg file, currently weighing 50Gbs and going.

eoudejans commented 1 year ago

Currently exporting the smaller of the two, and still mostly reading the gdb files. Will let you know when it is finished and how long it took.

eoudejans commented 1 year ago

As a temporary workaround the good old ogr2ogr may come in handy: ogr2ogr -f GPKG "C:\SourceData*.gpkg" "C:\SourceData*.gdb

Which is, amongst other locations, available from the OSGeo4W shell on Windows that comes with the Qgis installation. ogr2ogr can be found in the bin folder, for me locally at: C:\Program Files\QGIS 3.26.3\bin>

This seems to run just fine, if the speed stagnates we can always tune the gdal parameters for gpkg, in a similar way I did above.

eoudejans commented 1 year ago

In light of the previous comment, the following is the one I am running as we speak: ogr2ogr.exe -of GPKG C:\SourceData*.gpkg --config OGR_SQLITE_JOURNAL OFF --config OGR_SQLITE_CACHE 1064 --config OGR_SQLITE_SYNCHRONOUS OFF C:\SourceData*.gdb

cjacobscrisioni commented 1 year ago

That's an immediate conversion without edits right? Let's keep it as a fallback. Happy to report that GeoDMS just managed to store the biggest of the two tables to gpkg. Only took ~24h. The gpkg export is definitely a part of the problem, storing from .gdb to .fss format is a lot faster (still takes multiple hours though)

eoudejans commented 1 year ago

Yes it is a direct conversion, geodms can never be as fast, as ogr2ogr converts the gdb to intermediate gdal features and consequently stores these directly in gpkg row-by-row.

Whereas in geodms we read the data to our internal format following column-by-column order, and then we convert the geodms internal data to gdal data row-by-row and write on a per-tile basis. (the default tiling nowadays is 65k rows).

I suppose this is a good fallback indeed to get rid of the gdb type storage. The smaller gdb took 3-4 hours to convert to gpkg.

eoudejans commented 1 year ago

That's an immediate conversion without edits right? Let's keep it as a fallback. Happy to report that GeoDMS just managed to store the biggest of the two tables to gpkg. Only took ~24h. The gpkg export is definitely a part of the problem, storing from .gdb to .fss format is a lot faster (still takes multiple hours though)

Today I'll consult with Maarten on this topic, but I think it is a good idea to also expose the gdal config settings to the end user, this is the only set of options not exposed to the geodms user at the moment.

cjacobscrisioni commented 1 year ago

Update: storing to gpkg took until 7th of October, 5 AM. The stored file is massive (63Gb) even though the amount of stored data is limited. I am chalking the last one up to inherent inefficiency in the gpkg format.

More broadly, big fan of having the option to adjust the gdal params internally. Not immediately an option we'd use much, but very useful when necessary :)

eoudejans commented 1 year ago

As it turns out, we already have exposed configuration options in GeoDMS. Internally these options will be applied to the local GDAL thread. There was, however, no wiki entry of GDAL_ConfigurationOptions. Now there is, see GDAL_ConfigurationOptions.

eoudejans commented 1 year ago

I can confirm that the following works:

container write
{
    unit<uint32> optionSet := range(uint32, 0, 3);
    attribute<string> GDAL_ConfigurationOptions (optionSet) : [ "OGR_SQLITE_JOURNAL=OFF", "OGR_SQLITE_CACHE=1024", "OGR_SQLITE_SYNCHRONOUS=OFF" ];

    container write_large_gpkg := read_large_gdb
    ,  StorageName     = "storage_name.gpkg"
    ,  StorageType     = "gdalwrite.vect"
    ,  StorageReadOnly = "False"
    {
        Unit<UInt32> mn := select_with_attr_by_cond(read_large_gdb/mn, const(true, read_large_gdb/mn));
        Unit<UInt32> nw := select_with_attr_by_cond(read_large_gdb/nw, const(true, read_large_gdb/nw));
        Unit<UInt32> rr := select_with_attr_by_cond(read_large_gdb/rr, const(true, read_large_gdb/rr));
        Unit<UInt32> pd := select_with_attr_by_cond(read_large_gdb/pd, const(true, read_large_gdb/pd));
        Unit<UInt32> mnpoi_pi := select_with_attr_by_cond(read_large_gdb/mnpoi_pi, const(true, read_large_gdb/mnpoi_pi));
        Unit<UInt32> mn_pi := select_with_attr_by_cond(read_large_gdb/mn_pi, const(true, read_large_gdb/mn_pi));
        Unit<UInt32> hsnp := select_with_attr_by_cond(read_large_gdb/hsnp, const(true, read_large_gdb/hsnp));
        Unit<UInt32> mp := select_with_attr_by_cond(read_large_gdb/mp, const(true, read_large_gdb/mp));
        Unit<UInt32> si := select_with_attr_by_cond(read_large_gdb/si, const(true, read_large_gdb/si));
        Unit<UInt32> sp := select_with_attr_by_cond(read_large_gdb/sp, const(true, read_large_gdb/sp));
        Unit<UInt32> rs := select_with_attr_by_cond(read_large_gdb/rs, const(true, read_large_gdb/rs));
        Unit<UInt32> hspr := select_with_attr_by_cond(read_large_gdb/hspr, const(true, read_large_gdb/hspr));
        Unit<UInt32> jc := select_with_attr_by_cond(read_large_gdb/jc, const(true, read_large_gdb/jc));
    }
}
cjacobscrisioni commented 1 year ago

@eoudejans , thanks! This is going to be very useful!

In the meantime I have fallen back to mostly using pre-existing geopackages that Jip created previously. I managed to create a fss file with results from the hsnp table and link that with those geopackages. Currently creating a network with congested travel times.

cjacobscrisioni commented 1 year ago

Update: hsnp data file with speed profiles is downloadable here: https://surfdrive.surf.nl/files/index.php/s/9jwvEINIE19oXv6

Network is updated, new definitions committed and pushed in git (congestion_dgregio branch)

cjacobscrisioni commented 11 months ago

Code die ik gebruik voor extractie profiles:

    container load_themes_fgdb 
    :  StorageName     = "%sourcedataprojdir%/in_gdb/TomTom_Compiled_Themes_2021.gdb"
    ,  StorageType     = "gdal.vect"
    ,  StorageReadOnly = "True"
    ,  SyncMode        = "AllTables" {
        unit<uint32> nw {
            attribute<WGS84> Geometry (arc);
            attribute<hsnp>  hsnp_id:= invert(hsnp/n_id);
            attribute<float64> f_jnctid;
            attribute<float64> t_jnctid;
            attribute<float32> kph;
            attribute<float32> minutes;
            attribute<float32> minutes_congested_0830:= minutes / makeDefined(hsnp/factor_weekday_0830[hsnp_id], 1f);
            attribute<float32> minutes_freeflow_0830:= minutes / makeDefined(hsnp/factor_freeflow_0830[hsnp_id], 1f);
        }
        unit<uint32> jc {
            attribute<WGS84> Geometry;
        }
        unit<uint32> hsnp {
            attribute<float64> Network_id;
            attribute<nw>       n_id:= rlookup(Network_id, nw/id);
            attribute<uint16> spweekday;
            attribute<uint16> spfreeflow;

            attribute<float32> factor_weekday_0830:= rjoin(point(uint32(spweekday), const(30600,.,uint32), upoint), hspr/unipt, hspr/rel_sp / 100f);
            attribute<float32> factor_freeflow_0830:= rjoin(point(uint32(spfreeflow), const(30600,.,uint32), upoint), hspr/unipt, hspr/rel_sp / 100f);
        }
        unit<uint32> hspr {
            attribute<uint16>   profile_id;
            attribute<int32>    time_slot;
            attribute<float32>  rel_sp;
            attribute<int32>    hour:= time_slot / (60i * 60i);
            attribute<int32>    minutes:= (time_slot - (hour * 60i * 60i)) / 60i;
            attribute<upoint>   unipt:= point(uint32(profile_id), uint32(time_slot), upoint);
        }
    }