Update: the ranges in /load_themes_fgdb/hsnp/NETWORK_ID correspond to those in /load_themes_fgdb/nw/id, and the ranges in /load_themes_fgdb/nw/f_jnctid also correspond to the ranges of the junction ids in /load_themes_fgdb/jc/id (and, incidentally, to /load_fgdb/Streets/f_jnctid as well). I am going to export nw from the load_themes database. While doing so I will also store one more element, namely speeds based on freeflow. These seem to deviate from the usual speeds; for modelling we are probably better off using freeflow speeds.
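For reference, that kind of range check can be expressed directly in GeoDMS as min/max parameters next to the source tables (a minimal sketch; the parameter names are illustrative):

parameter<float64> min_network_id := min(/load_themes_fgdb/hsnp/NETWORK_ID);
parameter<float64> max_network_id := max(/load_themes_fgdb/hsnp/NETWORK_ID);
parameter<float64> min_nw_id      := min(/load_themes_fgdb/nw/id);
parameter<float64> max_nw_id      := max(/load_themes_fgdb/nw/id);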
Linking /load_themes_fgdb/hsnp/NETWORK_ID to /load_themes_fgdb/nw/id yields 13% null values. I catch those with a makeDefined(rel. factor, 1f). Now regenerating the network fss. The new attributes are called minutes_congested_0830 (travel times including congestion at 08:30) and minutes_freeflow_0830 (freeflow travel times at 08:30). The source of the network has changed, though, so we will have to check what the effect is on network generation and simplification.
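The core of that fallback, taken from the full extraction configuration at the end of this thread and shown here only to clarify what the makeDefined does:

attribute<hsnp>    hsnp_id                := invert(hsnp/n_id);  // null for the ~13% of links without a speed profile
attribute<float32> minutes_congested_0830 := minutes / makeDefined(hsnp/factor_weekday_0830[hsnp_id], 1f);  // missing factors fall back to 1f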
First check: the geometries in the new database still seem to form a loop (so cutting at 1 + n / 2 is still needed).
I have now spent more than a day converting the data in the geodatabase to a gpkg, but it is not going well. I define the code like this:
container export_geopackage:
    StorageName     = "%sourcedataprojdir%/out_fss/tomtom_export_231005.gpkg"
,   StorageType     = "gdalwrite.vect"
,   StorageReadOnly = "False"
{
    unit<uint32> streets := load_themes_fgdb/nw
    {
        attribute<WGS84>      Geometry (arc)         := load_themes_fgdb/nw/Geometry;
        attribute<float64>    f_jnctid               := load_themes_fgdb/nw/f_jnctid;
        attribute<float64>    t_jnctid               := load_themes_fgdb/nw/t_jnctid;
        attribute<float64>    meters                 := load_themes_fgdb/nw/meters;
        //attribute<uint16>   kph                    := load_themes_fgdb/nw/kph;
        attribute<float32>    minutes                := load_themes_fgdb/nw/minutes;
        attribute<float32>    minutes_congested_0830 := load_themes_fgdb/nw/minutes_congested_0830;
        attribute<float32>    minutes_freeflow_0830  := load_themes_fgdb/nw/minutes_freeflow_0830;
        attribute<directions> Direction: expr = "makeDefined(rlookup(Oneway, directions/code), 0[directions])";
    }
    unit<uint32> junctions := /load_themes_fgdb/jc
    {
        attribute<WGS84>   Geometry := /load_themes_fgdb/jc/Geometry;
        attribute<float64> nodeid   := /load_themes_fgdb/jc/id;
        attribute<uint16>  zelev    := /load_themes_fgdb/jc/elev;
    }
}
Computing the underlying fields is somewhat slow (~140M rows, so fair enough) but not problematic. The moment I double-click the export_geopackage/streets unit in the GUI, it goes wrong.
All fields except directions turn blue / finished after a while according to GeoDMS (v12.0.0), but data apparently keeps being written to the file; according to the file system the file is indeed continuously being modified. The process has now (10:40 on Friday) been running since yesterday afternoon around 16:00. That is a bit much.
Issue reproducible. A possible cause is the transaction mechanism, which allows recovery from faults on the user side when updating the GPKG (sqlite) database. Since GeoDMS opens the file exclusively when writing to gpkg with gdalwrite.vect, the transaction mechanism may be overkill and hamper performance.
Another observation is that reading sufficiently large GDB files is slow. This is partly caused by the file format itself, but also by the column read order used in GeoDMS, which triggers a full pass over the layer for each column. The more natural way to read a GDB file (and other row-oriented vector formats) is per row.
An experimental solution route is to change the configuration options of gdal:
See the gpkg config options:
CPLSetConfigOption("OGR_SQLITE_CACHE", "1064");
CPLSetConfigOption("OGR_SQLITE_JOURNAL", "OFF");
CPLSetConfigOption("OGR_SQLITE_SYNCHRONOUS", "OFF");
These settings may cause corrupt gpkg files in case of, for instance, a power outage, but the data can be recomputed from the GeoDMS calculation rules anyway before writing, so that is likely not problematic.
Note that there is currently no entry point for setting the GDAL config options, but this can be added in a few lines of code should this experimental route prove fruitful.
A minor speedup was obtained from experimentally changing the GDAL configuration options. However, the majority of the time seems to be spent in file-gdb operations.
Thanks Erik! Quick q - did you manage to export the full gpkg file? My PC is still storing the gpkg file, currently weighing 50 GB and growing.
Currently exporting the smaller of the two, and still mostly reading the gdb files. Will let you know when it is finished and how long it took.
As a temporary workaround the good old ogr2ogr may come in handy: ogr2ogr -f GPKG "C:\SourceData*.gpkg" "C:\SourceData*.gdb"
Which is, amongst other locations, available from the OSGeo4W shell on Windows that comes with the QGIS installation. ogr2ogr can be found in the bin folder, for me locally at: C:\Program Files\QGIS 3.26.3\bin>
This seems to run just fine; if the speed stagnates we can always tune the GDAL parameters for gpkg, in a similar way as I did above.
In light of the previous comment, the following is the one I am running as we speak: ogr2ogr.exe -of GPKG C:\SourceData*.gpkg --config OGR_SQLITE_JOURNAL OFF --config OGR_SQLITE_CACHE 1064 --config OGR_SQLITE_SYNCHRONOUS OFF C:\SourceData*.gdb
That's an immediate conversion without edits, right? Let's keep it as a fallback. Happy to report that GeoDMS just managed to store the bigger of the two tables to gpkg. It only took ~24h. The gpkg export is definitely a part of the problem; storing from .gdb to .fss format is a lot faster (it still takes multiple hours though).
Yes, it is a direct conversion; GeoDMS can never be as fast, as ogr2ogr converts the gdb to intermediate GDAL features and subsequently stores these directly in gpkg row-by-row.
In GeoDMS, by contrast, we read the data into our internal format in column-by-column order, and then convert the internal data to GDAL data row-by-row and write it on a per-tile basis (the default tile size nowadays is 65k rows).
I suppose this is a good fallback indeed to get rid of the gdb type storage. The smaller gdb took 3-4 hours to convert to gpkg.
Today I'll consult with Maarten on this topic, but I think it is a good idea to also expose the GDAL config settings to the end user; this is the only set of options not exposed to the GeoDMS user at the moment.
Update: storing to gpkg took until the 7th of October, 5 AM. The stored file is massive (63 GB) even though the amount of stored data is limited. I am chalking that up to inherent inefficiency in the gpkg format.
More broadly, big fan of having the option to adjust the gdal params internally. Not immediately an option we'd use much, but very useful when necessary :)
As it turns out, we have already exposed these configuration options in GeoDMS. Internally the options are applied to the local GDAL thread. There was, however, no wiki entry for GDAL_ConfigurationOptions; now there is, see GDAL_ConfigurationOptions.
I can confirm that the following works:
container write
{
    unit<uint32> optionSet := range(uint32, 0, 3);
    attribute<string> GDAL_ConfigurationOptions (optionSet) : [ "OGR_SQLITE_JOURNAL=OFF", "OGR_SQLITE_CACHE=1024", "OGR_SQLITE_SYNCHRONOUS=OFF" ];

    container write_large_gpkg := read_large_gdb
    ,   StorageName     = "storage_name.gpkg"
    ,   StorageType     = "gdalwrite.vect"
    ,   StorageReadOnly = "False"
    {
        Unit<UInt32> mn       := select_with_attr_by_cond(read_large_gdb/mn,       const(true, read_large_gdb/mn));
        Unit<UInt32> nw       := select_with_attr_by_cond(read_large_gdb/nw,       const(true, read_large_gdb/nw));
        Unit<UInt32> rr       := select_with_attr_by_cond(read_large_gdb/rr,       const(true, read_large_gdb/rr));
        Unit<UInt32> pd       := select_with_attr_by_cond(read_large_gdb/pd,       const(true, read_large_gdb/pd));
        Unit<UInt32> mnpoi_pi := select_with_attr_by_cond(read_large_gdb/mnpoi_pi, const(true, read_large_gdb/mnpoi_pi));
        Unit<UInt32> mn_pi    := select_with_attr_by_cond(read_large_gdb/mn_pi,    const(true, read_large_gdb/mn_pi));
        Unit<UInt32> hsnp     := select_with_attr_by_cond(read_large_gdb/hsnp,     const(true, read_large_gdb/hsnp));
        Unit<UInt32> mp       := select_with_attr_by_cond(read_large_gdb/mp,       const(true, read_large_gdb/mp));
        Unit<UInt32> si       := select_with_attr_by_cond(read_large_gdb/si,       const(true, read_large_gdb/si));
        Unit<UInt32> sp       := select_with_attr_by_cond(read_large_gdb/sp,       const(true, read_large_gdb/sp));
        Unit<UInt32> rs       := select_with_attr_by_cond(read_large_gdb/rs,       const(true, read_large_gdb/rs));
        Unit<UInt32> hspr     := select_with_attr_by_cond(read_large_gdb/hspr,     const(true, read_large_gdb/hspr));
        Unit<UInt32> jc       := select_with_attr_by_cond(read_large_gdb/jc,       const(true, read_large_gdb/jc));
    }
}
@eoudejans, thanks! This is going to be very useful!
In the meantime I have fallen back to mostly using the geopackages that Jip created previously. I managed to create an fss file with results from the hsnp table and link it to those geopackages. Currently creating a network with congested travel times.
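For completeness, a minimal sketch of how such an fss export can be configured (the item names and the %LocalDataProjDir% placeholder are illustrative, not the exact configuration used; the assumption is that the .fss extension selects GeoDMS's native fss storage manager):

container export_fss
:   StorageName     = "%LocalDataProjDir%/out_fss/hsnp_results.fss"
,   StorageReadOnly = "False"
{
    unit<uint32> nw := load_themes_fgdb/nw
    {
        // hsnp-derived travel times, computed once and then served from the fss snapshot
        attribute<float64> f_jnctid               := load_themes_fgdb/nw/f_jnctid;
        attribute<float64> t_jnctid               := load_themes_fgdb/nw/t_jnctid;
        attribute<float32> minutes_congested_0830 := load_themes_fgdb/nw/minutes_congested_0830;
        attribute<float32> minutes_freeflow_0830  := load_themes_fgdb/nw/minutes_freeflow_0830;
    }
}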
Update: hsnp data file with speed profiles is downloadable here: https://surfdrive.surf.nl/files/index.php/s/9jwvEINIE19oXv6
Network is updated; the new definitions are committed and pushed to git (congestion_dgregio branch).
The code I use for extracting the profiles:
container load_themes_fgdb
:   StorageName     = "%sourcedataprojdir%/in_gdb/TomTom_Compiled_Themes_2021.gdb"
,   StorageType     = "gdal.vect"
,   StorageReadOnly = "True"
,   SyncMode        = "AllTables"
{
    unit<uint32> nw
    {
        attribute<WGS84>   Geometry (arc);
        attribute<hsnp>    hsnp_id := invert(hsnp/n_id);
        attribute<float64> f_jnctid;
        attribute<float64> t_jnctid;
        attribute<float32> kph;
        attribute<float32> minutes;
        attribute<float32> minutes_congested_0830 := minutes / makeDefined(hsnp/factor_weekday_0830[hsnp_id], 1f);
        attribute<float32> minutes_freeflow_0830  := minutes / makeDefined(hsnp/factor_freeflow_0830[hsnp_id], 1f);
    }
    unit<uint32> jc
    {
        attribute<WGS84> Geometry;
    }
    unit<uint32> hsnp
    {
        attribute<float64> Network_id;
        attribute<nw>      n_id := rlookup(Network_id, nw/id);
        attribute<uint16>  spweekday;
        attribute<uint16>  spfreeflow;
        // 30600 s = 8 * 3600 + 30 * 60, i.e. the 08:30 time slot
        attribute<float32> factor_weekday_0830  := rjoin(point(uint32(spweekday),  const(30600,.,uint32), upoint), hspr/unipt, hspr/rel_sp / 100f);
        attribute<float32> factor_freeflow_0830 := rjoin(point(uint32(spfreeflow), const(30600,.,uint32), upoint), hspr/unipt, hspr/rel_sp / 100f);
    }
    unit<uint32> hspr
    {
        attribute<uint16>  profile_id;
        attribute<int32>   time_slot;
        attribute<float32> rel_sp;
        attribute<int32>   hour    := time_slot / (60i * 60i);
        attribute<int32>   minutes := (time_slot - (hour * 60i * 60i)) / 60i;
        attribute<upoint>  unipt   := point(uint32(profile_id), uint32(time_slot), upoint);
    }
}
Congested travel times based on the speed profiles in the TomTom database at 08:30 on a normal weekday. Now figuring out how the so-called streets_dailyprofiles can be linked to the network. streets_dailyprofiles contains a network_id, but it seems to hold completely different numbers than the ids in /load_fgdb/Streets/id or /load_themes_fgdb/nw/id.
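A quick way to quantify that mismatch could be a GeoDMS check along these lines (a sketch only; streets_dailyprofiles and its network_id attribute are assumed to be read next to nw inside load_themes_fgdb, like the other tables):

unit<uint32> streets_dailyprofiles
{
    attribute<float64> network_id;
    // relate to the street network; null where no matching id exists
    attribute<nw>      nw_rel        := rlookup(network_id, nw/id);
    // share of profile records that find a street; a value near 0 would confirm the ids are unrelated
    parameter<float32> share_matched := mean(iif(IsDefined(nw_rel), 1f, 0f));
}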