ObjectVision / GeoDMS

Source code for the GeoDMS software
https://www.geodms.nl/

inconsistent unicode behavior #144

Open jan-perl opened 1 year ago

jan-perl commented 1 year ago

While trying to match gemeentenaam in https://github.com/jan-perl/EPcheck/blob/main/cfg/EPcheck.dms, I see inconsistent Unicode behavior. SourceData/gemfrompc4/Gemeentenaam is an fss imported from a csv file. SourceData/CBS_2022/gemeenten/Gemeentenaam comes from the example at https://github.com/ObjectVision/GeoDMS/wiki/CBS-gemeente-wijk-buurt-kaart

The rlookup match fails for Súdwest-Fryslân. In the table view the name is sometimes displayed correctly (mostly when it comes from the csv/fss) and sometimes not. Displayed as: Súdwest-Fryslân

I assume that the Unicode implementation is not yet consistent.

        unit<uint32> gemeentekoppchkpc := unique(SourceData/gemfrompc4/Gemeentenaam ) 
        ,    StorageName = "%ProjDir%/output/kchk-pc.dbf" 
        {
            attribute<string>     gemeentenaamd   := string( Values);     
            attribute<uint32>     gem_geo_idx := rlookup(gemeentenaamd, SourceData/CBS_2022/gemeenten/Gemeentenaam);
            attribute<Geography/rdc> geometry (polygon) := SourceData/CBS_2022/Gemeenten/geometry[gem_geo_idx];
            attribute<ratio>  test         := float32(rnd_uniform(0, ., range(float32, 0f, 1f)));
        }

        unit<uint32> gemeentekoppchkcbs := unique(SourceData/CBS_2022/gemeenten/Gemeentenaam ) 
        ,    StorageName = "%ProjDir%/output/kchk-cbs.dbf" 
        {
            attribute<string>     gemeentenaamd   := string( Values);     
            attribute<uint32>     gem_pc4_idx := rlookup(gemeentenaamd, SourceData/gemfrompc4/Gemeentenaam);
            attribute<uint32>     gem_geo_idx := rlookup(gemeentenaamd, SourceData/CBS_2022/gemeenten/Gemeentenaam);
            attribute<Geography/rdc> geometry (polygon) := SourceData/CBS_2022/Gemeenten/geometry[gem_geo_idx];
            attribute<ratio>  test         := float32(rnd_uniform(0, ., range(float32, 0f, 1f)));
        }
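
Editorial sketch (Python, not GeoDMS): two common ways in which two strings that look like the same municipality name can fail an exact-match lookup. It is only an illustration of the symptom, not a diagnosis of this issue. The assumptions are that one source might be decoded with the wrong code page (cp1252 is a guess) or might use a different Unicode normalization form. The helper `rlookup` below is a hypothetical Python stand-in for an exact-match lookup and is not the GeoDMS implementation.

```python
import unicodedata


def rlookup(values, reference):
    """Hypothetical stand-in: index of the first exact match in reference, else None."""
    index = {}
    for i, ref in enumerate(reference):
        index.setdefault(ref, i)  # keep the first occurrence only
    return [index.get(v) for v in values]


# Escapes guarantee the precomposed (NFC) form regardless of editor settings.
name = "S\u00fadwest-Frysl\u00e2n"
reference = ["Amsterdam", name]

# Assumption 1: UTF-8 bytes were decoded as cp1252 somewhere along the way.
misread = name.encode("utf-8").decode("cp1252")

# Assumption 2: same characters, but stored in decomposed (NFD) form.
decomposed = unicodedata.normalize("NFD", name)

print(rlookup([name, misread, decomposed], reference))
# [1, None, None]

# Both variants can be brought back to the reference form before matching.
repaired = misread.encode("cp1252").decode("utf-8")
recomposed = unicodedata.normalize("NFC", decomposed)
print(rlookup([repaired, recomposed], reference))
# [1, 1]
```

Comparing the lengths of the two strings (15 characters for the precomposed name, 17 for both the misread and the decomposed variant) is a quick way to tell that something differs even when the table view renders them alike.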

mtbeek32 commented 1 year ago

Can you check whether the character set is set to UTF-8 when you open the .csv file in a text editor, e.g. Notepad++?
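
Editorial sketch (Python): the same check can be done without an editor by testing whether the raw bytes of the file decode as UTF-8. The function name `classify_encoding` is hypothetical. Note that a successful decode only shows the bytes are valid UTF-8, and a pure-ASCII file tells you nothing either way.

```python
def classify_encoding(data: bytes) -> str:
    """Return 'ascii', 'utf-8' or 'not utf-8' for a block of raw bytes."""
    if all(b < 0x80 for b in data):
        return "ascii"  # inconclusive: valid in UTF-8 and in most code pages
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return "not utf-8"
    return "utf-8"


name = "S\u00fadwest-Frysl\u00e2n"
print(classify_encoding(name.encode("utf-8")))   # utf-8
print(classify_encoding(name.encode("cp1252")))  # not utf-8
print(classify_encoding(b"Amsterdam"))           # ascii
```

For a real file, pass its contents read in binary mode, for example `classify_encoding(open(path, "rb").read())`, where `path` is the location of the csv.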

jan-perl commented 1 year ago

I re-created the table in code from buurten in https://github.com/ObjectVision/GeoDMS/wiki/CBS-gemeente-wijk-buurt-kaart; see the new definition of gemfrompc4 in https://github.com/jan-perl/EPcheck/blob/main/cfg/EPcheck.dms

The problem with rlookup persists, even though both tables are now read from the buurten en wijken dbf.
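
Editorial sketch (Python): since both tables now come from a dbf, it may help to see which encoding the dbf file itself declares. In the dBASE header, the byte at offset 29 is commonly documented as the language driver ID. The small mapping below covers only a few widely listed values and is an assumption, not a complete table. Whether GeoDMS honours this byte, or a `.cpg` sidecar file, is not stated in this thread.

```python
# Partial, assumed mapping of language driver IDs to Python codec names.
KNOWN_LDID = {
    0x00: None,       # no code page declared
    0x01: "cp437",
    0x02: "cp850",
    0x03: "cp1252",
    0x57: "cp1252",
}


def dbf_language_driver_id(header: bytes) -> int:
    """Return the language driver ID stored at offset 29 of a dbf header."""
    if len(header) < 32:
        raise ValueError("a dbf header is at least 32 bytes long")
    return header[29]


# Synthetic 32-byte header for illustration; a real one comes from the file.
fake_header = bytearray(32)
fake_header[29] = 0x03

ldid = dbf_language_driver_id(bytes(fake_header))
print(hex(ldid), KNOWN_LDID.get(ldid, "unknown"))
# 0x3 cp1252
```

For a real file, the first 32 bytes read in binary mode are enough, for example `open(path, "rb").read(32)`, where `path` points to the dbf. If the declared code page is not UTF-8 while the name bytes are, that mismatch would be one candidate explanation for the display differences.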

MaartenHilferink commented 1 year ago

@Martin: how are such UTF-8 names read from a .dbf on your side, and how are they processed in rlookup?

eoudejans commented 6 months ago

@jan-perl Is this still an issue? If so, could you post a screenshot of the misspelling?