OSGeo / grass

GRASS GIS - free and open-source geospatial processing engine
https://grass.osgeo.org

[Bug] db.univar: fails on Windows due to unix style sort being used #3778

Open neteler opened 3 months ago

neteler commented 3 months ago

v.db.univar fails because db.univar fails. ... On Windows, there is a sort executable that doesn't work like the Unix one. So, on Windows, it doesn't fall back to the Python implementation.

Originally posted by @echoix in https://github.com/OSGeo/grass/issues/3743#issuecomment-2143947719

We need a different implementation for:

https://github.com/OSGeo/grass/blob/f59851043fbd7cb52288b7000d7e76824ae63ab7/scripts/db.univar/db.univar.py#L83

wenzeslaus commented 3 months ago

Maybe just add "not sys.platform.startswith('win') and" to the condition.
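A minimal sketch of that guard, assuming the code chooses between an external `sort -n` and an in-memory Python sort. The names `use_external_sort` and `sort_numeric_file` are hypothetical and rely only on the standard library; this is not the actual db.univar.py code or the GRASS scripting API:

```python
import shutil
import subprocess
import sys


def use_external_sort():
    """Return True only when a Unix-style sort is likely to be available."""
    # The platform check comes first, so Windows never reaches the PATH lookup.
    return not sys.platform.startswith("win") and shutil.which("sort") is not None


def sort_numeric_file(infile, outfile):
    """Sort a file with one number per line in ascending numeric order."""
    if use_external_sort():
        with open(infile) as inf, open(outfile, "w") as outf:
            subprocess.run(["sort", "-n"], stdin=inf, stdout=outf, check=True)
        return
    # Fallback: sort in memory (assumes the values fit in RAM).
    with open(infile) as inf:
        values = sorted(float(line) for line in inf if line.strip())
    with open(outfile, "w") as outf:
        outf.writelines(f"{value}\n" for value in values)
```

Note that the two branches write slightly different text (the external sort keeps each line as-is, the fallback writes Python float representations), so callers should parse the output as numbers rather than compare strings.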

echoix commented 3 months ago

However, this only occurs if the console is cmd (like the console that launches GRASS from OSGeo4W). I wasn't able to find the sort command through PowerShell. So if the console were Git Bash or an MSYS2 environment, or had a special PATH of that kind, a Linux-style sort that supports -n might be found.

echoix commented 3 months ago

If it's a fallback implementation, does it make sense to just try it in a try-catch? Is it too expensive on Windows, and would we get a performance penalty on repeated calls?

wenzeslaus commented 3 months ago

> If it's a fallback implementation, does it make sense to just try it in a try-catch?

But if the sort command just does other things, won't we potentially spend all the time in the subprocess just to get garbage that we can't tell from the correct result?
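One hedged way to address that concern is to probe whatever `sort` is on the PATH with a tiny input whose numeric and lexical orders differ, and to trust it only if the output matches exactly. The helper name `sort_supports_numeric` and the timeout value are assumptions made for illustration:

```python
import shutil
import subprocess


def sort_supports_numeric(program="sort", timeout=5):
    """Return True if `program -n` sorts a known sample numerically."""
    path = shutil.which(program)
    if path is None:
        return False
    try:
        result = subprocess.run(
            [path, "-n"],
            input="10\n9\n",
            capture_output=True,
            text=True,
            timeout=timeout,
        )
    except (OSError, subprocess.TimeoutExpired):
        return False
    # A lexical sort would print 10 before 9; a failed run gives a nonzero code.
    return result.returncode == 0 and result.stdout.split() == ["9", "10"]
```

The probe costs one extra process launch, so the result would normally be computed once and reused for later calls.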

echoix commented 3 months ago

And why do we need to use the Linux-only sort command in the first place? What makes it special?

echoix commented 3 months ago

> > If it's a fallback implementation, does it make sense to just try it in a try-catch?

> But if the sort command just does other things, won't we potentially spend all the time in the subprocess just to get garbage that we can't tell from the correct result?

I see your point. The "file not found" error occurs when the Windows sort.exe tries to open a file named -n and cannot find it. We wouldn't have the error if we placed a file named -n.

wenzeslaus commented 3 months ago

> We wouldn't have the error if we placed a file named -n.

That would be the biggest hack... I would just come up with an easy fix for the bug like checking sys.platform and move on. The code needs a complete re-evaluation/re-implementation.
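For the re-implementation, one option is a pure-Python external merge sort, which handles files larger than memory without depending on any `sort` executable. This is only a sketch, not the project's code: `sort_numeric_large` and `chunk_size` are hypothetical names, and the input is assumed to hold one number per line.

```python
import heapq
import tempfile
from itertools import islice


def _read_floats(handle):
    """Yield one float per non-empty line of an open text file."""
    for line in handle:
        if line.strip():
            yield float(line)


def sort_numeric_large(infile, outfile, chunk_size=100_000):
    """Sort numbers from infile into outfile using bounded memory."""
    chunks = []
    try:
        with open(infile) as inf:
            values = _read_floats(inf)
            while True:
                # Sort one bounded chunk in memory and spill it to disk.
                chunk = sorted(islice(values, chunk_size))
                if not chunk:
                    break
                tmp = tempfile.TemporaryFile(mode="w+")
                tmp.writelines(f"{value!r}\n" for value in chunk)
                tmp.seek(0)
                chunks.append(tmp)
        with open(outfile, "w") as outf:
            # heapq.merge lazily merges the already sorted chunk files.
            merged = heapq.merge(*(_read_floats(tmp) for tmp in chunks))
            outf.writelines(f"{value!r}\n" for value in merged)
    finally:
        for tmp in chunks:
            tmp.close()
```

Each chunk keeps one temporary file open during the merge, so a very small `chunk_size` on a very large input could run into the operating system's open-file limit.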

echoix commented 3 months ago

What I meant is that we can't tell that sort doesn't support -n from the file-not-found error alone.

The Windows sort doesn't seem to have an option that does the same thing as the Linux one.

hellik commented 3 months ago

sort in Windows:

SORT [/R] [/+n] [/M kilobytes] [/L locale] [/REC recordbytes]
  [[drive1:][path1]filename1] [/T [drive2:][path2]]
  [/O [drive3:][path3]filename3]
  /+n                         Specifies the character number, n, to
                              begin each comparison.  /+3 indicates that
                              each comparison should begin at the 3rd
                              character in each line.  Lines with fewer
                              than n characters collate before other lines.
                              By default comparisons start at the first
                              character in each line.
  /L[OCALE] locale            Overrides the system default locale with
                              the specified one.  The "C" locale yields
                              the fastest collating sequence and is
                              currently the only alternative.  The sort
                              is always case insensitive.
  /M[EMORY] kilobytes         Specifies amount of main memory to use for
                              the sort, in kilobytes.  The memory size is
                              always constrained to be a minimum of 160
                              kilobytes.  If the memory size is specified
                              the exact amount will be used for the sort,
                              regardless of how much main memory is
                              available.
                              The best performance is usually achieved by
                              not specifying a memory size.  By default the
                              sort will be done with one pass (no temporary
                              file) if it fits in the default maximum
                              memory size, otherwise the sort will be done
                              in two passes (with the partially sorted data
                              being stored in a temporary file) such that
                              the amounts of memory used for both the sort
                              and merge passes are equal.  The default
                              maximum memory size is 90% of available main
                              memory if both the input and output are
                              files, and 45% of main memory otherwise.
  /REC[ORD_MAXIMUM] characters
                              Specifies the maximum number of characters
                              per record (default: 4096, maximum: 65535).
  /R[EVERSE]                  Reverses the sort order, i.e. sorts
                              from Z to A, then from 9 to 0.
  [drive1:][path1]filename1   Specifies the file to be sorted.  If it is
                              not specified, standard input is sorted.
                              Specifying the file is faster than
                              redirecting standard input to that file.
  /T[EMPORARY]
    [drive2:][path2]          Specifies the path where the temporary file
                              is created, if one is needed.  By default the
                              system temporary directory is used.
  /O[UTPUT]
    [drive3:][path3]filename3 Specifies the file in which the sorted data
                              is stored.  If it is not specified, standard
                              output is used.  Specifying the file is
                              faster than redirecting standard output to
                              that file.

hellik commented 3 months ago

https://learn.microsoft.com/en-us/windows-server/administration/windows-commands/sort