PMunch / futhark

Automatic wrapping of C headers in Nim
MIT License
355 stars 19 forks source link

Better `sanitizeName` #87

Open arkanoid87 opened 7 months ago

arkanoid87 commented 7 months ago

At the moment, Futhark applies renameCallback before other fixed renaming rules, including nimIdentNormalize pass, which nullify all the NEP1 transformations user supplied renameCallback apply.

I'd say nimIdentNormalize is not really NEP1 friendly

I've recently pushed PR to anycase package to support compile time string transformations https://github.com/epszaw/anycase/issues/6

What about replacing nimIdentNormalize in sanitizeName with something that is more renameCallback and NEP1 friendly? Or maybe provide an additional renameTailCallback to fix things after the fixed transformations?

arkanoid87 commented 7 months ago

I tested my idea replacing nimIdentNormalize call in futhark.nim and GDAL library, and it seems to be working nicely

changes to futhark.nim sanitizeName proc

import anycase

...

proc sanitizeName(usedNames: var HashSet[string], origName: string, kind: string, renameCallback: RenameCallback, partof = ""): string {.compileTime.} =
  result = origName
  if not renameCallback.isNil:
    result = result.renameCallback(kind, partof)
  if result.startsWith("_"):
    if result.startsWith("__"):
      result = "compiler_" & result[2..^1]
    else:
      result = "internal_" & result[1..^1]

  # result = result.nimIdentNormalize()
  result = case kind:
    of "enum", "struct", "union", "const", "typedef": result.pascal # AnyCase
    of "proc", "arg", "enumval", "field": result.camel # anyCase
    else: result.nimIdentNormalize() # [A|a]nycase

  var renamed = false
  if usedNames.contains(result) or result in builtins:
    result.add kind
    renamed = true
  if usedNames.contains(result) or result in builtins:
    result.add hash(origName).toHex
    renamed = true
  if renamed:
    hint "Renaming \"" & origName & "\" to \"" & result & "\"" & (if partof.len != 0: " in " & partof else: "")
  usedNames.incl result

Results for GDAL library. The dirty part is still under user control via renameCallback

import std/[sugar, os, strutils]
import futhark

func renameCb(n, k: string, p = ""): string =
  result = n
  for prefix in ["GDAL_", "OGR_"]:
    if result.startsWith prefix:
      result = result.replace(prefix, "")
      break
  for prefix in ["GDAL", "OGR", "OCT", "OSR", "CPL", "VSI"]:
    if result.startsWith prefix:
      result = result.replace(prefix, prefix.toLower)
      break
  result = result.replace("_", "")

importc:
  outputPath currentSourcePath.parentDir / "gdal_c.nim"
  renameCallback renameCb
  "gdal/gdal.h"
  "gdal/ogr_api.h"
  "gdal/ogr_srs_api.h"
  "gdal/cpl_conv.h"
  "gdal/cpl_vsi.h"

proc main =
  var major, minor, patch: cint
  assert ogrGetGEOSVersion(major.addr, minor.addr, patch.addr)
  echo "GEOS version: ", major, ".", minor, ".", patch
  osrGetPROJVersion(major.addr, minor.addr, patch.addr)
  echo "PROJ version: ", major, ".", minor, ".", patch

  gdalAllRegister()
  let spatialReference = osrNewSpatialReference(nil)
  assert spatialReference.osrImportFromEPSG(4326) == 0

  var geometry: OgrGeometryH
  var wkt = "POINT(1 2)".cstring
  assert gCreateFromWkt(wkt.addr, spatialReference, geometry.addr) == 0

  dump geometry.gGetGeometryType()
  dump geometry.gGetGeometryName()
  let json = geometry.gExportToJson()
  dump json
  vsiFree(json)

  gDestroyGeometry(geometry)
  osrDestroySpatialReference(spatialReference)

main()

generated gdal_c.zip and rename hints

Hint: Renaming "__time_t" to "Timettypedef" [User]
Hint: Renaming "ptr" to "ptrarg" [User]
Hint: Renaming "is" to "isarg" [User]
Hint: Renaming "ptr" to "ptrarg" [User]
Hint: Renaming "ptr" to "ptrarg" [User]

output

GEOS version: 3.10.2
PROJ version: 8.2.1
geometry.gGetGeometryType() = wkbPoint
geometry.gGetGeometryName() = POINT
json = { "type": "Point", "coordinates": [ 2.0, 1.0 ] }
PMunch commented 7 months ago

This code has a slight issue, it won't properly handle name collisions. For anyone who didn't follow the IRC discussion when this came up the reason why sanitizeName calls nimIdentNormalize is so that the hash table lookup of names is guaranteed to get identifiers which will collide in Nim (but which might not collide in C because it's case and underscore sensitive). The proper workaround for this is to still use the normalized name in these data-structures but change to storing the original (or anycased) names and normalizing them when doing lookups.

As a motivating example I wrote a test case in the normalize branch which shows off the kind of thing the current solution was meant to handle. It tries to wrap this perfectly valid (albeit a bit extreme) C header file:

#define my_var 1
#define myVar 2
#define myvar 3
#define MYVAR 4
#define MY_VAR 5
#define MyVar 6
#define My_Var 7

The current version wraps this fine and renames fields to avoid collisions as we can see in the output:

/home/peter/Projects/futhark/src/futhark.nim(190, 5) Hint: Renaming "myVar" to "myvarconst" [User]
/home/peter/Projects/futhark/src/futhark.nim(190, 5) Hint: Renaming "myvar" to "myvarconst00000000C690172C" [User]
/home/peter/Projects/futhark/src/futhark.nim(190, 5) Hint: Renaming "MY_VAR" to "Myvarconst" [User]
/home/peter/Projects/futhark/src/futhark.nim(190, 5) Hint: Renaming "MyVar" to "Myvarconst000000002E4AA817" [User]
/home/peter/Projects/futhark/src/futhark.nim(190, 5) Hint: Renaming "My_Var" to "Myvarconst00000000038D4D97" [User]

The names get appended some information before falling back to a hash of the original identifier.

With your version of sanitizeName we instead see this output:

/home/peter/Projects/futhark/src/futhark.nim(197, 5) Hint: Renaming "myVar" to "MyVarconst" [User]
/home/peter/Projects/futhark/src/futhark.nim(197, 5) Hint: Renaming "MY_VAR" to "MyVarconst00000000AEC77163" [User]
/home/peter/Projects/futhark/src/futhark.nim(197, 5) Hint: Renaming "MyVar" to "MyVarconst000000002E4AA817" [User]
/home/peter/Projects/futhark/src/futhark.nim(197, 5) Hint: Renaming "My_Var" to "MyVarconst00000000038D4D97" [User]
/home/peter/Projects/futhark/src/futhark.nim(908, 3) Hint: Caching Futhark output in /home/peter/.cache/nim/tnormalize_d/futhark_33DDD4433451E496.nim [User]
/home/peter/Projects/futhark/src/futhark.nim(141, 21) Hint: Declaration of Myvar already exists, not redeclaring [User]
/home/peter/Projects/futhark/src/futhark.nim(141, 21) Hint: Declaration of MYVAR already exists, not redeclaring [User]

As we can see it doesn't properly rename all the identifiers, and some of them are aliased together in the wrapper making #define myvar 3 and #define MYVAR 4 now point to #define my_var 1. This means that these values are not possible to get with the wrapper, and with --nodeclguards the code fails to compile at all because of the re-definition.