flyx / NimYAML

YAML implementation for Nim
https://nimyaml.org
Other
191 stars 36 forks source link

how much work needed to make it work at compile time? #70

Closed timotheecour closed 1 year ago

timotheecour commented 5 years ago

after applying the patch from https://github.com/flyx/NimYAML/issues/69#issuecomment-502444945 to make it work with latest nim 0.20 (which does work!), I'm trying to make it work at compile time; currently it gives compiler errors for the basic examples in https://nimyaml.org/

newFileStream doesn't work at CT so I tried this as well:

  let s1 = staticRead(file)
  let s = newStringStream(s1)
  var personList: seq[Person]
  load(s, personList)

which gives several errors eg:

private/lex.nim(1153, 11) Error: cannot generate code for: mGCref

/yaml/private/lex.nim(1157, 37) Error: VM is only allowed to 'cast' between integers and/or floats of same size
      result[] = YamlLexerObj(source: cast[pointer](blSource), inFlow: false,

Has anyone succeeded in making it work at CT (even with a more limited set of yaml features would be a start)?

flyx commented 5 years ago

The lexer can handle different sources (strings, streams) with the SourceProvider concept. To avoid exporting the actual source implementation via generic parameter, the source is stored as pointer and unchecked casts are used to retrieve it. The Lexer stores function pointers to the functions doing the correct casts for the current source.

Since the VM cannot cast, this code needs to be refactored in some way. The idea of SourceProvider was to avoid dispatching calls for retrieving each character as this will slow down the Lexer. So a good solution will not employ methods.

However, there is no guarantee that the rest of the code will work at compile-time.

sealmove commented 4 years ago

This feature is crucial for nimitai so I plan to tackle it soon.

flyx commented 4 years ago

@sealmove There are two other issues related to the current state of the lexer, #85 (also some more information available there) and several failures to comply to the YAML specification which are not documented as GitHub issue but available in this list of skipped tests (the tests come from the official YAML test suite).

The best way to go ahead is to rewrite the Lexer properly. I basically know how to do it since I wrote AdaYaml after NimYAML and avoided all these issues there. Since this and the other issue seem to be important, I will try and accomplish a rewrite till the end of the month, to avoid putting more effort into code that cannot be fixed to solve all its issues without rewriting it (a change is needed in how the input is tokenized).

Would this work for you? I can't really guarantee that a Lexer rewrite will completely fix this issue but you can take it up from there.

sealmove commented 4 years ago

@flyx Sorry I didn't notice your reply until now. This is awesome news!

Would this work for you

Yes, absolutely. Please notify me when the new lexer is ready.

flyx commented 4 years ago

@sealmove Status update:

The rewrite is mostly working (as seen in the branch lexer-rewrite) however the lexer & parser APIs changed and I need to adjust the serialization code (the high level API, i.e. load, dump etc, will be unchanged).

Compile time execution is currently blocked by the global usage of serializationTagLibrary in some places. I will probably go forward and remove the type TagId, instead returning tags directly as ref string or something – this will remove the need for those global references.

A quick downgrading edit shows that NimYAML indeed works at compiletime when this obstacle is removed.

Work on the branch should be finished soon™.

flyx commented 4 years ago

The lexer rewrite has now landed in the devel branch.

The following code using NimYAML at compile time now compiles and yields the expected result:

import yaml

type
  Person = object
    firstnamechar: char
    surname: string
    age: int32

setTagUri(Person, "!person")

const persons = loadAs[seq[Person]]("""
- {firstnamechar: K, surname: Koch, age: 23}
- {firstnamechar: S, surname: McDuck, age: 150}""")

for p in persons:
  echo p

However, latest Nim head is required for this to work. Nim 1.4.0 fails to do it because `==` on proc variables appears to be broken there.

Currently not working are ref values of any kind. The reason for this is that those values are being cast to pointer when loading, which is used for storing them as target for aliases. I expect that it is possible to support ref values at compile time by disabling this feature, so that you cannot load cyclic structures. I currently do not plan to implement this as it seems to be a rather exotic feature – if this is important for you, please open a separate issue for it.

I invite everyone taking interest in this issue to test their use-case with NimYAML's current devel branch and give feedback if something is not working as expected. I don't plan to thoroughly test this myself as I don't expect discrepancies from runtime operation seeing that the simple example given above works.

timotheecour commented 4 years ago

@flyx this snippet gives: NimYAML/yaml/serialization.nim(407, 18) Error: attempting to call undeclared routine: 'getGMTime'

EDIT: looks like i was on master, not devel; why is the main branch called devel instead of master? this goes against virtually all github repos and nimble packages (with the only exception of nim repo itself) ?

(this aside, this is great news, thanks!)

timotheecour commented 4 years ago

Currently not working are ref values of any kind. The reason for this is that those values are being cast to pointer when loading, which is used for storing them as target for aliases

I don't understand this part; it prevents a lot of usage at CT even with https://github.com/nim-lang/Nim/pull/15528; why are pointer needed for that?

flyx commented 4 years ago

why is the main branch called devel instead of master

The master branch is for releases; a checkout of master will give you the latest release. This is common workflow for larger projects although I admit that most single-dev repos don't use it; it's just my preferred workflow. It should not really matter since devel is the HEAD branch and therefore will be the one checked out when you clone NimYAML or install it with nimble's yaml#head.

I don't understand this part; it prevents a lot of usage at CT even with nim-lang/Nim#15528; why are pointer needed for that?

To resolve aliases. For example:

import yaml

type
  Node = ref object
    name: string
    children: seq[Node]

let input = """
name: root
children:
- &a
  name: A
  children: []
- name: B 
  children: [*a]
"""

let root = loadAs[Node](input)

echo root.children[0].name             # will be A
echo root.children[1].children[0].name # will be the same A

NimYAML will create multiple references to ref values from aliases in the input, as seen here. To facilitate that, it must keep a hashmap of all objects created from YAML nodes with an anchor. Since anchored nodes could be of multiple different types, they are cast to pointer (only ref values are allowed to have an anchor so this always works) for storing them in the hashmap. A type identifier is stored along with the pointer to ensure that the pointer is only used with the correct type.

I am not sure what the alternative to this would be. I guess it would be possible to have one hashmap per type but that would involve lots of macro magic to generate the necessary amount of hashmaps based on the type we deserialize to (it would need to collect all ref types that are part of the root type). I would rather not add that much complexity as maintaining this library is already hard enough as it is.

In OOP languages like Java I would use the base Object type for this but in Nim not every type derives from RootObj and for all I know, upcasting/downcasting is not possible to the extend needed for this.

Perhaps anchors could simply be forbidden during compile time, but I'm not entirely sure how to do that. Is there some when defined(compiletime)? I wonder whether this would help anyone, as using ref types kind-of implies that you want to be able to reference an object multiple times.

I am no active Nim user apart from maintaining this library so there may be options I am not familiar with. If you know one, please tell me.

sealmove commented 4 years ago

Thanks for this work!

Is there some when defined(compiletime)

Fyi yes, there is.

timotheecour commented 4 years ago

At least nimyaml should allow to deal with ref types that don't involve multiple references to the same object, that's a very common use case. For multiple references, I don't see why it wouldn't work in VM, but at least we can deal with that later (note, a difficulty would be if user wants to convert a VM value involving multiple references to same object to a RT value, as https://github.com/nim-lang/Nim/pull/15528 currently doesn't handle those correctly but that's a separate issue that can be dealt with later)

Is there some when defined(compiletime)

when nimvm as mentioned above (see usage restriction in docs)

I wonder whether this would help anyone, as using ref types kind-of implies that you want to be able to reference an object multiple times

I disagree with that; ref could be in the type declaration (eg to allow multiple references at run time) but not in the serialization string (ie no multiple references to same object mentioned in serialization string); it's common

To facilitate that, it must keep a hashmap of all objects created from YAML nodes with an anchor Since anchored nodes could be of multiple different types, they are cast to pointer

why don't you use int instead of pointer? eg:

when true:
  type Foo = ref object
    x: int
  proc main()=
    let f1 = Foo(x: 10)
    let b1 = cast[int](f1)
  static: main()
  main()

the address (via cast[int]) should be enough to tell whether an object is already in HashMap.

EDIT: maybe the problem is you need the reverse direction as well, ie: cast[Foo](b1) where b1: int ? if so, that doesn't work yet, but I can try to make a PR in nim to support that (I had previously added https://github.com/nim-lang/Nim/pull/12877 but it doesnt' support int => ref)

flyx commented 4 years ago

why don't you use int instead of pointer?

pointer seemed to be the more fitting type. Ultimately the only thing that matters is that the target type has the correct size, so if using int avoids any problems I'm unaware of, I can use that instead.

maybe the problem is you need the reverse direction as well

Yes, without the reverse cast, it would not be possible to put the value together.

but I can try to make a PR in nim to support that

That would certainly be the best solution, compared with when nimvm which would just be a functionality-degrading workaround. I'd be grateful if you could do that.

timotheecour commented 4 years ago
when true:
  import yaml
  type
    Person = ref object
      firstnamechar: char
      surname: string
      age: int32
  proc `$`(a: Person): string = $a[]
  setTagUri(Person, "!person")
  proc fn =
    let persons = loadAs[seq[Person]]("""
      - {firstnamechar: K, surname: Koch, age: 23}
      - {firstnamechar: S, surname: McDuck, age: 150}""")
    echo persons
  static: fn()
  fn()

CT: @[(firstnamechar: 'K', surname: "Koch", age: 23), (firstnamechar: 'S', surname: "McDuck", age: 150)] RT: @[(firstnamechar: 'K', surname: "Koch", age: 23), (firstnamechar: 'S', surname: "McDuck", age: 150)]

flyx commented 4 years ago

Great work! Probably better test code is the one with nodes which actually uses anchors & aliases – the code you posted only assures the cast compiles but never uses it:

import yaml

type
  Node = ref object
    name: string
    children: seq[Node]

static:
  let input = """
name: root
children:
- &a
  name: A
  children: []
- name: B
  children: [*a]
"""

  let root = loadAs[Node](input)

  echo root.children[0].name
  echo root.children[1].children[0].name
timotheecour commented 4 years ago

can we re-open this issue until support at CT is reasonable (doesn't have to be complete for that)?

your snippet doesn't work, but neither does the following that doesn't contain anchors:

when true:
  import yaml
  type
    Node = ref object
      name: string
      children: seq[Node]

  proc main =
    let input = """
  name: root
  children:
  - name: A
    children: []
  """
    let root = loadAs[Node](input)
    echo root.children[0].name
  static: main()
  main()

gives:

/Users/timothee/git_clone/nim/timn/tests/nim/all/t11261.nim(125, 15) t11261
/Users/timothee/git_clone/nim/timn/tests/nim/all/t11261.nim(123, 28) main
/Users/timothee/git_clone/nim/NimYAML/yaml/serialization.nim(1382, 7) loadAs
/Users/timothee/git_clone/nim/NimYAML/yaml/serialization.nim(1365, 14) load
/Users/timothee/git_clone/nim/NimYAML/yaml/serialization.nim(1344, 5) construct
/Users/timothee/git_clone/nim/timn/tests/nim/all/t11261.nim(6, 1) template/generic instantiation from here
/Users/timothee/git_clone/nim/NimYAML/yaml/serialization.nim(1344, 5) Error: unhandled exception: Expected field name, got yamlStartMap [YamlConstructionError]
      raise (ref YamlConstructionError)(getCurrentException())

with https://github.com/flyx/NimYAML/pull/89 it gives: with -d:yamlDebug:

yamlDebug: parser: push atStreamStart, indent = -2
yamlDebug: parser: transition atStreamEnd
yamlDebug: parser: push beforeDoc, indent = -1
yamlDebug: lexer -> [1,1-1,3] Indentation
yamlDebug: parser: transition beforeDocEnd
yamlDebug: parser: push beforeImplicitRoot, indent = -1
yamlDebug: parser: update indent = -1
yamlDebug: lexer -> [1,3-1,7] Plain
yamlDebug: parser: transition atBlockIndentationProps
yamlDebug: parser: update indent = 2
yamlDebug: lexer -> [1,7-1,8] MapValueInd
yamlDebug: parser: transition afterImplicitKey
yamlDebug: parser: push emitCached
yamlDebug: emitCollection key: pos = 0, len = 1
yamlDebug: parser: pop

upon further investigating it looks like yamlStartMap is being produced twice, hence the error; but I'm lacking context of the inner workings of your library to figure out the problem

edit: see https://github.com/flyx/NimYAML/issues/91 for further analysis on this particular issue

timotheecour commented 4 years ago

@flyx

Probably better test code is the one with nodes which actually uses anchors & aliases

I've improved the nim PR and now cyclic data (nimYAML anchors) do work, see https://github.com/flyx/NimYAML/issues/92

the above mentioned problem remains (also referenced in https://github.com/flyx/NimYAML/issues/91)

flyx commented 4 years ago

Fair enough. I'll reopen the issue.

The error in your code is unrelated to ref types. Here's an edit that only calls the parser:

when true:
  import yaml

  proc main =
    let input = """
  name: root
  children:
  - name: A
    children: []
  """
    let p = initYamlParser(serializationTagLibrary)
    let s = p.parse(input)
    try:
      while true:
        let e = s.next()
        echo e
        if e.kind == yamlEndStream: break
    except:
      echo repr(getCurrentException())
  static: main()
  main()

This runs into an exception at compile time but works at runtime. I'll investigate to pinpoint what goes wrong.

timotheecour commented 4 years ago

This runs into an exception at compile time

can you show it?

your snippet works for me, no exception raised, but prints different things at RT and CT:

at CT: +STR +DOC +MAP +MAP =VAL :root =VAL :children +SEQ +MAP +MAP =VAL :A =VAL :children +SEQ -SEQ -MAP -SEQ -MAP -DOC -STR

at RT: +STR +DOC +MAP =VAL :name =VAL :root =VAL :children +SEQ +MAP =VAL :name =VAL :A =VAL :children +SEQ -SEQ -MAP -SEQ -MAP -DOC -STR

flyx commented 1 year ago

With Nim 2.0.0 and current devel, RT and CT both work for the snippet I posted above. That leaves the ref-to-ptr-cast error the only known remaining issue for CT.