Closed timotheecour closed 1 year ago
The lexer can handle different sources (strings, streams) with the SourceProvider
concept. To avoid exporting the actual source implementation via generic parameter, the source is stored as pointer and unchecked casts are used to retrieve it. The Lexer stores function pointers to the functions doing the correct casts for the current source.
Since the VM cannot cast, this code needs to be refactored in some way. The idea of SourceProvider
was to avoid dispatching calls for retrieving each character as this will slow down the Lexer. So a good solution will not employ methods.
However, there is no guarantee that the rest of the code will work at compile-time.
@sealmove There are two other issues related to the current state of the lexer, #85 (also some more information available there) and several failures to comply to the YAML specification which are not documented as GitHub issue but available in this list of skipped tests (the tests come from the official YAML test suite).
The best way to go ahead is to rewrite the Lexer properly. I basically know how to do it since I wrote AdaYaml after NimYAML and avoided all these issues there. Since this and the other issue seem to be important, I will try and accomplish a rewrite till the end of the month, to avoid putting more effort into code that cannot be fixed to solve all its issues without rewriting it (a change is needed in how the input is tokenized).
Would this work for you? I can't really guarantee that a Lexer rewrite will completely fix this issue but you can take it up from there.
@flyx Sorry I didn't notice your reply until now. This is awesome news!
Would this work for you
Yes, absolutely. Please notify me when the new lexer is ready.
@sealmove Status update:
The rewrite is mostly working (as seen in the branch lexer-rewrite
) however the lexer & parser APIs changed and I need to adjust the serialization code (the high level API, i.e. load
, dump
etc, will be unchanged).
Compile time execution is currently blocked by the global usage of serializationTagLibrary
in some places. I will probably go forward and remove the type TagId
, instead returning tags directly as ref string
or something – this will remove the need for those global references.
A quick downgrading edit shows that NimYAML indeed works at compiletime when this obstacle is removed.
Work on the branch should be finished soon™.
The lexer rewrite has now landed in the devel
branch.
The following code using NimYAML at compile time now compiles and yields the expected result:
import yaml
type
Person = object
firstnamechar: char
surname: string
age: int32
setTagUri(Person, "!person")
const persons = loadAs[seq[Person]]("""
- {firstnamechar: K, surname: Koch, age: 23}
- {firstnamechar: S, surname: McDuck, age: 150}""")
for p in persons:
echo p
However, latest Nim head is required for this to work. Nim 1.4.0 fails to do it because `==`
on proc
variables appears to be broken there.
Currently not working are ref
values of any kind. The reason for this is that those values are being cast to pointer
when loading, which is used for storing them as target for aliases. I expect that it is possible to support ref
values at compile time by disabling this feature, so that you cannot load cyclic structures. I currently do not plan to implement this as it seems to be a rather exotic feature – if this is important for you, please open a separate issue for it.
I invite everyone taking interest in this issue to test their use-case with NimYAML's current devel
branch and give feedback if something is not working as expected. I don't plan to thoroughly test this myself as I don't expect discrepancies from runtime operation seeing that the simple example given above works.
@flyx this snippet gives: NimYAML/yaml/serialization.nim(407, 18) Error: attempting to call undeclared routine: 'getGMTime'
EDIT: looks like i was on master, not devel; why is the main branch called devel instead of master? this goes against virtually all github repos and nimble packages (with the only exception of nim repo itself) ?
(this aside, this is great news, thanks!)
Currently not working are ref values of any kind. The reason for this is that those values are being cast to pointer when loading, which is used for storing them as target for aliases
I don't understand this part; it prevents a lot of usage at CT even with https://github.com/nim-lang/Nim/pull/15528; why are pointer needed for that?
why is the main branch called devel instead of master
The master branch is for releases; a checkout of master will give you the latest release. This is common workflow for larger projects although I admit that most single-dev repos don't use it; it's just my preferred workflow. It should not really matter since devel
is the HEAD branch and therefore will be the one checked out when you clone NimYAML or install it with nimble's yaml#head
.
I don't understand this part; it prevents a lot of usage at CT even with nim-lang/Nim#15528; why are pointer needed for that?
To resolve aliases. For example:
import yaml
type
Node = ref object
name: string
children: seq[Node]
let input = """
name: root
children:
- &a
name: A
children: []
- name: B
children: [*a]
"""
let root = loadAs[Node](input)
echo root.children[0].name # will be A
echo root.children[1].children[0].name # will be the same A
NimYAML will create multiple references to ref
values from aliases in the input, as seen here. To facilitate that, it must keep a hashmap of all objects created from YAML nodes with an anchor. Since anchored nodes could be of multiple different types, they are cast to pointer
(only ref
values are allowed to have an anchor so this always works) for storing them in the hashmap. A type identifier is stored along with the pointer to ensure that the pointer is only used with the correct type.
I am not sure what the alternative to this would be. I guess it would be possible to have one hashmap per type but that would involve lots of macro magic to generate the necessary amount of hashmaps based on the type we deserialize to (it would need to collect all ref
types that are part of the root type). I would rather not add that much complexity as maintaining this library is already hard enough as it is.
In OOP languages like Java I would use the base Object
type for this but in Nim not every type derives from RootObj
and for all I know, upcasting/downcasting is not possible to the extend needed for this.
Perhaps anchors could simply be forbidden during compile time, but I'm not entirely sure how to do that. Is there some when defined(compiletime)
? I wonder whether this would help anyone, as using ref
types kind-of implies that you want to be able to reference an object multiple times.
I am no active Nim user apart from maintaining this library so there may be options I am not familiar with. If you know one, please tell me.
At least nimyaml should allow to deal with ref types that don't involve multiple references to the same object, that's a very common use case. For multiple references, I don't see why it wouldn't work in VM, but at least we can deal with that later (note, a difficulty would be if user wants to convert a VM value involving multiple references to same object to a RT value, as https://github.com/nim-lang/Nim/pull/15528 currently doesn't handle those correctly but that's a separate issue that can be dealt with later)
Is there some when defined(compiletime)
when nimvm
as mentioned above (see usage restriction in docs)
I wonder whether this would help anyone, as using ref types kind-of implies that you want to be able to reference an object multiple times
I disagree with that; ref could be in the type declaration (eg to allow multiple references at run time) but not in the serialization string (ie no multiple references to same object mentioned in serialization string); it's common
To facilitate that, it must keep a hashmap of all objects created from YAML nodes with an anchor Since anchored nodes could be of multiple different types, they are cast to pointer
why don't you use int
instead of pointer? eg:
when true:
type Foo = ref object
x: int
proc main()=
let f1 = Foo(x: 10)
let b1 = cast[int](f1)
static: main()
main()
the address (via cast[int]
) should be enough to tell whether an object is already in HashMap.
EDIT: maybe the problem is you need the reverse direction as well, ie: cast[Foo](b1)
where b1: int
? if so, that doesn't work yet, but I can try to make a PR in nim to support that (I had previously added https://github.com/nim-lang/Nim/pull/12877 but it doesnt' support int => ref
)
why don't you use int instead of pointer?
pointer
seemed to be the more fitting type. Ultimately the only thing that matters is that the target type has the correct size, so if using int
avoids any problems I'm unaware of, I can use that instead.
maybe the problem is you need the reverse direction as well
Yes, without the reverse cast, it would not be possible to put the value together.
but I can try to make a PR in nim to support that
That would certainly be the best solution, compared with when nimvm
which would just be a functionality-degrading workaround. I'd be grateful if you could do that.
when true:
import yaml
type
Person = ref object
firstnamechar: char
surname: string
age: int32
proc `$`(a: Person): string = $a[]
setTagUri(Person, "!person")
proc fn =
let persons = loadAs[seq[Person]]("""
- {firstnamechar: K, surname: Koch, age: 23}
- {firstnamechar: S, surname: McDuck, age: 150}""")
echo persons
static: fn()
fn()
CT: @[(firstnamechar: 'K', surname: "Koch", age: 23), (firstnamechar: 'S', surname: "McDuck", age: 150)] RT: @[(firstnamechar: 'K', surname: "Koch", age: 23), (firstnamechar: 'S', surname: "McDuck", age: 150)]
const persons = loadAs[seq[Person]]...
should work tooGreat work! Probably better test code is the one with nodes which actually uses anchors & aliases – the code you posted only assures the cast compiles but never uses it:
import yaml
type
Node = ref object
name: string
children: seq[Node]
static:
let input = """
name: root
children:
- &a
name: A
children: []
- name: B
children: [*a]
"""
let root = loadAs[Node](input)
echo root.children[0].name
echo root.children[1].children[0].name
can we re-open this issue until support at CT is reasonable (doesn't have to be complete for that)?
your snippet doesn't work, but neither does the following that doesn't contain anchors:
when true:
import yaml
type
Node = ref object
name: string
children: seq[Node]
proc main =
let input = """
name: root
children:
- name: A
children: []
"""
let root = loadAs[Node](input)
echo root.children[0].name
static: main()
main()
gives:
/Users/timothee/git_clone/nim/timn/tests/nim/all/t11261.nim(125, 15) t11261
/Users/timothee/git_clone/nim/timn/tests/nim/all/t11261.nim(123, 28) main
/Users/timothee/git_clone/nim/NimYAML/yaml/serialization.nim(1382, 7) loadAs
/Users/timothee/git_clone/nim/NimYAML/yaml/serialization.nim(1365, 14) load
/Users/timothee/git_clone/nim/NimYAML/yaml/serialization.nim(1344, 5) construct
/Users/timothee/git_clone/nim/timn/tests/nim/all/t11261.nim(6, 1) template/generic instantiation from here
/Users/timothee/git_clone/nim/NimYAML/yaml/serialization.nim(1344, 5) Error: unhandled exception: Expected field name, got yamlStartMap [YamlConstructionError]
raise (ref YamlConstructionError)(getCurrentException())
with https://github.com/flyx/NimYAML/pull/89 it gives: with -d:yamlDebug:
yamlDebug: parser: push atStreamStart, indent = -2
yamlDebug: parser: transition atStreamEnd
yamlDebug: parser: push beforeDoc, indent = -1
yamlDebug: lexer -> [1,1-1,3] Indentation
yamlDebug: parser: transition beforeDocEnd
yamlDebug: parser: push beforeImplicitRoot, indent = -1
yamlDebug: parser: update indent = -1
yamlDebug: lexer -> [1,3-1,7] Plain
yamlDebug: parser: transition atBlockIndentationProps
yamlDebug: parser: update indent = 2
yamlDebug: lexer -> [1,7-1,8] MapValueInd
yamlDebug: parser: transition afterImplicitKey
yamlDebug: parser: push emitCached
yamlDebug: emitCollection key: pos = 0, len = 1
yamlDebug: parser: pop
upon further investigating it looks like yamlStartMap
is being produced twice, hence the error; but I'm lacking context of the inner workings of your library to figure out the problem
edit: see https://github.com/flyx/NimYAML/issues/91 for further analysis on this particular issue
it's really hard to debug this (requires knowledge of the inner workings of your library), what would really help is if you could minimize this down to something that doesn't use any yaml code, ie pointing to a nim bug (ie: works at RT, not at CT).
another issue is that nimyaml code keeps reusing the following pattern which is bad:
try: complexCode except: raise someException
this is bad because all the context is lost. At least you should populate the parent exception
except Exception:
is bad (eg, in proc next*(s: YamlStream)
) ; it means it'll catch even defects like AssertionDefect
@flyx
Probably better test code is the one with nodes which actually uses anchors & aliases
I've improved the nim PR and now cyclic data (nimYAML anchors) do work, see https://github.com/flyx/NimYAML/issues/92
the above mentioned problem remains (also referenced in https://github.com/flyx/NimYAML/issues/91)
Fair enough. I'll reopen the issue.
The error in your code is unrelated to ref
types. Here's an edit that only calls the parser:
when true:
import yaml
proc main =
let input = """
name: root
children:
- name: A
children: []
"""
let p = initYamlParser(serializationTagLibrary)
let s = p.parse(input)
try:
while true:
let e = s.next()
echo e
if e.kind == yamlEndStream: break
except:
echo repr(getCurrentException())
static: main()
main()
This runs into an exception at compile time but works at runtime. I'll investigate to pinpoint what goes wrong.
This runs into an exception at compile time
can you show it?
your snippet works for me, no exception raised, but prints different things at RT and CT:
fix in opcCastIntToPtr (fixes flyx/NimYAML#92)
at CT: +STR +DOC +MAP +MAP =VAL :root =VAL :children +SEQ +MAP +MAP =VAL :A =VAL :children +SEQ -SEQ -MAP -SEQ -MAP -DOC -STR
at RT: +STR +DOC +MAP =VAL :name =VAL :root =VAL :children +SEQ +MAP =VAL :name =VAL :A =VAL :children +SEQ -SEQ -MAP -SEQ -MAP -DOC -STR
With Nim 2.0.0 and current devel, RT and CT both work for the snippet I posted above. That leaves the ref-to-ptr-cast error the only known remaining issue for CT.
after applying the patch from https://github.com/flyx/NimYAML/issues/69#issuecomment-502444945 to make it work with latest nim 0.20 (which does work!), I'm trying to make it work at compile time; currently it gives compiler errors for the basic examples in https://nimyaml.org/
newFileStream doesn't work at CT so I tried this as well:
which gives several errors eg:
Has anyone succeeded in making it work at CT (even with a more limited set of yaml features would be a start)?