Closed kwladyka closed 1 year ago
I have a large file, loaded as
(def all-of-it (yaml/parse-stream (javaio/reader "a-file.yaml") :load-all true))
Curiously, when I evaluate it in a REPL, (map identity all-of-it) raises the exception, but (take 1000000 all-of-it) does not (the limit is larger than the number of documents in the input file).
@PetrGlad Can you wrap that take in a doall?
Sorry, it looks like the reproduction involves attempting one operation on the sequence first; that attempt fails, and subsequent operations then succeed. For example:
(def all-of-it (yaml/parse-stream (javaio/reader "a-file.yaml") :load-all true))
(doall (map identity all-of-it)) ; <-- FAILS
(doall (map identity all-of-it)) ; <-- OK
It seems it does not matter which operation is tried first. These are evaluated in a REPL, so I think doall should not change the behavior.
If anyone wants to do a PR, we're open to that. It should be relatively straightforward to add:
Just wanted to note that the actual problem (in my case) is in SnakeYAML. I have already reported that. SnakeYAML now enforces an input size limit, but it limits the size of the whole input stream, whereas it only makes sense to limit the size of each document. For example, this makes a difference when the input stream contains many small documents. Making the limit configurable would be a workaround, nonetheless.
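To make the distinction above concrete, here is a minimal sketch in plain Clojure. It does not call SnakeYAML or clj-yaml at all; the function names, the sample documents, and the tiny limit are all hypothetical and chosen only for illustration. It shows how a stream of many small documents can trip a limit applied to the whole stream while every individual document stays far below that same limit.

```clojure
;; Illustrative only: these helpers are hypothetical and are not part of
;; SnakeYAML or clj-yaml.

(defn code-points
  "Number of Unicode code points in the string s."
  [^String s]
  (.codePointCount s 0 (.length s)))

(defn exceeds-stream-limit?
  "True when the combined size of all documents exceeds limit."
  [docs limit]
  (> (reduce + (map code-points docs)) limit))

(defn exceeds-document-limit?
  "True when at least one single document exceeds limit."
  [docs limit]
  (boolean (some #(> (code-points %) limit) docs)))

;; 1000 small documents of 11 code points each (made-up sample data).
(def docs (repeat 1000 "---\nkey: 1\n"))

;; Hypothetical limit, kept tiny so the effect is easy to see.
(def limit 100)

(exceeds-stream-limit? docs limit)   ; whole stream: 11000 code points
(exceeds-document-limit? docs limit) ; largest document: 11 code points
```

With a stream-wide check the input above is rejected, while a per-document check would accept it.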
Thanks for following up @PetrGlad!
Just wanted to note that the actual problem (in my case) is in SnakeYAML. I have already reported that.
Was it this issue here?
SnakeYAML now enforces an input size limit, but it limits the size of the whole input stream, whereas it only makes sense to limit the size of each document. For example, this makes a difference when the input stream contains many small documents. Making the limit configurable would be a workaround, nonetheless.
Is there a separate SnakeYAML issue to address this too?
Yes, that was the change. I sent a message to the Google Groups list because other services were locked down due to attacks. They acknowledged that it is likely a problem, but I do not know if a ticket was created (here).
@PetrGlad, I don't see a SnakeYAML issue created for that either. I think SnakeYAML issues on Bitbucket might still be a bit wonky. We can see them now, but maybe not create new issues yet. A friendly reply/reminder on your thread in the SnakeYAML mailing list would probably be helpful to Andrey.
I pinged Andrey and he responded:
It was fixed without the ticket. Feel free to create one - we can check how it works (it should be pre-moderated now)
https://bitbucket.org/snakeyaml/snakeyaml/wiki/Changes
Andrey
@PetrGlad, FYI: because Andrey asked me to, in the spirit of being a good citizen, I went ahead and created a SnakeYAML ticket with repro.
When reading large YAML files, the code_point_limit needs to be overridden, but I didn’t find a way to do this with clj-yaml. How do you read large YAML files?
From slack #clj-yaml
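For readers with the same question, a possible approach is sketched below. It assumes a clj-yaml release whose parse functions accept a :code-point-limit option; this is an assumption about your installed version, so check its documentation first. The helper name parse-large-file and the 64 MB figure are hypothetical.

```clojure
(require '[clj-yaml.core :as yaml]
         '[clojure.java.io :as javaio])

(defn parse-large-file
  "Parses every YAML document in the file at path, allowing up to limit
  code points of input. Assumes clj-yaml supports :code-point-limit."
  [path limit]
  (with-open [rdr (javaio/reader path)]
    ;; doall realizes the lazy sequence before with-open closes the reader.
    (doall (yaml/parse-stream rdr
                              :load-all true
                              :code-point-limit limit))))

(comment
  ;; Hypothetical usage: allow roughly 64 MB worth of code points.
  (parse-large-file "a-file.yaml" (* 64 1024 1024)))
```

Realizing the sequence inside with-open trades laziness for a reader that is reliably closed; for inputs too large to hold in memory, processing the documents inside the with-open body may be a better fit.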