certik / yaml-cpp

Automatically exported from code.google.com/p/yaml-cpp
MIT License

Parser reads entire stream #148

Open GoogleCodeExporter opened 9 years ago

GoogleCodeExporter commented 9 years ago
What steps will reproduce the problem?
1. Compile the code attached and run it. 

or 
1. emit to a stream.
2. parse the stream.
3. read some but not all of the data in the stream from the parser/node
4. create a new parser from the same stream
5. attempt to read more data using the new parser
6. fail at reading more data as it's been consumed by the first parser

What is the expected output? What do you see instead?
The stream should still contain all data not yet read by the parser. Instead, 
the stream is entirely consumed by the parser, even if the caller has only read 
part of the data. 

What version of the product are you using? On what operating system?
The OS is linux, distribution Fedora, 64 bit arch. 

yaml-cpp version:
Name        : yaml-cpp
Arch        : x86_64
Version     : 0.2.6
Release     : 1.fc16
Size        : 529 k
Repo        : installed
From repo   : fedora
Summary     : A YAML parser and emitter for C++
URL         : http://code.google.com/p/yaml-cpp/

Please provide any additional information below.
The problem happens when you use more than one YAML::Parser on a single stream. 
The first parser given the stream reads all of it, so by the time the second 
parser gets the stream it is already empty. 

I've attached an example of the problem, but I've also pasted it here for ease 
of reading:

{{{
#include <iostream>
#include <sstream>
#include <yaml.h>

// 
// Compile (w/g++ on unixes with): 
// g++ $(pkg-config --cflags yaml-cpp) test.cpp $(pkg-config --libs yaml-cpp) -o test
//

using namespace std;

int main(void) {

    YAML::Emitter e;
    e << 1;
    e << 2;
    e << 3;
    e << 4;

    // show serialized YAML data.
    cout << "YAML data:" << endl << e.c_str() << endl;
    stringstream ss(e.c_str());

    // now read it back in, one int at a time.
    int read_data = 0;

    // before parsing, the stream's get index == 0. 
    cout << "Pre-parse stream get pointer: " << ss.tellg() << endl;
    YAML::Parser p(ss);
    YAML::Node node;
    p.GetNextDocument(node);

    node >> read_data;          // read_data is 1

    // The stream's get pointer should be at the start of the second
    // int, but it is not. The parser/node has read all the data
    // in the stringstream. So if you use the stream on another Parser, 
    // things break. 

    // The post parse stream's get index should be 4? 
    // Instead it is 19 - the entire stream.
    cout << "Post-parse stream get pointer: " << ss.tellg() << endl;

    // Declare a second parser, using the same stream. 
    YAML::Parser p2(ss);
    p2.GetNextDocument(node);
    node >> read_data;      // yaml-cpp go boom!

    // full output:
    // YAML data:
    // 1
    // --- 
    // 2
    // ---
    // 3
    // ---
    // 4
    // Pre-parse stream get pointer: 0
    // Post-parse stream get pointer: 19
    // terminate called after throwing an instance of 'YAML::InvalidScalar'
    //   what():  yaml-cpp: error at line 1, column 1: invalid scalar
    //   Aborted

    return 0;
} 
}}}

Thanks!

Original issue reported on code.google.com by phil.s.s...@gmail.com on 23 Jan 2012 at 12:08

GoogleCodeExporter commented 9 years ago
You're right, this is the current behavior of the library - you should only use 
one parser on a given stream, and use GetNextDocument to iterate through the 
documents.

That said, it's not a bad idea to have a parser only read one document at a 
time from a stream. I'm not sure when I can get to this, but I'll keep it in 
mind.

Original comment by jbe...@gmail.com on 23 Jan 2012 at 6:15

GoogleCodeExporter commented 9 years ago

Original comment by jbe...@gmail.com on 18 May 2012 at 6:32

GoogleCodeExporter commented 9 years ago

Original comment by jbe...@gmail.com on 19 May 2012 at 9:08

GoogleCodeExporter commented 9 years ago
Issue 160 has been merged into this issue.

Original comment by jbe...@gmail.com on 27 May 2012 at 7:14

GoogleCodeExporter commented 9 years ago
As people use Yaml more and more, you're going to see people running into this 
bug more and more.

The reason is simple - people want to use Yaml to communicate between 
processes or between network nodes.  One of my applications uses socket 
connections to send packets of Yaml information up and down the wire.  In 
another, a Python application sends commands to a C++ subprocess as packets of 
Yaml data.

Yaml was clearly designed for streaming - the existence of the --- and ... 
lines should make that clear - but a parser that tries to read the entire 
stream before it returns the first document can't stream!

An interesting note is that at least one other Yaml parser, pyyaml/libyaml, has 
a different but related issue that also causes trouble.  It doesn't actually 
greedily suck down the entire stream, but it doesn't write the separating --- 
string until the *next* packet appears.  This means of course that the last 
packet in any stream hangs up the parser, because it never sees the ---...

Their work also seems to be designed around parsing documents, but these days 
I have fewer documents and a lot more "requests through the internet."

Original comment by tom.ritc...@gmail.com on 11 Apr 2013 at 8:07

GoogleCodeExporter commented 9 years ago
But what I really came here to say is that a workaround for this is very simple 
in user code.

You just send and receive one chunk of Yaml at a time and synthesize your own 
--- markers.  You need to search for exactly  \n--- on the incoming stream 
(some Yaml parsers don't write the closing \n for that --- line), chunk the 
input, and send those chunks to your parsing code, and similarly for the 
reverse direction.

Original comment by tom.ritc...@gmail.com on 11 Apr 2013 at 8:11

GoogleCodeExporter commented 9 years ago
Issue 206 has been merged into this issue.

Original comment by jbe...@gmail.com on 6 Jun 2013 at 3:52

GoogleCodeExporter commented 9 years ago
I attach a small patch against stream.cpp and singledocparser.cpp, which makes 
it possible to call YAML::Load(std::istream&) multiple times on the same input, 
so that each document is returned as soon as it is complete.

The modifications are:
- make Stream::GetNextByte read only the available number of characters from 
the source istream(buf);
- make SingleDocParser::HandleDocument eat stray DOC_END tokens at the 
beginning of its execution.

I did not test this patch extensively, but I am sending it anyway in case 
someone finds it useful.

Original comment by pin...@gmail.com on 14 Aug 2014 at 3:14

GoogleCodeExporter commented 9 years ago
... here is a better patch against version 0.5.1.
The Loader class allows reading multiple documents from the same input stream 
(tested using std::cin and a std::ifstream).

Original comment by pin...@gmail.com on 18 Aug 2014 at 2:25
