Juniper / libxo

The libxo library allows an application to generate text, XML, JSON, and HTML output using a common set of function calls. The application decides at run time which output style should be produced.
http://juniper.github.io/libxo/libxo-manual.html
BSD 2-Clause "Simplified" License

libxo inconsistent field truncation #86

Open Jamie-Landeg-Jones opened 2 years ago

Jamie-Landeg-Jones commented 2 years ago

Hi again Phil!

Tested on FreeBSD 12-stable and 13.0-RELEASE:

I noticed that "ps" sets the maximum length of the command field to 2048 characters [POSIX2_MAX_LENGTH] via the instruction "{:command/%-0..2048s}", even when "-ww" is specified. "procstat -e" doesn't. Sigh.

Anyway, that's another issue, but while checking it, I discovered that whilst libxo honours the maximum length for text and HTML output, it doesn't for XML, JSON, or CSV.

As an example, try running this, and replacing the libxo output format as appropriate:

env -i $(jot 500 | awk '{print "testvar_"$0"=dummydummydummydummy"}') sh -c 'ps -wwe -p $$ --libxo xml'
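The behaviour that command exposes can be summarised in a few lines of C. This is a model of what was observed with ps on FreeBSD 12 and 13, not libxo's implementation: the `out_style` enum and `observed_len()` are hypothetical names, and widths are counted in bytes for simplicity, whereas a real implementation may need to count display columns for multibyte text.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical names for the output styles discussed in this issue. */
enum out_style { STYLE_TEXT, STYLE_HTML, STYLE_XML, STYLE_JSON, STYLE_CSV };

/*
 * Bytes of `value` that reach the output under the reported behaviour:
 * the field's maximum width is applied to the display styles (text,
 * HTML) but ignored for the data styles (XML, JSON, CSV).
 * A model of observed behaviour only, not libxo's code.
 */
size_t observed_len(enum out_style style, const char *value, size_t max_width)
{
    size_t len = strlen(value);

    switch (style) {
    case STYLE_TEXT:
    case STYLE_HTML:
        return len < max_width ? len : max_width;
    default:
        return len; /* XML, JSON, CSV: limit ignored, full value emitted */
    }
}
```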

Is this intentional? I couldn't find anything in the docs mentioning this.

Is this related to a kludgy fix for https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=246514 ?

In my opinion, when, whether, or how fields should be truncated is a thorny issue, and although I find many implementations messy, I still think it should be solely the responsibility of the calling program.

I.e., I don't think libxo should honour the max width for some formats and ignore it for others...

Thoughts?

Cheers, Jamie

philshafer commented 2 years ago

On Apr 7, 2022, at 10:09 AM, Jamie Landeg Jones wrote:

> Tested on FreeBSD 12-stable and 13.0-RELEASE: I noticed that "ps" sets the maximum length of the command field to 2048 characters [POSIX2_MAX_LENGTH] via the instruction "{:command/%-0..2048s}", even when "-ww" is specified. "procstat -e" doesn't. Sigh.

Should procstat limit output as well?

> Anyway, that's another issue, but in the process of checking this, I discovered that whilst libxo honours the maxlength for text, and html, it doesn't for XML, JSON, or CSV

Hmm… my view was that the limit was related to display, like columns in a table, whereas "real" data formats would want the full content.

But this idea is truly broken, since there's absolutely no guarantee about what data lives past the limit, or whether there's even a NUL at all. So this is definitely a bug and needs to be fixed asap. I'll add a new field flag to give the current behavior (print the whole thing in data formats) when/where that's desired. But doing it by default was just wrong. Sorry….

Yes, you could do this via the data format string, but it really shouldn’t default to “%s” as it currently does.
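Phil's point about the NUL terminator comes from the C printf family: when a precision is given, `%.*s` reads at most that many bytes and the array need not contain a NUL, whereas a bare `%s` keeps reading until it finds one. The sketch below uses only standard `snprintf`; `field` is a hypothetical fixed-size buffer filled without a terminator, and `format_bounded()` is an illustrative name.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical fixed-size field filled without a terminating NUL:
 * all 8 bytes are data, so the array contains no '\0' at all.
 */
static const char field[8] = { 'd', 'u', 'm', 'm', 'y', 'd', 'u', 'm' };

/*
 * Copy at most `max` bytes of `field` into `out`; snprintf always
 * NUL-terminates `out` when outsz > 0. The precision in "%.*s" bounds
 * the read, so snprintf never looks past the end of `field`. Passing
 * `field` to a bare "%s" instead would be undefined behaviour: it would
 * keep reading until it happened to find a NUL somewhere in memory.
 */
int format_bounded(char *out, size_t outsz, int max)
{
    /* Never ask for more bytes than the array actually holds. */
    if (max < 0 || max > (int)sizeof field)
        max = (int)sizeof field;
    return snprintf(out, outsz, "%.*s", max, field);
}
```

This is why a data format that silently falls back to "%s" can read past the end of a field that was only ever meant to be printed with a limit.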

> Is this related to a kludgy fix for https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=246514 ?

No, this bug predates that ;^)

Thanks, Phil

Jamie-Landeg-Jones commented 2 years ago

Thanks for the quick response!

I'm not sure if procstat actually has a limit. It doesn't pass a maximum size to libxo, but there may be one internally; I don't know. I just know that if there is a defined limit, it's bigger than my test data!

But my personal opinion, in all cases like this, is that the option should exist to have the limit or to have it unlimited. I don't even care which is the default (having a limit by default would probably be a good idea, to stop people kicking themselves in the face), but I think it should be consistent across all output formats.

As for "ps", the reason this got me is due to the '-ww' option. The manpage says:

-w Use at least 132 columns to display information, instead of the default which is the window size if ps is associated with a terminal. If the -w option is specified more than once, ps will use as many columns as necessary without regard for the window size. Note that this option has no effect if the “command” column is not the last column displayed.
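Read literally, the man page text describes three cases for the width of the command column. A sketch of that rule follows; `command_width()` and `WIDTH_UNLIMITED` are hypothetical names, `window_cols` stands in for whatever default width ps would otherwise use, and the real code in FreeBSD's ps may differ in details.

```c
#include <stddef.h>

/* Hypothetical sentinel meaning "no column limit". */
#define WIDTH_UNLIMITED ((size_t)-1)

/*
 * Width available to the "command" column, following the -w text in
 * ps(1):
 *   no -w     : the default width (the window size on a terminal)
 *   -w once   : at least 132 columns
 *   -w twice+ : as many columns as necessary
 */
size_t command_width(int wflag_count, size_t window_cols)
{
    if (wflag_count >= 2)
        return WIDTH_UNLIMITED;
    if (wflag_count == 1)
        return window_cols > 132 ? window_cols : 132;
    return window_cols;
}
```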

Using -w twice implies to me that the output won't be truncated. However, I realise this is a "ps" issue, not a libxo one. I only mentioned it because it is what led me to notice how libxo applies truncation differently to different formats.

But note that, as ps also truncates internally, even the "real" data formats are truncated unless '-ww' is used.

This means that, as things stand, if you don't use '-ww', the data is truncated in all libxo output formats (presumably a bug left over from how the code worked before --libxo existed). However, if you explicitly say "I don't want truncation" by using '-ww', libxo truncates for some formats but not others. I understand your reasoning behind ignoring truncation for "real" data formats, but in this specific situation it is confusing.

Expanding on that, if a calling program specifies a maximum width, the user may be relying on it precisely because they don't want to deal with a potentially huge JSON or XML file.
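The two layers of truncation described above can be modelled together as follows. `surviving_len()` is a hypothetical helper; `display_style` is true for text and HTML, the 2048 limit is taken from the "{:command/%-0..2048s}" format string, and both rules (ps truncating for every style without -ww, libxo honouring the limit only for display styles) are the behaviour described in this thread, not documented semantics.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical model of how many bytes of the command survive:
 *   1. unless -w is given at least twice, ps itself cuts the command
 *      to the column width, whatever the output style;
 *   2. the 2048 maximum in the libxo field format is then applied,
 *      but (as reported here) only for text and HTML output.
 */
size_t surviving_len(size_t cmd_len, int wflag_count, size_t cols,
                     bool display_style)
{
    const size_t field_max = 2048; /* from "{:command/%-0..2048s}" */
    size_t len = cmd_len;

    if (wflag_count < 2 && len > cols)       /* layer 1: ps, all styles */
        len = cols;
    if (display_style && len > field_max)    /* layer 2: text/HTML only */
        len = field_max;
    return len;
}
```

With -ww and a 5000-byte command, this model yields 2048 bytes for text or HTML and all 5000 for XML or JSON, which is the inconsistency reported at the top of the issue.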

I dunno. It just makes sense to me to do what the calling program asks in all cases. This is a can of worms... I like your idea of a new field flag for this. I was going to suggest something similar but thought it would be overkill! I guess this is the sort of thing the person in the FreeBSD bug was trying to achieve with the "am I calling libxo or not" hack, which, as you pointed out, wasn't a reliable solution.

If you do add this, I really don't care which option is the default, as long as the option exists! Still, doing it differently for one format compared to another seems wrong. I get it, but... What happens if someone adds a different data format as a plugin? How would libxo determine whether it's a "presentation" format or a "real" format? And what about people who assume HTML should be treated as a "real" format, since HTML doesn't have issues with "line" length and can naturally wrap long lines on screen?
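One way to express the policy argued for here is to make truncation a property of the field rather than of the output style, so that a new style (for example, one added as a plugin) never needs to be classified as "presentation" or "data". In this sketch `FIELD_FULL` is a hypothetical per-field flag standing in for the one Phil proposed; it is not an existing libxo modifier, and `field_len()` is an illustrative name.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical per-field flag: emit the whole value, ignoring max width. */
#define FIELD_FULL 0x1u

/*
 * Bytes of the NUL-terminated `value` to emit. The decision depends
 * only on the field's own max width and flags; the output style is
 * deliberately not a parameter, so text, HTML, XML, JSON, CSV, and any
 * future style all produce the same content.
 */
size_t field_len(const char *value, size_t max_width, unsigned flags)
{
    size_t len = strlen(value);

    if (flags & FIELD_FULL)
        return len;                 /* caller opted out of the limit */
    return len < max_width ? len : max_width;
}
```

Either default works with this shape: the caller picks the limit or opts out per field, and the choice no longer depends on which output style the user requested.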

Anyway, my main issue is due to "ps" not "libxo", so I'll take that up there.

Sorry for being incoherent here - this is one of those issues where the more you think about it, the harder it gets!

Cheers, Jamie

Jamie-Landeg-Jones commented 2 years ago

I also note that "ps" formats its data using vis(3), which is probably something best left to libxo...? Another rabbit-hole!
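For readers unfamiliar with vis(3): it encodes non-printable characters into visible escape sequences so that they cannot disturb the terminal. The function below is a much-simplified stand-in written for illustration, not vis(3) itself and not its default encoding; `simple_vis()` is a hypothetical name. Printable ASCII passes through, backslash, newline, and tab get C-style escapes, and every other byte becomes a three-digit octal escape.

```c
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/*
 * Simplified vis-style encoder (illustration only, not vis(3)).
 * `out` must hold at least 4 * strlen(in) + 1 bytes, since the
 * longest escape ("\ooo") is four characters per input byte.
 */
void simple_vis(char *out, const char *in)
{
    for (const unsigned char *p = (const unsigned char *)in; *p; p++) {
        if (*p == '\\') {
            out += sprintf(out, "\\\\");
        } else if (*p == '\n') {
            out += sprintf(out, "\\n");
        } else if (*p == '\t') {
            out += sprintf(out, "\\t");
        } else if (isprint(*p)) {   /* default "C" locale: printable ASCII */
            *out++ = (char)*p;
        } else {
            out += sprintf(out, "\\%03o", (unsigned)*p);
        }
    }
    *out = '\0';
}
```

Each output style already needs its own escaping (JSON and XML have their own rules), which may be part of why this kind of encoding fits more naturally in the output library than in each caller.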