SamuraiAku / SPDX.jl

Provides for the creation, reading and writing of SPDX files in multiple file formats. Written in pure Julia.
MIT License

Add IO interface to `readspdx`/`writespdx` #44

Closed omus closed 4 months ago

omus commented 5 months ago

Add IO methods for readspdx and writespdx to support working with in memory data.
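For illustration only, here is a minimal sketch of the general pattern being requested: a core method that accepts any `IO`, plus a path-based method that opens the file and delegates to it. The names `read_records` and `write_records` are hypothetical and are not SPDX.jl's functions or signatures.

```julia
# Hypothetical names throughout; this is not SPDX.jl's code.

# Core method: works on any IO (file handle, IOBuffer, pipe, ...).
function read_records(io::IO)
    return [strip(line) for line in eachline(io) if !isempty(strip(line))]
end

# Convenience method: open the file and delegate to the IO method.
read_records(path::AbstractString) = open(read_records, path)

# Core writer: one record per line, to any IO.
function write_records(io::IO, records)
    for r in records
        println(io, r)
    end
    return nothing
end

# Convenience writer: open the file for writing and delegate.
write_records(path::AbstractString, records) =
    open(io -> write_records(io, records), path, "w")

# In-memory round trip, no file on disk needed.
buf = IOBuffer()
write_records(buf, ["alpha", "beta"])
in_memory = read_records(seekstart(buf))

# The same functions through the path-based methods.
path = tempname()
write_records(path, ["alpha", "beta"])
from_file = read_records(path)
```

With this layout the file-based methods stay thin wrappers, and tests can exercise the IO methods against an `IOBuffer` without touching the filesystem.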

SamuraiAku commented 5 months ago

I have a vague memory, from back when I started on this package, of JSON.parse() not working well with IOBuffer. I think it had something to do with newlines in comment fields. If I remember right, that's why all the tests use strings for JSON data. Those issues may well have been fixed by now.

I'll want to experiment a bit with this and make sure there are no gotchas.

SamuraiAku commented 5 months ago

@omus I don't think this is going to work because of problems with JSON. Here's an example:

julia> using JSON

julia> JSON_string_noCR= """ {"annotations": [
                   {
                       "annotationDate" : "2024-05-24T17:14:55Z",
                       "annotationType" : "REVIEW",
                       "annotator" : "Person: Jane Doe (nowhere@loopback.com)",
                       "comment" : "This is a comment"
                   }
               ]}"""

julia> JSON_string= """ {"annotations": [
                   {
                       "annotationDate" : "2024-05-24T17:14:55Z",
                       "annotationType" : "REVIEW",
                       "annotator" : "Person: Jane Doe (nowhere@loopback.com)",
                       "comment" : "This is a comment\nContinued"
                   }
               ]}"""

julia> IOBuf_noCR= IOBuffer("""{"annotations": [
                   {
                       "annotationDate" : "2024-05-24T17:14:55Z",
                       "annotationType" : "REVIEW",
                       "annotator" : "Person: Jane Doe (nowhere@loopback.com)",
                       "comment" : "This is a comment"
                   }
               ]}""")

julia> IOBuf= IOBuffer("""{"annotations": [
                          {
                              "annotationDate" : "2024-05-24T17:14:55Z",
                              "annotationType" : "REVIEW",
                              "annotator" : "Person: Jane Doe (nowhere@loopback.com)",
                              "comment" : "This is a comment\nContinued"
                          }
                      ]}""")

# These parse just fine
julia> JSON.parse(JSON_string_noCR)
julia> JSON.parse(IOBuf_noCR)

# But these error
julia> JSON.parse(JSON_string)
ERROR: ASCII control character in string
Line: 5
Around: ...a comment Continued"     } ]}...
                    ^

Stacktrace:

julia> JSON.parse(IOBuf_noCR)
ERROR: Unexpected end of input
 ...when parsing byte with value '0'
Stacktrace:

Using JSON.parsefile does not have this problem. Do you have any ideas why this happens?

omus commented 5 months ago

The issue with JSON.jl and control characters occurs regardless of whether you parse from a string, parse from an IO, or use parsefile:

julia> using JSON

julia> JSON_string= """ {"annotations": [
                   {
                       "annotationDate" : "2024-05-24T17:14:55Z",
                       "annotationType" : "REVIEW",
                       "annotator" : "Person: Jane Doe (nowhere@loopback.com)",
                       "comment" : "This is a comment\nContinued"
                   }
               ]}""";

julia> IOBuf= IOBuffer(JSON_string);

julia> JSON.parse(JSON_string);
ERROR: ASCII control character in string
Line: 5
Around: ...a comment Continued"     } ]}...
                    ^
...

julia> JSON.parse(seekstart(IOBuf));
ERROR: ASCII control character in string
 ...when parsing byte with value '67'
...

julia> file = tempname() * ".json";

julia> write(file, JSON_string)
236

julia> JSON.parsefile(file);
ERROR: ASCII control character in string
Line: 5
Around: ...a comment Continued"     } ]}...
                    ^
...

Note that the error you showed in your comment (Unexpected end of input) occurred because IOBuf_noCR had already been read to the end by the earlier JSON.parse call, which is why I added seekstart to my example above:

julia> JSON.parse(IOBuf_noCR);
ERROR: Unexpected end of input
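
The read-position behaviour can be reproduced without JSON.jl at all. The following is a small sketch using only Base functions; the variable names are made up for the illustration.

```julia
io = IOBuffer("{\"a\": 1}")

first_read = read(io, String)    # consumes the whole buffer
at_end = eof(io)                 # the read position is now at the end
second_read = read(io, String)   # nothing left to read, so this is empty

seekstart(io)                    # rewind to the beginning
third_read = read(io, String)    # the full contents are available again
```

Any parser handed `io` between the first read and the `seekstart` call sees an empty stream, which is consistent with the "Unexpected end of input" error.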

The JSON spec does not allow raw control characters such as a newline inside a string; they have to be escaped. The JSON escape sequence \n therefore has to reach the parser as the two characters \ and n, which the Julia REPL displays as \\n. With JSON.jl you can use JSON.json to perform the escaping for you:

julia> JSON.json("\n")
"\"\\n\""

Are there any other issues you can see with this change?

SamuraiAku commented 5 months ago

I understand better now what's going on here. In my package PkgToSoftwareBOM.jl I do generate strings with a newline in them. When I write them to file with JSON.print(), it automatically converts each one from a control character to the two literal characters "\" and "n". Then, when reading the file, JSON.parse sees those two characters together and combines them back into a newline.
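
A minimal sketch of that round trip through an in-memory buffer, assuming JSON.jl is installed. It uses only `JSON.json` and `JSON.parse` as they appear earlier in this thread; the `comment` value and the other variable names are made up for the illustration.

```julia
using JSON

comment = "This is a comment\nContinued"   # contains a real newline character

# Serializing escapes the newline into the two characters '\' and 'n'.
encoded = JSON.json(Dict("comment" => comment))
has_raw_newline = occursin('\n', encoded)   # is there a raw newline in the JSON text?
has_escape = occursin("\\n", encoded)       # is the escaped form present instead?

# Parsing from an IO turns the escape sequence back into a newline.
io = IOBuffer(encoded)
decoded = JSON.parse(io)
roundtrip_ok = decoded["comment"] == comment
```

The hand-written test strings earlier in the thread fail because the Julia literal "\n" puts a raw newline into the JSON text, whereas text produced by the serializer never contains one.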

SamuraiAku commented 4 months ago

Ran a read/write test with a complicated SPDX file in PkgToSoftwareBOM/examples. The functions work. I'll merge soon.

SamuraiAku commented 4 months ago

Released as v0.4.1