WeTransfer / format_parser

file metadata parsing, done cheap
https://rubygems.org/gems/format_parser
Other
62 stars 18 forks source link

Return a more uniform structure in as_json #151

Open linkyndy opened 4 years ago

linkyndy commented 4 years ago

At the moment, results are returned as a regular hash, with keys and values containing a mix of strings and symbols:

{"nature"=>:archive, "intrinsics"=>nil, "entries"=>[{:type=>:file, :size=>53335, :filename=>"foo"}, {:type=>:file, :size=>292210, :filename=>"bar"}, {:type=>:file, :size=>1965044, :filename=>"baz"}], "format"=>:zip}

It happened to me several times that I was getting the key or comparing the value using the wrong type. I think normalising the return value of as_json would be a nice improvement.

Currently, I am doing this to normalise the result hash:

JSON.load(JSON.dump(result))

Here are the options I see:

fabioperrella commented 4 years ago

After trying yo use the new version with stringify_keys:true in our service, we realized it wasn't enough 😞 .

When the hash has non-strings as values, it raises errors like the following one, when trying to serialize it (when saving to Dynamodb):

Errors > JSON::GeneratorError#15
source sequence is illegal/malformed utf-8

So, the result of as_json should be ready to be serialized to JSON, with keys and values converted to String. This is how activemodel#as_json works.

A possible solution is having an option stringify_keys_and_values: true, which is not a good name (too long) or something similar, to be backwards compatible.

wdyt @linkyndy ?

linkyndy commented 4 years ago

I don't believe that error is raised because there are symbols used in the hash. But I didn't dig into it, so you may be right.

One easy fix is to make the method return strings only, and bump the version to a major one 🙂 . Or, like you said, you can use an option to keep it backwards compatible (use_strings?).

fabioperrella commented 4 years ago

I always prefer to keep compatible when possible! 😄

use_strings seems good for me!

So, I will open another PR here and close https://github.com/WeTransfer/peek/pull/127

fabioperrella commented 4 years ago

I realized the problem with JSON.dump(result) (source sequence is illegal/malformed utf-8) happens when the result hash contains non utf-8 strings inside.

For example, when the filename is something like "foo\xA9s"