9inpachi closed this issue 1 year ago.
So, adding to what I said earlier: the size difference is actually remarkable! The more data we have, the more space we save.
Take a look at the same JSON data converted to the new format:
Hi @9inpachi, thanks for this! Truly impressive size reductions!
So, after some reflection I think:
1) We should put `types` into `ObjectType`: I know for ATLAS that e.g. Tracks can have different content depending on the collection. It would be okay to do it the other way around (we would just make `types` the superset), but since I don't think this will have a significant impact on the size, I think the extra flexibility and clarity is worth it. We could actually do both: if `types` are defined in the collection then they aren't needed in the `ObjectType`, but otherwise they are?
2) Probably we should just completely change the framework. The only pain from my side is I will need to rewrite the dumping functions in Athena. I guess we could have two versions of the format - and call this compact phoenix format JSON? But we shouldn't overcomplicate the code.
Of course, compressing the files would make them REALLY tiny!
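Edward's "do both" suggestion in point 1 could work roughly like this at load time. This is a minimal TypeScript sketch under assumptions: the interfaces `CompactObjectType` and `CompactCollection`, the fields `collections` and `objects`, and the helper `resolveTypes` are hypothetical and not part of Phoenix. Collection-level `types` take precedence, and the object type's `types` serve as the fallback.

```typescript
// Hypothetical compact layout, for illustration only (not the Phoenix schema):
// an object type may define default `types`, and a collection may override them.
interface CompactCollection {
  types?: string[];      // optional per-collection parameter names
  objects: unknown[][];  // one array of values per physics object
}

interface CompactObjectType {
  types?: string[];      // optional defaults shared by all collections
  collections: Record<string, CompactCollection>;
}

// Collection-level `types` win; otherwise fall back to the object type's `types`.
function resolveTypes(objectType: CompactObjectType, collectionName: string): string[] {
  const collection = objectType.collections[collectionName];
  if (collection === undefined) {
    throw new Error(`Unknown collection: ${collectionName}`);
  }
  const types = collection.types ?? objectType.types;
  if (types === undefined) {
    throw new Error(`No types defined for collection: ${collectionName}`);
  }
  return types;
}

// Usage: two track collections, one carrying an extra parameter.
const tracks: CompactObjectType = {
  types: ["pT", "eta", "phi"],
  collections: {
    CombinedTracks: { objects: [[12.5, 0.3, 1.2]] },
    PixelTracks: {
      types: ["pT", "eta", "phi", "nPixelHits"],
      objects: [[3.1, -1.1, 0.4, 4]],
    },
  },
};

console.log(resolveTypes(tracks, "CombinedTracks").join(", ")); // pT, eta, phi
console.log(resolveTypes(tracks, "PixelTracks").join(", "));    // pT, eta, phi, nPixelHits
```

With the fallback, a producer whose collections all share the same parameters writes `types` once per object type, while collections with different content can still override it.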
> if `types` are defined in the collection then they aren't needed in the `ObjectType`, but otherwise they are?
This is surely possible, but it would be up to whoever generates the files to specify all this information, which I think might be troublesome. Using `types` inside each collection separately is more flexible, I think, and we won't be adding more than 2-3 KB of data if we use it.
> The only pain from my side is I will need to rewrite the dumping functions in Athena.
You don't actually need to. Just use the same functions you currently have. There is a function in Phoenix to convert the older JSON format to the newer one and download the file.
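For illustration, converting the current keyed layout to the compact one might walk each object type and collection as below. This is a hedged sketch, not the actual Phoenix conversion function. It assumes an event maps object types (e.g. `Tracks`) to named collections of keyed objects, that any non-object value (such as an event number) is metadata to be copied unchanged, and that each collection gets its own `types`; `encodeCollection` and `convertEvent` are hypothetical names.

```typescript
// Assumed (hypothetical) shapes for illustration; not the real Phoenix types.
type PhysicsObject = Record<string, unknown>;
type KeyedCollections = Record<string, PhysicsObject[]>;

interface CompactCollection {
  types: string[];       // parameter name stored at each index
  objects: unknown[][];  // each physics object as a plain array of values
}

// Per-collection encoder: collect parameter names once, then store values by index.
// Objects missing a parameter get null at that index.
function encodeCollection(objects: PhysicsObject[]): CompactCollection {
  const types: string[] = [];
  for (const obj of objects) {
    for (const key of Object.keys(obj)) {
      if (types.indexOf(key) === -1) types.push(key);
    }
  }
  return {
    types,
    objects: objects.map((obj) => types.map((t) => (t in obj ? obj[t] : null))),
  };
}

// Convert one event: object-valued entries are treated as maps of collections;
// anything else (e.g. an event number) is copied unchanged.
function convertEvent(event: Record<string, unknown>): Record<string, unknown> {
  const out: Record<string, unknown> = {};
  for (const key of Object.keys(event)) {
    const value = event[key];
    if (value !== null && typeof value === "object" && !Array.isArray(value)) {
      const collections = value as KeyedCollections;
      const converted: Record<string, CompactCollection> = {};
      for (const name of Object.keys(collections)) {
        converted[name] = encodeCollection(collections[name]);
      }
      out[key] = converted;
    } else {
      out[key] = value;
    }
  }
  return out;
}

// Usage: minified output, ready to be written to a file.
const sampleEvent = {
  "event number": 42,
  Tracks: {
    CombinedTracks: [{ pT: 12.5, eta: 0.3 }, { pT: 3.1, eta: -1.1 }],
  },
};
console.log(JSON.stringify(convertEvent(sampleEvent)));
// {"event number":42,"Tracks":{"CombinedTracks":{"types":["pT","eta"],"objects":[[12.5,0.3],[3.1,-1.1]]}}}
```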
> I guess we could have two versions of the format - and call this compact phoenix format JSON? But we shouldn't overcomplicate the code.
Yes, this has been my approach so far, which is why I didn't change the framework at all. I also think it's better to support both formats. Handling the current Phoenix format is easier in the code, so we should just convert the compact Phoenix format to the current one, and that should be it.
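Supporting both formats by normalizing on load could look like the following minimal sketch. It assumes, hypothetically, that a compact collection is an object holding `types` and `objects` arrays while a current-format collection is a plain array of keyed objects; `isCompact` and `toCurrentFormat` are illustrative names, not Phoenix APIs.

```typescript
// Hypothetical shapes, for illustration only.
type PhysicsObject = Record<string, unknown>;

interface CompactCollection {
  types: string[];       // parameter name at each index
  objects: unknown[][];  // one array of values per physics object
}

// Compact collections are objects with `types` and `objects` arrays;
// current-format collections are plain arrays of keyed objects.
function isCompact(collection: unknown): collection is CompactCollection {
  if (collection === null || typeof collection !== "object" || Array.isArray(collection)) {
    return false;
  }
  const c = collection as { types?: unknown; objects?: unknown };
  return Array.isArray(c.types) && Array.isArray(c.objects);
}

// Normalize to the current keyed format so the rest of the code has a single path.
function toCurrentFormat(collection: CompactCollection | PhysicsObject[]): PhysicsObject[] {
  if (!isCompact(collection)) {
    return collection; // already in the current format
  }
  const { types, objects } = collection;
  return objects.map((row) => {
    const obj: PhysicsObject = {};
    types.forEach((name, i) => {
      obj[name] = row[i]; // the value at index i belongs to parameter types[i]
    });
    return obj;
  });
}

// Usage
const keyed = toCurrentFormat({ types: ["pT", "eta"], objects: [[12.5, 0.3]] });
console.log(JSON.stringify(keyed)); // [{"pT":12.5,"eta":0.3}]

const unchanged = toCurrentFormat([{ pT: 3.1 }]);
console.log(JSON.stringify(unchanged)); // [{"pT":3.1}]
```

Because everything downstream only ever sees the current format, the compact format stays an input concern and the rest of the code is unchanged.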
Hi @EdwardMoyse,
So I was finalizing this and I have some really bad news, which renders all the conversion functions useless. The sizes I compared were for the current format with spacing (formatted JSON) and the new format as minified JSON (no whitespace), which naturally decreases the size. In actuality, when both formats are minified, the size difference is only 9 KB (229 KB for the current format and 220 KB for the new format, roughly 4%). :(
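The pitfall is easy to reproduce: `JSON.stringify` with an indent argument adds whitespace that has nothing to do with the data layout. The sketch below uses small made-up data (the `Tracks`/`CombinedTracks` structure and the values are illustrative only) and compares formatted and minified sizes for both layouts; only the minified-vs-minified comparison isolates the effect of the format itself.

```typescript
// Made-up example data: the same two tracks in the current (keyed) and compact layouts.
const keyed = {
  Tracks: {
    CombinedTracks: [
      { pT: 12.5, eta: 0.3, phi: 1.2 },
      { pT: 3.1, eta: -1.1, phi: 0.4 },
    ],
  },
};

const compact = {
  Tracks: {
    CombinedTracks: {
      types: ["pT", "eta", "phi"],
      objects: [
        [12.5, 0.3, 1.2],
        [3.1, -1.1, 0.4],
      ],
    },
  },
};

// String length equals byte count here because the JSON is pure ASCII.
function sizes(data: unknown): { formatted: number; minified: number } {
  return {
    formatted: JSON.stringify(data, null, 2).length, // pretty-printed, 2-space indent
    minified: JSON.stringify(data).length, // no whitespace at all
  };
}

const k = sizes(keyed);
const c = sizes(compact);

// Misleading: whitespace in the formatted file inflates the apparent saving.
console.log("formatted keyed vs minified compact:", k.formatted, c.minified);
// Fair: both minified, so only the layout differs.
console.log("minified keyed vs minified compact:", k.minified, c.minified);
```

With only two tracks, the compact layout is actually 5 bytes larger here, since the `types` header costs more than the two sets of keys it replaces; any saving only appears as the number of objects per collection grows.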
I guess we can close this one? (Re-open if you disagree!)
Hi all,
So, about the event data JSON: Edward and I discussed a new format that's about 40% smaller.
Previous Data Format:
New Data Format (inspired by CMS ".ig"):
The advantage of the new data format is that we don't have to use object keys to identify each parameter of a physics object in a collection. Using keys as parameter identifiers takes a lot of space, since the keys are duplicated for every physics object. Instead, an array named "types" records the index at which each parameter is stored, so the physics objects themselves can be plain arrays of values and we use the types array to tell which parameter sits at which index.
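Reading a parameter back through `types` amounts to a name-to-index lookup. A minimal sketch, assuming a hypothetical `CompactCollection` shape with `types` and `objects` fields (not the real Phoenix schema) and an illustrative helper `makeAccessor`:

```typescript
// Hypothetical compact collection shape (illustrative, not the Phoenix schema).
interface CompactCollection {
  types: string[];       // parameter names, stored once per collection
  objects: unknown[][];  // each physics object as a plain array of values
}

// Build the name -> index lookup once, then read parameters by position.
function makeAccessor(collection: CompactCollection) {
  const index: Record<string, number> = {};
  collection.types.forEach((name, i) => {
    index[name] = i;
  });
  return (objectIndex: number, name: string): unknown => {
    if (!Object.prototype.hasOwnProperty.call(index, name)) {
      throw new Error(`Unknown parameter: ${name}`);
    }
    return collection.objects[objectIndex][index[name]];
  };
}

// The keys "pT", "eta", "phi" appear once, not once per track.
const tracks: CompactCollection = {
  types: ["pT", "eta", "phi"],
  objects: [
    [12.5, 0.3, 1.2],
    [3.1, -1.1, 0.4],
  ],
};

const get = makeAccessor(tracks);
console.log(get(1, "eta")); // -1.1
```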
Now there are several questions, or rather discrepancies, which I would like to discuss.
1) Should we put the `types` array directly inside `ObjectType` or inside `ObjectTypeCollection`? Can object parameters be different for different collections?
2) … `PhoenixObjects` and processing each collection?

That's all. (I know it's a lot, but since we are about to change an integral part of Phoenix, it would be better if everything is clarified.)
Peace. :)