Closed stephen-lazaro closed 6 years ago
Merging #12 into master will not change coverage. The diff coverage is 100%.
```diff
@@           Coverage Diff            @@
##           master      #12    +/-  ##
========================================
  Coverage     100%     100%
========================================
  Files           7        7
  Lines          99      109    +10
  Branches        1        1
========================================
+ Hits           99      109    +10
```
Impacted Files | Coverage Δ | |
---|---|---|
...n/scala/com/github/atais/fixedlength/Decoder.scala | 100% <100%> (ø) | :arrow_up: |
...ain/scala/com/github/atais/fixedlength/Codec.scala | 100% <100%> (ø) | :arrow_up: |
Continue to review the full report at Codecov.

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data

Powered by Codecov. Last update 89dc508...20b0bc0. Read the comment docs.
I am trying to understand what this solution actually helps you with.
Basically, the idea is to track the max length of a record through the Shapeless induction and then allow access to that so that users can track ideal usage.
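The idea can be illustrated with a minimal standalone sketch. Note that `FieldCodec` and `RecordCodec` are made-up stand-ins for this comment, not the actual Fixed-Length API, which composes typed `Codec`s through Shapeless induction; the point is only that `maxLength` can be carried along automatically as codecs are composed with a right-associative `<<:`.

```scala
// Hypothetical sketch: each field knows its [start, end) positions,
// and the composed record derives its maxLength from them.
final case class FieldCodec(start: Int, end: Int)

final case class RecordCodec(fields: List[FieldCodec]) {
  // The record's max length is the largest end position in the chain,
  // tracked automatically instead of by the user.
  val maxLength: Int = fields.map(_.end).max

  // Compose a field in; `<<:` ends in ':' so it is right-associative,
  // mirroring the library's DSL.
  def <<:(field: FieldCodec): RecordCodec = RecordCodec(field :: fields)
}

// Mirrors the employee example: String(0,10), Option[Int](10,13), Boolean(13,18).
val employee =
  FieldCodec(0, 10) <<: FieldCodec(10, 13) <<: RecordCodec(List(FieldCodec(13, 18)))

println(employee.maxLength) // prints 18
```

With something like this, the user never writes `maxLength = 18` by hand; it falls out of the composition.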
Since your maxLength is actually always equal to the position at the end of the last element, what do you need all that code for?
I would say the code below does the same (as your test and functionality):
```scala
val maxLength = 18
val employeeCodec =
  fixed[String](0, 10) <<:
    fixed[Option[Int]](10, 13, Alignment.Right) <<:
    fixed[Boolean](13, maxLength)
```
And damn, this is so much easier.
True. You can do that if you have access to that state. But it could also be that you don't, at least not up front.
Additionally, consider that with this strategy you'd have to import what is purely a parsing concern (the max length of a record) into a totally different package whose concern is not parsing (namely, truncating the input strings before attempting to parse the records). Kind of a bummer to break that hygienic boundary.
Also, it's not obvious that the fields must always appear in the same order, so there's no reason to assume that the last decoder we `<<:` in is the last one to occur in the row (though that does seem to be how things work currently).
My last reason for wanting this specifically as a feature of the library is a general concern about maintainability. While the suggestion you pose works, the issue becomes sharper when you have 177 columns and need to add a new one that pushes into the prior right padding. You then manually add a new decoder, change the maxLength, and fill in the old max length on the second-to-last element. This is seriously exacerbated if you allow commuting decoders as well, since the max length could then occur anywhere in the chain! Why shouldn't we eliminate that element of human error where we can?
Anyway! To be fair, I recognize it might seem like over-engineering ;) Mostly, I'm of the opinion that if the parser can throw an exception when a record exceeds the max length, it should be able to report what that max length is to me so that I don't have to externally track it.
@stephen-lazaro I agree that this is kind of a limitation in Fixed-Length.
@stephen-lazaro https://github.com/atais/Fixed-Length/issues/13 seems like it would be much easier with a statically tracked max length. I'm unsure how to parse a collection of fixed-length records that are all in the same file without knowing the size of a record.
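To make the use case concrete, here is a hedged sketch of what parsing that kind of file looks like. The record length has to come from somewhere; `recordLength` below is a hand-written stand-in for what a codec-provided `maxLength` would supply automatically.

```scala
// Hypothetical sketch: a file of concatenated fixed-length records can only
// be split into individual records if we know the record size.
val recordLength = 18                       // stand-in for codec.maxLength
val fileContents = "a" * recordLength * 3   // three dummy records, back to back

// Chunk the input into record-sized strings, one per record.
val records: List[String] = fileContents.grouped(recordLength).toList

println(records.size) // prints 3
```

If the codec exposed its own max length, `recordLength` would not need to be tracked by hand alongside the codec definition.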
@atais, @stephen-lazaro has a really good point. Fixed-Length is nice and composable, but if you can combine a bunch of decoders together yet are forced to independently track information they use internally in order to write sensible code, that seems like a smell. The decoders know what their lengths are, as evidenced by the ability to throw a max-length exception. Given a string of length X, it seems reasonable to be able to answer questions like "how many Foos can possibly be decoded from this string?" or "based on X max packet size, how many Foos can be sent in a single packet?"
Opposition to this seems ill-founded, given that these kinds of questions are central to the reasoning behind using a fixed-width format in the first place.
@hderms uh I did not see that message.
I wish you had provided a more complex example as a test back then, so that I could have seen and justified the change.
I have a use case where I have a lot of right padding on any given record. Basically, it'd be nice to be able to truncate to what I care about up front, but it is a bummer to duplicate that state outside the decoders. Part of me suspects that removing the errors thrown when the record is longer than expected would be the more satisfactory position, but this is certainly less invasive.

Basically, the idea is to track the max length of a record through the Shapeless induction and then allow access to it, so that users can track ideal usage. I welcome any suggestions on better design!