atais / Fixed-Length

Scala library for parsing fixed-length file formats
MIT License

Track Max Length of Field Statically #12

Closed stephen-lazaro closed 6 years ago

stephen-lazaro commented 6 years ago

I have a use case where I have a lot of right padding on any given record. Basically, it'd be nice to be able to truncate to what I care about up front, but it is a bummer to duplicate that state outside the decoders. Part of me suspects that removing the errors that are thrown when the record is longer than expected would be the more satisfactory position, but this is certainly less invasive. Basically, the idea is to track the max length of a record through the Shapeless induction and then allow access to that so that users can track ideal usage. I welcome any suggestions on better design!
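A minimal sketch of the idea, using hypothetical names (`SizedDecoder`, `fixed`) rather than the library's actual API: each decoder carries the furthest input position it reads, and composition keeps the maximum, so the finished chain can report the record width it expects without the user tracking it externally.

```scala
// Hypothetical sketch, not the library's real types: a decoder that
// remembers the furthest position it reads from the input line.
case class SizedDecoder[A](maxLength: Int, decode: String => A) {
  // Composing two decoders keeps the larger end position, so the
  // composite always knows the full record width it expects.
  def <<:[B](that: SizedDecoder[B]): SizedDecoder[(B, A)] =
    SizedDecoder(
      math.max(that.maxLength, maxLength),
      s => (that.decode(s), decode(s))
    )
}

// Illustrative field decoder: reads [start, end) and trims padding.
def fixed(start: Int, end: Int): SizedDecoder[String] =
  SizedDecoder(end, s => s.substring(start, math.min(end, s.length)).trim)

val record = fixed(0, 10) <<: fixed(10, 13) <<: fixed(13, 18)
```

With this shape, `record.maxLength` is `18` for free; the real proposal does the same bookkeeping through the Shapeless induction instead of a plain case class.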

codecov[bot] commented 6 years ago

Codecov Report

Merging #12 into master will not change coverage. The diff coverage is 100%.


@@          Coverage Diff          @@
##           master    #12   +/-   ##
=====================================
  Coverage     100%   100%           
=====================================
  Files           7      7           
  Lines          99    109   +10     
  Branches        1      1           
=====================================
+ Hits           99    109   +10
Impacted Files Coverage Δ
...n/scala/com/github/atais/fixedlength/Decoder.scala 100% <100%> (ø) :arrow_up:
...ain/scala/com/github/atais/fixedlength/Codec.scala 100% <100%> (ø) :arrow_up:


Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data. Last update 89dc508...20b0bc0.

atais commented 6 years ago

I am trying to understand what this solution is actually helping you with.

Basically, the idea is to track the max length of a record through the Shapeless induction and then allow access to that so that users can track ideal usage.

Since your maxLength is always equal to the end position of the last element, what do you need all that code for?

I would say the code below does the same (as your test and functionality):

val maxLength = 18

val employeeCodec = {
  fixed[String](0, 10) <<:
  fixed[Option[Int]](10, 13, Alignment.Right) <<:
  fixed[Boolean](13, maxLength)
}

And damn, this is so much easier.

stephen-lazaro commented 6 years ago

True. You can do that if you have access to that state. But it could also be that you don't, at least not up front.

Additionally, consider that along this strategy, you'd have to import what is purely a parsing concern (the max length of a record) into a totally different package whose concern is not parsing (namely to truncate the input strings before attempting to parse the records). Kind of a bummer to break that hygienic boundary.
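The duplication in question looks roughly like this (module and field names are made up for illustration): a non-parsing module has to mirror a constant that the codec already encodes internally.

```scala
// Hypothetical preprocessing module, separate from the parsing package.
object Preprocessing {
  // This constant mirrors the last field's end position inside the codec
  // and must be updated by hand whenever the codec changes -- exactly the
  // duplicated state the issue wants to eliminate.
  val maxRecordLength = 18

  // Truncate right padding before handing the line to the parser.
  def truncate(line: String): String = line.take(maxRecordLength)
}
```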

Also, it's not obvious that the fields must appear in the same order they are composed, so there's no reason to assume that the last decoder we <<: in covers the last positions in the row (though that does seem to be how things currently work).

My last reason for wanting this as a feature of the library is a general concern about maintainability. While your suggestion works, the issue becomes sharper when you have 177 columns and need to add a new one that pushes into the existing right padding. You then have to manually add a new decoder, change the maxLength, and fill in the new end position on what was previously the last element. This is seriously exacerbated if decoders are allowed to commute, since the max length could then occur anywhere in the chain! Why shouldn't we eliminate that element of human error where we can?

Anyway! To be fair, I recognize it might seem like over-engineering ;) Mostly, I'm of the opinion that if the parser can throw an exception when a record exceeds the max length, it should be able to report what that max length is to me so that I don't have to externally track it.

hderms commented 5 years ago

@stephen-lazaro I agree that this is kind of a limitation in Fixed-Length.

hderms commented 5 years ago

@stephen-lazaro https://github.com/atais/Fixed-Length/issues/13 seems like it would be much easier with a statically tracked max length. I'm unsure how to parse a collection of fixed-length records that are all in the same file without knowing the size of each record.
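Illustrative only: once the record length is known (today it must be tracked by hand, outside the codec), splitting a file of back-to-back fixed-length records is mechanical.

```scala
// Hand-maintained constant -- the very thing a statically tracked
// max length would make unnecessary.
val recordLength = 18

// Two 18-character records concatenated in one payload, as they would
// appear in a single file with no delimiters.
val payload = "John      123true " + "Sam        42false"

// With the width known, chunking the stream into records is trivial.
val records = payload.grouped(recordLength).toList
```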

hderms commented 5 years ago

@atais, @stephen-lazaro has a really good point. Fixed-Length is nice and composable, but if you can combine a bunch of decoders together yet are forced to independently track information they already use internally in order to write sensible code, that seems like a smell. The decoders know what their lengths are, as evidenced by their ability to throw a max-length exception. Given a string of length X, it seems reasonable to be able to answer questions like "how many Foos can possibly be decoded from this string?" or "based on a max packet size of X, how many Foos can be sent in a single packet?"

Opposition to this seems ill-founded, given that these kinds of questions are central to the reasoning behind using a fixed-width format in the first place.
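The packet question above is a one-liner once the record width is exposed (values here are illustrative; `recordLength` stands in for a hypothetical `codec.maxLength`):

```scala
val recordLength = 18    // today: a hand-maintained constant
val maxPacketSize = 1500 // e.g. a typical Ethernet MTU

// How many complete records fit in one packet (integer division).
val recordsPerPacket = maxPacketSize / recordLength
```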

atais commented 2 years ago

@hderms uh, I did not see that message.

I wish you had provided a more complex example as a test back then, so I could have seen the need and justified the change.