Open mikedolanfliss opened 9 years ago
Thanks for reporting.
Could you add a small (constructed) example? It is not completely clear what your file looks like.
I'll have to see how, if and when I'm able to add this to LaF. For fast random access, LaF relies on every record in a fixed-width file having the same length. Reading line 10,000 can be done by skipping to byte (10,000 - 1) * (sum(widths of all columns) + width of the newline). This would no longer work for your file.
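To make the offset arithmetic above concrete, here is a minimal sketch in Python. It is not LaF's actual implementation; `record_offset` and `read_record` are hypothetical names, and the sketch assumes single-byte (ASCII) characters and that every record has exactly the same length.

```python
import io


def record_offset(line_number, widths, newline_width=1):
    """Byte offset of a 1-based line in a strictly fixed-width file."""
    # Every record occupies the same number of bytes: all columns plus the newline.
    record_length = sum(widths) + newline_width
    return (line_number - 1) * record_length


def read_record(handle, line_number, widths, newline_width=1):
    """Jump straight to one record in a binary handle and split it into fields."""
    handle.seek(record_offset(line_number, widths, newline_width))
    raw = handle.read(sum(widths))
    fields, start = [], 0
    for width in widths:
        # Assumption: single-byte characters, so byte widths equal character widths.
        fields.append(raw[start:start + width].decode("ascii"))
        start += width
    return fields


# Three records with two columns of widths 4 and 5, each ending in "\n".
data = io.BytesIO(b"1234abcde\n5678fghij\n9012klmno\n")
print(read_record(data, 3, [4, 5]))
```

If any line is shorter than the others, every offset after that line is shifted, which is why a premature line ending breaks this kind of access.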
So, this would result in a new reader (besides laf_open_csv and laf_open_fwf). However, this would also be usable for UTF-8 'fixed width' files.
Thanks for following up! It's definitely more an issue with the data file than with LaF, and for fixed-width files LaF obviously requires that structure for access.
Creating a new reader might be a possibility, or a function to reformat the file into a true fixed-width file by standardizing the whitespace/length in the last field. That is, something to deal with these sorts of "fixed-width" records, which sometimes come out of SQL Server and other tools that export poorly formatted fwfs. With underscores instead of spaces:
Pseudo-fwf: 1234 12345 123456 12345_ 123__4
Needs to be formatted as 123_4_ 12345_ 123456 12345_ 123_4
Then laf_open_fwf could handle it (which I'd prefer to use). If laf_open_fwf could throw a warning when something seems to be a poorly formatted fwf (maybe a test_fwf=T parameter), and there were a function to attempt an fwf formatting of a dataset...?
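As a rough illustration of the two ideas above (a check for a poorly formatted fwf and a function that attempts to repair it), here is a small Python sketch. It is not part of LaF; `is_strict_fwf` and `pad_records` are hypothetical names. It assumes the only defect is a line that ends early in the last field, and that characters are single-byte so character counts match byte counts.

```python
def is_strict_fwf(lines, record_width):
    """True when every line, ignoring its line ending, has exactly record_width characters."""
    return all(len(line.rstrip("\r\n")) == record_width for line in lines)


def pad_records(lines, record_width, fill=" "):
    """Right-pad short records so that every record has the same width."""
    padded = []
    for number, line in enumerate(lines, start=1):
        body = line.rstrip("\r\n")
        if len(body) > record_width:
            # A record that is too long cannot be repaired by padding alone.
            raise ValueError(
                f"line {number} has {len(body)} characters, expected at most {record_width}"
            )
        padded.append(body.ljust(record_width, fill))
    return padded


# The second record ends early in its last field.
raw = ["1234 12345 123456\r\n", "1234 12345 12\r\n"]
print(is_strict_fwf(raw, 17))
fixed = pad_records(raw, 17)
print(is_strict_fwf(fixed, 17))
```

Writing the padded records back out with a uniform line ending would give a file that a strict fixed-width reader can index by byte offset again.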
mike
Big fan of the package.
SQL Server seems to dump fixed-width files with a premature \r\n line ending in the last column of an otherwise fixed-width file. Any way to handle that with LaF in the future? As is, LaF reads over the end-of-line into the next record, and all is buggered. :)