[BUGZILLA #16275] read.fortran doesn't recognise D format

MichaelChirico commented 4 years ago

?read.fortran states that:

The format for a field is of one of the following forms: rFl.d, rDl.d, rXl, rAl, rIl...

However, when trying to use the "D" format the following error and warnings are given:

    read.fortran(textConnection("1.23e4"),format="D6")
    Error in processFormat(format) : missing lengths for some fields
    In addition: Warning messages:
    1: In processFormat(format) : NAs introduced by coercion
    2: In processFormat(format) : NAs introduced by coercion
    3: In processFormat(format) : NAs introduced by coercion

The template definition in line 5 of read.fortran seems to be where the problem lies,

    template <- "^([0-9]*)([FXAI])([0-9]*)\\.?([0-9]*)"

This can be simply fixed by changing it to,

    template <- "^([0-9]*)([FDXAI])([0-9]*)\\.?([0-9]*)"

as the rest of the code is written to deal with "D". Indeed, this can be confirmed with a copy of the function that has the change applied (read.fortran2 here):

    read.fortran2(textConnection("1.23e4"),format="D6")
         V1
    1 12300
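The three coercion warnings in the original error are consistent with how sub() behaves when a pattern does not match: it returns its input unchanged, so the "length" group comes back as "D6" rather than "6". The sketch below shows this with the two templates quoted above. It assumes processFormat() extracts the groups with sub() and as.numeric(), which is not shown in this report; field_length() is a hypothetical helper written only for illustration.

```r
# Templates copied from the report: the current one and the proposed fix.
old_template <- "^([0-9]*)([FXAI])([0-9]*)\\.?([0-9]*)"
new_template <- "^([0-9]*)([FDXAI])([0-9]*)\\.?([0-9]*)"

# Hypothetical helper (not part of utils): extract the field length, i.e.
# capture group 3. ASSUMPTION: processFormat() does something similar.
field_length <- function(fmt, template) {
  # sub() returns `fmt` unchanged when `template` does not match, so the
  # conversion yields NA (with a coercion warning, suppressed here).
  suppressWarnings(as.numeric(sub(template, "\\3", fmt)))
}

sub(old_template, "\\3", "D6")      # "D6": no match, input returned as-is
sub(new_template, "\\3", "D6")      # "6"
sub(new_template, "\\2", "D6")      # "D"

field_length("D6", old_template)    # NA, i.e. a "missing length"
field_length("D6", new_template)    # 6
field_length("F6.2", old_template)  # 6: existing types are unaffected
```

Adding "D" to the character class only changes what the format parser accepts; it does not change how the field contents are converted afterwards.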

However, this still doesn't allow Fortran double precision formatted numbers to be read:

    read.fortran2(textConnection("1.23D4"),format="D6")
    Error in scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings, :
      scan() expected 'a real', got '1.23D4'

METADATA

Bug author - James
Creation time - 2015-03-20 14:38:17 UTC
Bugzilla link
Status - NEW
Alias - None
Component - I/O
Version - R 3.0.2
Hardware - x86_64/x64/amd64 (64-bit) Windows 64-bit
Importance - P5 minor
Assignee - R-core
URL -
Modification time - 2019-10-10 16:59 UTC

MichaelChirico commented 4 years ago

Created attachment 2466: partial fix to bug report

This fixes the regex match early in the read.fortran() code, but the deeper issue remains.

METADATA

Comment author - Michael Chirico
Timestamp - 2019-10-10 16:50:43 UTC

INCLUDED PATCH

ID - 6
Author - Michael Chirico
Link to download patch - https://bugs.r-project.org/bugzilla/attachment.cgi?id=2466
Timestamp - 2019-10-10 16:50 UTC
Extra info - (955 bytes, patch)

MichaelChirico commented 4 years ago

Have submitted a patch for the simpler part of this issue -- namely, that the regex for processFormat() is wrong.

The remaining issue is much deeper -- read.fortran() passes off the work of reading to read.fwf() -> read.table() -> scan(), which fails as mentioned:

    read.fwf(textConnection("1.23D4"), widths = 5L, colClasses = 'numeric')

So solving this would require a much deeper change.

(1) could edit the chain in C to flag that D/d, not E/e, is the exponent marker for this input. The chain I see is scan.c:do_scan -> scanVector -> extractItem -> Strtod -> util.c:R_strtod4 -> R_strtod5.

(2) could remove the outsourcing to read.fwf and parse the fortran formats more directly
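Until either option is implemented, a user-level workaround is possible. The sketch below is neither (1) nor a full version of (2): it still uses read.fwf() to split the fields, but reads them as character so that scan() never has to parse a D exponent, then rewrites D/d to E before converting. read_fortran_d() is a hypothetical helper, not part of utils, and it assumes every field is numeric.

```r
# Hypothetical helper (not part of base R or utils): read fixed-width
# numeric fields that may use Fortran-style D exponents, e.g. "1.23D4".
read_fortran_d <- function(lines, widths) {
  con <- textConnection(lines)
  on.exit(close(con))
  # Read every field as text so scan() does not try to parse it as a real.
  raw <- utils::read.fwf(con, widths = widths, colClasses = "character")
  # Swap the exponent marker D/d for E, then convert each column.
  raw[] <- lapply(raw, function(x) as.numeric(sub("[Dd]", "E", x)))
  raw
}

# Note the width of 6, which covers the whole of "1.23D4".
read_fortran_d("1.23D4", widths = 6L)
```

This only illustrates the conversion step; it does not handle repeat counts, implied decimals, or the mixed field types that read.fortran() formats can describe.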

METADATA

Comment author - Michael Chirico
Timestamp - 2019-10-10 16:59:13 UTC
