MichaelChirico / r-bugs

A ⚠️read-only⚠️mirror of https://bugs.r-project.org/

[BUGZILLA #16275] read.fortran doesn't recognise D format #5701

Open MichaelChirico opened 4 years ago

MichaelChirico commented 4 years ago

?read.fortran states that:

The format for a field is of one of the following forms: rFl.d, rDl.d, rXl, rAl, rIl...

However, when trying to use the "D" format the following error and warnings are given:

    read.fortran(textConnection("1.23e4"),format="D6")
    Error in processFormat(format) : missing lengths for some fields
    In addition: Warning messages:
    1: In processFormat(format) : NAs introduced by coercion
    2: In processFormat(format) : NAs introduced by coercion
    3: In processFormat(format) : NAs introduced by coercion

The template definition on line 5 of read.fortran seems to be where the problem lies: its character class omits "D",

    template <- "^([0-9]*)([FXAI])([0-9]*)\\.?([0-9]*)"

This can be simply fixed by changing it to,

    template <- "^([0-9]*)([FDXAI])([0-9]*)\\.?([0-9]*)"

as the rest of the code is written to deal with "D". Indeed, with the change applied to a copy of the function (called read.fortran2 here), it can be confirmed:

    read.fortran2(textConnection("1.23e4"),format="D6")
         V1
    1 12300
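The effect of the character-class change can also be checked in isolation, without patching read.fortran. The sketch below only exercises the two regexes quoted above. The names old_template, new_template and fmt are made up for illustration, and it assumes that processFormat pulls the field parts out with sub() backreferences followed by as.numeric(), which the "NAs introduced by coercion" warnings suggest but this report does not show.

```r
# The two templates quoted in the report; only the character class differs.
old_template <- "^([0-9]*)([FXAI])([0-9]*)\\.?([0-9]*)"
new_template <- "^([0-9]*)([FDXAI])([0-9]*)\\.?([0-9]*)"

fmt <- "D6"

# Does each template recognise the "D6" field at all?
old_matches <- grepl(old_template, fmt)
new_matches <- grepl(new_template, fmt)

# When a pattern does not match, sub() returns its input unchanged, so
# converting the "length" group to a number gives NA (normally with a warning).
old_length <- suppressWarnings(as.numeric(sub(old_template, "\\3", fmt)))

# With "D" in the character class, the type and length groups are extracted.
new_type   <- sub(new_template, "\\2", fmt)
new_length <- as.numeric(sub(new_template, "\\3", fmt))

print(list(old_matches = old_matches, new_matches = new_matches,
           old_length = old_length, new_type = new_type,
           new_length = new_length))
```

If that assumption holds, the NA length from the old template would be what triggers the "missing lengths for some fields" error.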

However, this still doesn't allow Fortran double precision formatted numbers to be read:

    read.fortran2(textConnection("1.23D4"),format="D6")
    Error in scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings, :
      scan() expected 'a real', got '1.23D4'



MichaelChirico commented 4 years ago

Created attachment 2466: partial fix to bug report

This fixes the regex match early in the read.fortran() code, but the deeper issue remains.



MichaelChirico commented 4 years ago

Have submitted a patch for the simpler part of this issue -- namely, that the regex for processFormat() is wrong.

The latter issue is much deeper -- read.fortran passes off the work of reading to read.fwf() -> read.table() -> scan, which fails as mentioned:

    read.fwf(textConnection("1.23D4"), widths = 5L, colClasses = 'numeric')

So solving this would require a much deeper change.

(1) Could edit the call chain in C so that it can be told that D/d, rather than E/e, is the exponent marker for this input. The chain I see is scan.c:do_scan -> scanVector -> extractItem -> Strtod -> util.c:R_strtod4 -> R_strtod5.

(2) Could remove the outsourcing to read.fwf() and parse the Fortran formats more directly.
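Neither option is implemented here. As a user-level stopgap in the spirit of (2), the field can be read as text and converted afterwards. This is only a sketch: fortran_d_to_numeric is a hypothetical helper, not part of base R or of the submitted patch, and it handles nothing beyond swapping a D/d exponent marker for "e".

```r
# Hypothetical helper (not part of base R): turn Fortran double precision
# literals such as "1.23D4" into numerics by rewriting the exponent marker.
fortran_d_to_numeric <- function(x) {
  as.numeric(sub("[Dd]", "e", trimws(x)))
}

# Read the fixed-width field as character so scan() never tries to parse
# "1.23D4" as a real, then convert the column afterwards.
raw <- read.fwf(textConnection("1.23D4"), widths = 6L,
                colClasses = "character")
raw$V1 <- fortran_d_to_numeric(raw$V1)
print(raw)
```

The cost is that the caller has to know which columns hold D-formatted values, which is exactly the information read.fortran already has in its format argument.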

