arq5x / bedtools2

bedtools - the swiss army knife for genome arithmetic
MIT License

Latest conda version throws error for large chromStart values #1099

Open SohaibRais opened 1 month ago

SohaibRais commented 1 month ago

I installed the latest conda version of bedtools and ran an intersection. bedtools stopped as soon as the chromosome start position of a peak was equal to or greater than 1.66e+08. The exact error I received was: ERROR: illegal number "1.66e+08". Exiting... Downgrading to version 2.27.1 fixed the issue.

sof202 commented 1 month ago

I have had the exact same issue, but with a chromosome start position exceeding 1.1e+07, specifically with bedtools v2.31.1.

sof202 commented 3 weeks ago

From /src/utils/general/ParseTools.cpp:

CHRPOS str2chrPos(const char * __restrict str, size_t ulen) {

    if (ulen == 0) {
        ulen = strlen(str);
    }

    const char* endpos = str;
    long long result = 0;
    bool neg = false;
    char last = 0;

    if(*endpos == '-') neg = true, endpos ++;

    for(;(last = *endpos); endpos ++) {
        if(last < '0' || last > '9') break;
        result = result * 10 + last - '0';
    }

    if(last) {
        if(*endpos == 'e' || *endpos == 'E') {
            char* endpos = NULL;
            CHRPOS ret = (CHRPOS)strtod(str, &endpos); 

            if(endpos && *endpos == 0) {
                return ret;
            }
        }
        fprintf(stderr, "***** ERROR: illegal number \"%s\". Exiting...\n", str);
        exit(1);
    }

    return neg?-result:result;
}

Correct me if I'm wrong, but this doesn't seem to account for decimal points (as in 1.66e+08). On reaching the "." character, the code breaks out of the for loop, because "." is not in the range '0' to '9'. The next if statement then only checks for e or E before handing the string to strtod, so a mantissa containing a decimal point falls straight through to the "illegal number" error. In other words, this logic accepts scientific notation such as 1e+08 but not 1.66e+08.

I'm not very well versed in C++ or this code base, but this logic is handled differently in v2.27.1 (I don't get the 'illegal number' errors in v2.27.1).
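
To make the failure mode above concrete, here is a minimal sketch of a parser that also falls back to strtod when the digit loop stops at a decimal point. It is not the bedtools implementation or a proposed patch: the name parseChrPos, the ChrPos alias (standing in for CHRPOS) and the bool/out-parameter interface are assumptions made for illustration, and it rejects values that are not whole numbers rather than truncating them.

```cpp
#include <cassert>
#include <cerrno>
#include <cmath>
#include <cstdint>
#include <cstdlib>

// Stand-in for bedtools' CHRPOS type (an assumption for this sketch).
using ChrPos = std::int64_t;

// Hypothetical helper, not part of bedtools: parses a coordinate and reports
// failure through the return value instead of calling exit(1).
bool parseChrPos(const char* str, ChrPos& out) {
    assert(str != nullptr);

    // Fast path: optional '-' followed by decimal digits only.
    // As in the quoted code, overflow on very long digit runs is not checked.
    const char* p = str;
    bool neg = false;
    if (*p == '-') { neg = true; ++p; }

    ChrPos result = 0;
    bool sawDigit = false;
    for (; *p >= '0' && *p <= '9'; ++p) {
        result = result * 10 + (*p - '0');
        sawDigit = true;
    }
    if (*p == '\0') {
        // Rejects "" and "-" (the quoted code returns 0 for an empty string).
        if (!sawDigit) return false;
        out = neg ? -result : result;
        return true;
    }

    // Fallback: treat '.' like 'e'/'E' and let strtod parse the whole string,
    // so both "1e+08" and "1.66e+08" are handled.
    if (*p == '.' || *p == 'e' || *p == 'E') {
        char* end = nullptr;
        errno = 0;
        const double value = std::strtod(str, &end);
        const bool consumedAll = (end != str && *end == '\0');
        const bool wholeNumber = std::isfinite(value) && value == std::floor(value);
        // A double represents every integer exactly only up to 2^53.
        const bool exact = std::fabs(value) <= 9007199254740992.0;
        if (consumedAll && errno == 0 && wholeNumber && exact) {
            out = static_cast<ChrPos>(value);
            return true;
        }
    }
    return false;  // the caller decides how to report the illegal number
}
```

With this sketch, "1.66e+08" parses to 166000000 and "1.1e+07" to 11000000, while inputs such as "1.5" or "chr1" are rejected. Note that strtod honours the current C locale, so the sketch assumes the default "C" locale, in which the decimal separator is ".".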