COMBINE-lab / RapMap

Rapid sensitive and accurate read mapping via quasi-mapping
GNU General Public License v3.0
89 stars 23 forks source link

textLen_ is uninitialised in the function lce() #45

Closed jenni-westoby closed 5 years ago

jenni-westoby commented 5 years ago

textLen_ is uninitialised in the function lce()

In RapMap v.0.5.0, textLen_ is uninitialised in the function lce() in the file include/SASearcher.hpp. The function in question is in the code snippet below.

        OffsetT lce(OffsetT p1, OffsetT p2,
                    OffsetT startAt=0,
                    OffsetT stopAt=std::numeric_limits<OffsetT>::max(),
                    bool verbose=false) {
            std::string& seq = *seq_;
            std::vector<OffsetT>& SA = *sa_;
            OffsetT len = static_cast<OffsetT>(startAt);
            auto o1 = SA[p1] + startAt;
            auto o2 = SA[p2] + startAt;
            auto maxIndex = std::max(o1, o2);
            while (maxIndex + len < textLen_ and seq[o1+len] == seq[o2+len]) {
                if (seq[o1+len] == '$') { break; }
                if (len >= stopAt) { break; }
                ++len;
            }
            return len;
        }

The value of textLen is never set. In practice, if I modify the function to print the value of textLen as below, I find that textLen is always zero, meaning the central while loop is never executed. This is also the case on the develop-salmon branch (although under default command line options, lce is never actually executed on this branch). I am guessing that the aim of the first condition of the while loop (maxIndex + len < textLen) is to stop the while loop from executing if we have reached the end of the transcriptome. If I am correct, this bug could be fixed by setting the value of textLen_ to the length of the transcriptome.


        OffsetT lce(OffsetT p1, OffsetT p2,
                    OffsetT startAt=0,
                    OffsetT stopAt=std::numeric_limits<OffsetT>::max(),
                    bool verbose=false) {

        std::cerr << "In lce" << '\n';
        std::cerr << "textLen_ = " << textLen_ << '\n';
            std::string& seq = *seq_;
            std::vector<OffsetT>& SA = *sa_;
            OffsetT len = static_cast<OffsetT>(startAt);
            auto o1 = SA[p1] + startAt;
            auto o2 = SA[p2] + startAt;
            auto maxIndex = std::max(o1, o2);
            while (maxIndex + len < textLen_ and seq[o1+len] == seq[o2+len]) {
                if (seq[o1+len] == '$') { break; }
                if (len >= stopAt) { break; }
                ++len;
            }
            return len;
        }
rob-p commented 5 years ago

Hi @jenni-westoby,

This is a great catch! In fact, the zero initialization is a (probably) compiler and system specific stroke of good luck, as it's completely within the scope of the language spec to instead let that variable be randomly initialized. The result of that could be even worse if it allowed an lce to fall off the end of the suffix array. I've pushed fixes for this to both develop-salmon and master. Thanks again, not only for catching this but for the detailed bug report.