DerrickWood / kraken2

The second version of the Kraken taxonomic sequence classification system
MIT License
686 stars 267 forks source link

Nucleotide masker for kraken2 #675

Closed ch4rr0 closed 1 year ago

DerrickWood commented 1 year ago

When compiling with g++ 11.3.0, I get the following warnings; can these be reviewed (or preferably fixed) prior to merge?

g++ -fopenmp -Wall -std=c++11 -O3 -DLINEAR_PROBING   -c -o dustmasker.o dustmasker.cc
dustmasker.cc: In member function ‘void SDust::reset()’:
dustmasker.cc:73:24: warning: operation on ‘((SDust*)this)->SDust::rv’ may be undefined [-Wsequence-point]
   73 |                 l = rv = rv = 0;
      |                     ~~~^~~~~~~~
dustmasker.cc: In function ‘void shiftWindow(SDust&, int)’:
dustmasker.cc:99:29: warning: comparison of integer expressions of different signedness: ‘std::deque<int>::size_type’ {aka ‘long unsigned int’} and ‘int’ [-Wsign-compare]
   99 |         if (sd.kmers.size() >= windowSize - 2) {
      |             ~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~
dustmasker.cc:103:26: warning: comparison of integer expressions of different signedness: ‘int’ and ‘std::deque<int>::size_type’ {aka ‘long unsigned int’} [-Wsign-compare]
  103 |                 if (sd.l > sd.kmers.size()) {
      |                     ~~~~~^~~~~~~~~~~~~~~~~
dustmasker.cc: In function ‘void findPerfect(SDust&, int)’:
dustmasker.cc:156:39: warning: comparison of integer expressions of different signedness: ‘int’ and ‘std::vector<PerfectInterval>::size_type’ {aka ‘long unsigned int’} [-Wsign-compare]
  156 |                         for (j = 0; j < sd.perfectIntervals.size() && sd.perfectIntervals[j].start >= i + windowStart; ++j) {
      |                                     ~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~
dustmasker.cc: In function ‘void runSymmetricDust(SDust&, char*, size_t, int)’:
dustmasker.cc:181:34: warning: comparison of integer expressions of different signedness: ‘int’ and ‘size_t’ {aka ‘long unsigned int’} [-Wsign-compare]
  181 |         for (int i = 0, l = 0; i < size; i++) {
      |                                ~~^~~~~~
dustmasker.cc: In function ‘SDust* mask(SDust*)’:
dustmasker.cc:235:72: warning: comparison of integer expressions of different signedness: ‘size_t’ {aka ‘long unsigned int’} and ‘int’ [-Wsign-compare]
  235 |                                 for (size_t i = sd->ranges[j].start; i < sd->ranges[j].finish; i++)
      |                                                                      ~~^~~~~~~~~~~~~~~~~~~~~~
dustmasker.cc: In function ‘void runSymmetricDust(SDust&, char*, size_t, int)’:
dustmasker.cc:180:13: warning: ‘windowStart’ may be used uninitialized in this function [-Wmaybe-uninitialized]
  180 |         int windowStart;
      |             ^~~~~~~~~~~
DerrickWood commented 1 year ago

Other notes; if this is intended to be a replacement for the NCBI dustmasker, we'll need to make the following changes prior to merge:

ch4rr0 commented 1 year ago

This latest commit seeks to address the feedback given. I also found, using a sanitizer, and addressed potential undefined behavior in seqreader.cc.

BenLangmead commented 1 year ago

@DerrickWood Are you OK with this with @ch4rr0 's latest changes?