cjlin1 / liblinear

LIBLINEAR -- A Library for Large Linear Classification
https://www.csie.ntu.edu.tw/~cjlin/liblinear/
BSD 3-Clause "New" or "Revised" License

A potential Integer Overflow bug found in liblinear/linear.cpp #89

Closed x14ngch3n closed 1 year ago

x14ngch3n commented 1 year ago

Hi, I'm currently using the static analysis tool Infer to find uncaught API-misuse bugs in OpenWrt packages, and I found a potential integer overflow in your project.

The bug is located in liblinear/linear.cpp and resembles the one I reported in https://github.com/cjlin1/liblinear/issues/88. In this case, when nr_class equals 2, the size argument passed to Malloc can more easily cause an integer overflow, as shown in the following code:

nr_feature=model_->nr_feature;
if(model_->bias>=0)
    n=nr_feature+1;
else
    n=nr_feature;
int w_size = n;
int nr_w;
if(nr_class==2 && param.solver_type != MCSVM_CS)
    nr_w = 1;
else
    nr_w = nr_class;
model_->w=Malloc(double, w_size*nr_w);
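To make the concern concrete, here is a minimal sketch of checking the product before it reaches an allocation. The helper `checked_weight_count` and the `demo` function are hypothetical and not part of LIBLINEAR; the sketch assumes `int` is 32 bits and `long long` is 64 bits, as on typical platforms.

```cpp
#include <cassert>
#include <climits>
#include <cstddef>

// Hypothetical helper (not part of LIBLINEAR): stores w_size * nr_w in *out
// and returns true only when the product fits in an int.
static bool checked_weight_count(int w_size, int nr_w, size_t *out)
{
    if (w_size < 0 || nr_w < 0)
        return false;
    // Widen before multiplying so the multiplication itself cannot overflow
    // (assumes long long is wider than int).
    long long count = static_cast<long long>(w_size) * nr_w;
    if (count > INT_MAX)
        return false;
    *out = static_cast<size_t>(count);
    return true;
}

// Usage sketch: the asserts document the expected results.
static void demo()
{
    size_t count = 0;
    // 70000 * 70000 = 4.9e9, which exceeds a 32-bit INT_MAX (2147483647).
    assert(!checked_weight_count(70000, 70000, &count));
    // 1000 * 3 = 3000 fits comfortably.
    assert(checked_weight_count(1000, 3, &count));
    assert(count == 3000);
}
```

A caller could refuse to load the model when the helper returns false instead of passing a wrapped value to the allocator.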

I have also attached the analysis trace produced by Infer, FYI:

"trace": [
  {
    "file": "liblinear/linear.cpp",
    "line": 2247,
    "col": 4,
    "feature": [ "Input", "fscanf" ]
  },
  {
    "file": "liblinear/linear.cpp",
    "line": 2286,
    "col": 5,
    "feature": [
      "Prune",
      [
        "UnOp",
        "!",
        [ "BinOp", "==", [ "Var" ], [ "Const", [ "Cint", 2 ] ] ]
      ]
    ]
  },
  {
    "file": "liblinear/linear.cpp",
    "line": 2289,
    "col": 3,
    "feature": [ "Store", [ "Var" ], [ "Var" ] ]
  },
  {
    "file": "liblinear/linear.cpp",
    "line": 2291,
    "col": 12,
    "feature": [
      "IntOverflow",
      "malloc",
      [
        "BinOp",
        "*",
        [
          "Cast",
          [ "Unsupported" ],
          [ "BinOp", "*", [ "Var" ], [ "Var" ] ]
        ],
        [ "Sizeof" ]
      ]
    ]
  }
],

KyleLin123456 commented 1 year ago

Hi, I read your report and would like to ask why you say the Malloc argument overflows more easily when nr_class equals 2. Is there a simple explanation? Also, in the previous issue #88, did you actually encounter a case where model->nr_feature is greater than 2^16-1 and thus causes an overflow, or is it just a potential scenario?

x14ngch3n commented 1 year ago

Well, it seems I expressed myself poorly. I only wanted to say that nr_class=2 causes an integer overflow more easily than nr_class=1. Of course, nr_class can be much larger than 2 and, once assigned to nr_w, cause an integer overflow far more easily, but I am not familiar with the constraints on where nr_class comes from, so I did not make that further claim.

Yes, it is just a potential scenario reported by the static analyzer. I tried to figure out how to actually trigger this bug but failed.
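As a rough illustration of how a larger nr_w narrows the safe range, the sketch below computes the largest w_size whose product with nr_w still fits in an int. The function name `max_safe_w_size` is hypothetical and not part of LIBLINEAR, and it assumes nr_w is at least 1.

```cpp
#include <cassert>
#include <climits>

// Hypothetical helper (not part of LIBLINEAR): largest w_size for which
// w_size * nr_w does not exceed INT_MAX.
static int max_safe_w_size(int nr_w)
{
    assert(nr_w >= 1);  // assumption: at least one weight vector
    return INT_MAX / nr_w;
}
```

With a 32-bit int, `max_safe_w_size(1)` is 2147483647 and `max_safe_w_size(1000)` is 2147483, so the bound on the number of features shrinks in proportion to nr_w.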

KyleLin123456 commented 1 year ago

Well, if you look closely at the if-else statement, nr_w is set to 1 when nr_class is either 1 or 2, so I think both cases are equally likely to overflow. Also, nr_class and nr_w are of type int, so I assume overflow on a typical machine isn't a main concern. If you really do have a very large data set, @cjlin1 mentioned in #88 a version that can handle larger data sets. I guess that solution might solve your problem.
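The branch in question can be restated as a small standalone function to check this reading. This is only a sketch: `SolverKind` is a stand-in for LIBLINEAR's solver type with placeholder values, and `weight_vector_count` is a hypothetical name.

```cpp
// Stand-in for the solver type; only "MCSVM_CS or not" matters here.
// The enumerator names and values are placeholders, not LIBLINEAR's.
enum SolverKind { OTHER_SOLVER, MCSVM_CS_SOLVER };

// Mirrors the if-else from the quoted snippet: returns nr_w for a given
// nr_class and solver.
static int weight_vector_count(int nr_class, SolverKind solver)
{
    if (nr_class == 2 && solver != MCSVM_CS_SOLVER)
        return 1;
    return nr_class;
}
```

For any solver other than MCSVM_CS, both nr_class=1 and nr_class=2 yield nr_w=1, while nr_class=3 yields nr_w=3. With MCSVM_CS, nr_class=2 yields nr_w=2.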

Same solution for #88

cjlin1 commented 1 year ago

At this moment we do not cover data with more than 2^32 features/labels. If you need to handle such large data, please use "LIBLINEAR for more than 2^32 instances/features (experimental)" at https://www.csie.ntu.edu.tw/~cjlin/libsvmtools/#liblinear_for_more_than_2^32_instances_features_experimental

So we don't plan to make any changes here. Thank you again for your comments.