fossology / fossology

FOSSology is an open source license compliance software system and toolkit. As a toolkit you can run license, copyright and export control scans from the command line. As a system, a database and web UI are provided to give you a compliance workflow. License, copyright and export scanners are tools used in the workflow.
https://fossology.github.io/
GNU General Public License v2.0

nomos segv #479

Closed: bobgob closed this 9 years ago

bobgob commented 9 years ago

Tested on #968e92a: nomos segfaults on this text file:

http://www.fossology.org/attachments/4611/README.win64.txt

steffen-weber commented 9 years ago

With 7b7af78, nomos will detect UnclassifiedLicense in README.win64.txt. Shall this file be added to NomosTestfiles/UnclassifiedLicense?

bobgob commented 9 years ago

Yes, please. Adding it to NomosTestfiles prevents regressions.

bobgob commented 9 years ago

What types of strings triggered the segv?

bobgob commented 9 years ago

The README.win64.txt works now, but here is another failure:

http://www.fossology.org/attachments/4617/utf-8-bom.html

steffen-weber commented 9 years ago

The variable cur.matchBase is unset for README.win64.txt at parse.c:8503. The same problem might occur at parse.c:6302.

bobgob commented 9 years ago

What I meant was, why was it unset (what was unique about the test case)? Given the new failing test case, do you want me to reclose this and open a new issue? I opened this back up because the new test case also causes a segv and might have implications on your previous checkin.

steffen-weber commented 9 years ago

Added http://www.fossology.org/attachments/4617/utf-8-bom.html to the test files, but it works. Maybe uploading the file to fossology.org fixed an encoding issue, and the original file needs to be checked into the git repo to reproduce the error.

bobgob commented 9 years ago

I just tested with #34ae8ab. nomos works but nomossa fails:

bobg@bobg:~/fossology/src/nomos/agent(master)$ gdb -c core --args ./nomossa /home/bobg/testfiles/nomossa-fails/main/i/icedove/38.0~b2-1/mozilla/testing/web-platform/tests/tools/html5lib/html5lib/tests/utf-8-bom.html
GNU gdb (GDB) 7.4.1-debian
Copyright (C) 2012 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>...
Reading symbols from /home/bobg/fossology/src/nomos/agent/nomossa...done.
[New LWP 9904]

warning: Can't read pathname for load map: Input/output error.
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Core was generated by `./nomossa /home/bobg/testfiles/nomossa-fails/main/i/icedove/38.0~b2-1/mozilla/t'.
Program terminated with signal 11, Segmentation fault.
#0  0x00007f57a6ea9be6 in _int_malloc (av=0x7f57a71b6e60, bytes=968) at malloc.c:4665
4665    malloc.c: No such file or directory.
(gdb) bt
#0  0x00007f57a6ea9be6 in _int_malloc (av=0x7f57a71b6e60, bytes=968) at malloc.c:4665
#1  0x00007f57a6eabc00 in *__GI___libc_malloc (bytes=968) at malloc.c:3660
#2  0x00007f57a6eebcb0 in create_token_tree (dfa=0x9f3d70, left=<optimized out>, right=<optimized out>, token=<optimized out>)
    at regcomp.c:3759
#3  0x00007f57a6eebdb0 in create_tree (dfa=0x7f57a71b6e60, left=0x3c8, right=0x7f57a71b6ec8, type=<optimized out>)
    at regcomp.c:3749
#4  0x00007f57a6ef7e4a in parse_sub_exp (err=<optimized out>, nest=<optimized out>, syntax=<optimized out>, 
    token=<optimized out>, preg=<optimized out>, regexp=<optimized out>) at regcomp.c:2448
#5  parse_expression (regexp=<optimized out>, preg=0x6c77a0, token=0x7fff8ad95b60, syntax=<optimized out>, nest=0, 
    err=<optimized out>) at regcomp.c:2237
#6  0x00007f57a6ef9811 in parse_branch (regexp=0x7f57a71b6e60, preg=0x3c8, token=0x7f57a71b6ec8, syntax=30, nest=10448976, 
    err=0x1) at regcomp.c:2163
#7  0x00007f57a6ef995e in parse_reg_exp (regexp=0x7f57a71b6e60, preg=0x3c8, token=0x7f57a71b6ec8, syntax=30, nest=10448976, 
    err=0x1) at regcomp.c:2122
#8  0x00007f57a6ef9cdb in parse (err=<optimized out>, syntax=<optimized out>, preg=<optimized out>, regexp=<optimized out>)
    at regcomp.c:2091
#9  re_compile_internal (preg=0x6c77a0, pattern=<optimized out>, length=<optimized out>, syntax=<optimized out>)
    at regcomp.c:799
#10 0x00007f57a6efa86c in __regcomp (preg=0x6c77a0, 
    pattern=0x9d8160 "(©|\\(c\\)|copyright|\\<c\\>[^+:]|&copy) (19|20)[0-9][0-9][ ,-]+.{0,60}easy software products", 
    cflags=<optimized out>) at regcomp.c:506
#11 0x000000000042b200 in idxGrep_base (index=1180, data=0x9f6820 " title Test /title p Hello World ©", flags=3, mode=3)
    at nomos_regex.c:297
#12 0x000000000042afe9 in idxGrep_recordIndex (index=1180, data=0x9f6820 " title Test /title p Hello World ©", flags=3)
    at nomos_regex.c:225
#13 0x0000000000427c88 in findPhrase (index=1180, 
    filetext=0x9f55c0 "<!doctype html>\r\n<title>Test</title>\r\n<p>Hello World! ©", size=59, isML=1, isPS=0, qType=0)
    at parse.c:7630
#14 0x0000000000406346 in fileHasPatt (licTextIdx=1180, 
    filetext=0x9f55c0 "<!doctype html>\r\n<title>Test</title>\r\n<p>Hello World! ©", size=59, isML=1, isPS=0, qType=0)
    at parse.c:287
#15 0x000000000040968a in parseLicenses (filetext=0x9f55c0 "<!doctype html>\r\n<title>Test</title>\r\n<p>Hello World! ©", 
    size=59, scp=0x9f12d0, isML=1, isPS=0) at parse.c:972
#16 0x0000000000404d95 in saveLicenseData (scores=0x9f12d0, nCand=1, nElem=1, lowWater=1) at licenses.c:1083
#17 0x00000000004045fb in licenseScan (licenseList=0x6b25a0) at licenses.c:781
#18 0x000000000042abfd in processRegularFiles () at process.c:155
#19 0x000000000042abcc in processNonPackagedFiles () at process.c:62
#20 0x000000000042abec in processRawSource () at process.c:135
#21 0x000000000042edac in processFile (
    fileToScan=0x7fff8ad977ac "/home/bobg/testfiles/nomossa-fails/main/i/icedove/38.0~b2-1/mozilla/testing/web-platform/tests/tools/html5lib/html5lib/tests/utf-8-bom.html") at nomos_utils.c:805
#22 0x0000000000403571 in main (argc=2, argv=0x7fff8ad96dd8) at nomos.c:479

mcjaeger commented 9 years ago

During test execution (make test), I have experienced a nomos crash with signal 11 when LastGoodNomosTest comes across the file /testdata/NomosTestfiles/No_license_found/utf-8-bom.html. However, Steffen does not have the issue. Anyone else? We have also byte-compared the *.html file and cannot see any differences, nor in the code. Maybe the package dependencies are different? PS: the actual output is:

There was 1 failure:

1) NomosFunTest::testDiffNomos some lines of licenses are different, please view ./report.d for the details! Failed asserting that '0' matches expected '308'.

/var/lib/jenkins/jobs/FossologyNG.35.test.c/workspace/fossologyng/src/nomos/agent_tests/Functional/Nomos-fun-test.php:78

fogninid commented 9 years ago

No crash here, but there is clearly a buffer overflow (an out-of-bounds read) at doctorBuffer_utils.c:345:

valgrind ./nomos ../agent_tests/testdata/NomosTestfiles/No_license_found/utf-8-bom.html 
==16262== Memcheck, a memory error detector
==16262== Copyright (C) 2002-2013, and GNU GPL'd, by Julian Seward et al.
==16262== Using Valgrind-3.10.0.SVN and LibVEX; rerun with -h for copyright info
==16262== Command: ./nomos ../agent_tests/testdata/NomosTestfiles/No_license_found/utf-8-bom.html
==16262== 
==16262== Invalid read of size 1
==16262==    at 0x428990: convertWhitespaceToSpaceAndRemoveSpecialChars (doctorBuffer_utils.c:345)
==16262==    by 0x428BE6: doctorBuffer (doctorBuffer_utils.c:607)
==16262==    by 0x408EE5: findPhrase (parse.c:7584)
==16262==    by 0x40C693: parseLicenses (parse.c:972)
==16262==    by 0x4054EB: licenseScan (licenses.c:1083)
==16262==    by 0x4245F7: processRawSource (process.c:155)
==16262==    by 0x403E36: main (nomos.c:479)
==16262==  Address 0xb46c4c8 is 0 bytes after a block of size 40 alloc'd
==16262==    at 0x4C2CC70: calloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==16262==    by 0x4256E0: copyString (util.c:525)
==16262==    by 0x408ECF: findPhrase (parse.c:7568)
==16262==    by 0x40C693: parseLicenses (parse.c:972)
==16262==    by 0x4054EB: licenseScan (licenses.c:1083)
==16262==    by 0x4245F7: processRawSource (process.c:155)
==16262==    by 0x403E36: main (nomos.c:479)
==16262== 
==16262== Invalid read of size 1
==16262==    at 0x4288F7: convertWhitespaceToSpaceAndRemoveSpecialChars (doctorBuffer_utils.c:393)
==16262==    by 0x428BE6: doctorBuffer (doctorBuffer_utils.c:607)
==16262==    by 0x408EE5: findPhrase (parse.c:7584)
==16262==    by 0x40C693: parseLicenses (parse.c:972)
==16262==    by 0x4054EB: licenseScan (licenses.c:1083)
==16262==    by 0x4245F7: processRawSource (process.c:155)
==16262==    by 0x403E36: main (nomos.c:479)
==16262==  Address 0xb46c4c9 is 1 bytes after a block of size 40 alloc'd
==16262==    at 0x4C2CC70: calloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==16262==    by 0x4256E0: copyString (util.c:525)
==16262==    by 0x408ECF: findPhrase (parse.c:7568)
==16262==    by 0x40C693: parseLicenses (parse.c:972)
==16262==    by 0x4054EB: licenseScan (licenses.c:1083)
==16262==    by 0x4245F7: processRawSource (process.c:155)
==16262==    by 0x403E36: main (nomos.c:479)
==16262== 
File utf-8-bom.html contains license(s) No_license_found
==16262== 
==16262== HEAP SUMMARY:
==16262==     in use at exit: 1,241,158 bytes in 2,244 blocks
==16262==   total heap usage: 215,411 allocs, 213,167 frees, 42,004,160 bytes allocated
==16262== 
==16262== LEAK SUMMARY:
==16262==    definitely lost: 40 bytes in 1 blocks
==16262==    indirectly lost: 16 bytes in 1 blocks
==16262==      possibly lost: 0 bytes in 0 blocks
==16262==    still reachable: 1,241,102 bytes in 2,242 blocks
==16262==         suppressed: 0 bytes in 0 blocks
==16262== Rerun with --leak-check=full to see details of leaked memory
==16262== 
==16262== For counts of detected and suppressed errors, rerun with: -v
==16262== ERROR SUMMARY: 2 errors from 2 contexts (suppressed: 0 from 0)

steffen-weber commented 9 years ago

With 199a516d38197cd84915c090028da1d0b7e40a34, the nomos test in Travis also works for utf-8-bom.html. Can someone re-test, please?

bobgob commented 9 years ago

Tested on #abb949f. I also included a number of other non-source files. Looks good. Closing.