Open tdruez opened 3 years ago
Hello @tdruez, I want to work on this issue. Can you explain a bit more where this code file is and what exactly the issue is if the result is right?
@tdruez general observation:
$ ./scancode -e --json-pp - input.txt
input.txt:
Files: lib/vtls/schannel.*
Copyright: 2012-2014, Marc Hoersken <info@marc-hoersken.de>
2012, Mark Salisbury <mark.salisbury@hp.com>
2012-2015, Daniel Stenberg <daniel@haxx.se>
License: curl
Files: lib/vtls/darwinssl.*
Copyright: 2012-2014, Nick Zitzmann <nickzman@gmail.com>
2012-2015, Daniel Stenberg <daniel@haxx.se>
License: curl
Files: lib/vtls/darwinssl.*
Copyright: 2012-2014, Nick Zitzmann <nickzman@gmail.com>
2012-2015, Daniel Stenberg <daniel@haxx.se>
License: curl
Result:
"emails": [
{
"email": "info@marc-hoersken.de",
"start_line": 2,
"end_line": 2
},
{
"email": "mark.salisbury@hp.com",
"start_line": 3,
"end_line": 3
},
{
"email": "daniel@haxx.se",
"start_line": 4,
"end_line": 4
},
{
"email": "nickzman@gmail.com",
"start_line": 8,
"end_line": 8
}
],
"scan_errors": []
So whenever an email is repeated, only its first occurrence is listed. @pombredanne Is this a bug or a feature?
@Ayushsunny You can find the code in /src/cluecode/
@itssingh re:
Is this bug or a feature?
a bit of both.... there are two sides:
The unique flag in https://github.com/nexB/scancode-toolkit/blob/96c73a2761eee3c1d8ba57c47efaa475f7459409/src/cluecode/finder.py#L127 should likely not be there by default OR might need to be exposed in the CLI as an option (see, for URLs, https://github.com/nexB/scancode-toolkit/blob/96c73a2761eee3c1d8ba57c47efaa475f7459409/src/cluecode/finder.py#L200 )
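To make the effect of such a flag concrete, here is a minimal sketch of first-occurrence deduplication. It is not the code from finder.py: the names find_emails_sketch and EMAIL_RE are hypothetical, the regex is a simplified stand-in, and the sample is a shortened version of the input above with a blank line assumed between the two stanzas.

```python
import re

# Simplified pattern for illustration only; NOT the regex ScanCode uses.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def find_emails_sketch(lines, unique=True):
    """Yield (email, line_number) pairs. Hypothetical helper, not ScanCode's API.

    With unique=True, only the first occurrence of each email is yielded.
    With unique=False, every occurrence is yielded.
    """
    seen = set()
    for line_number, line in enumerate(lines, start=1):
        for match in EMAIL_RE.finditer(line):
            email = match.group()
            if unique:
                if email in seen:
                    continue  # repeated email: skip later occurrences
                seen.add(email)
            yield email, line_number


# Shortened sample; the blank line between stanzas is an assumption.
sample = """Files: lib/vtls/schannel.*
Copyright: 2012-2014, Marc Hoersken <info@marc-hoersken.de>
 2012, Mark Salisbury <mark.salisbury@hp.com>
 2012-2015, Daniel Stenberg <daniel@haxx.se>
License: curl

Files: lib/vtls/darwinssl.*
Copyright: 2012-2014, Nick Zitzmann <nickzman@gmail.com>
 2012-2015, Daniel Stenberg <daniel@haxx.se>
License: curl
""".splitlines()

deduplicated = list(find_emails_sketch(sample, unique=True))
all_matches = list(find_emails_sketch(sample, unique=False))

for email, line_number in all_matches:
    print(line_number, email)
```

With unique=True the second daniel@haxx.se is dropped, which matches the behaviour reported in this thread. With unique=False it is kept along with its own line number, which is roughly what a CLI option could expose.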
When scanning the following text, the detection of daniel@haxx.se is only returned once in the results while it appears multiple times in the file.
scancode -ce --json-pp -
Results: