andymeneely / chromium-history

Scripts and data related Chromium's history
11 stars 4 forks source link

Filter out (right): in Messages #229

Closed andymeneely closed 9 years ago

andymeneely commented 9 years ago

The word "rightli" is very common because of lines like this:

https://codereview.chromium.org/2413

http://codereview.chromium.org/2413/diff/513/645
File sandbox/src/sandbox_nt_util.cc (right):

http://codereview.chromium.org/2413/diff/513/645#newcode345
Line 345: if ((NULL == module_path) || (NULL == module_path->Buffer))
nit: if (!module_path || !module_path->Buffer)

http://codereview.chromium.org/2413/diff/513/645#newcode359
Line 359: if ((NULL == sep) || (ix == start_pos))
nit: !sep

http://codereview.chromium.org/2413/diff/513/645#newcode362
Line 362: size_t size_bytes = (start_pos - ix + 1) * sizeof(wchar_t);
nit: comment that 1 is for the extra null that you'll add.

http://codereview.chromium.org/2413/diff/513/645#newcode364
Line 364: if (NULL == str_buffer)
nit: !str_buffer

http://codereview.chromium.org/2413/diff/513/645#newcode369
Line 369: out_string->Length = 0;
nit: it is more clear if you set this to size_bytes - sizeof(char_t) and get rid
of copy_bytes.

Filter out the entire line of File blahblah (right):

Also, look at the Line lines as another to filter out.

bspates commented 9 years ago

@andymeneely
After doing some research it looks like all comments are repeated in the messages. When adding inline comments you have to commit them at the end like git with a commit message. The message becomes a reitveld message with the line numbers attached to each inline comment. I think this means we can stop parsing comments as well as messages and just parse the messages. Unless you can think of a reason we would need the duplicate data?

bspates commented 9 years ago

Just need to verify this works in production build

andymeneely commented 9 years ago

I thought we had found some counterexamples where it was just a comment and not a message? I'll take a look now.

andymeneely commented 9 years ago

Nope, you're right. I just went through a bunch of random issues with comments and they all were double-counted. Is it possible to still parse messages properly and then just not parse the comments?

bspates commented 9 years ago

I removed the parsing of comments from the vocab generation. Do you want to kill the comment table entirely?

andymeneely commented 9 years ago

Keep the table. We might use it for something else and it's not really slowing us down.