amitbl / blocktube

YouTube™ content blocker
GNU General Public License v3.0

Why Is the Behavior for Video Blocking Using Exact Matching, and Not String.includes()? #382

Closed ryanbarillosofficial closed 6 months ago

ryanbarillosofficial commented 7 months ago

When blocking a video by channel name or video name, I find it does not perform as I had hoped.

For example, I want to block the keyword wrestl to block anything related to wrestling.

Result: Though the content filtering works when searching "wrestl", it fails as soon as I try searching the following:

if (regexProps.includes(h) && properties.some(prop => prop && prop.test(value))) return true;

Based on this, the filtering requires the user to enter the keyword exactly, with proper capitalization, for the blocking to take effect. However, I wonder if filtering like this may help:

if (regexProps.toLowerCase().includes(h.toLowerCase()) && . . . return true;

That may fix the capitalization issue, but it would not explain why the other video titles mentioned above, which do contain the keyword "wrestl", are still not being filtered by the extension.

I don't understand why this is the case, and I hope this issue can be reviewed further.

matubu commented 6 months ago

The addon supports regex in filter lists, so for your use case, you could use a regex like this: /\bwrestl/i.

Additionally, by default, when you don't use regex, the rule/searched word is case-insensitive, but it needs to match the whole word.
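
As a rough illustration of the regex suggested above, here is how /\bwrestl/i behaves in plain JavaScript. The sample titles are made up, and this only demonstrates the regular expression itself, not how BlockTube applies it internally.

```javascript
// Word boundary (\b) before "wrestl", case-insensitive (i flag).
const rule = /\bwrestl/i;

// Hypothetical video titles, invented for this example.
const sampleTitles = [
  "Wrestling highlights of the week",
  "Top 10 WRESTLERS of all time",
  "Arm-wrestling tutorial",
  "Cooking pasta at home",
];

// Keep only the titles the rule would match.
const blocked = sampleTitles.filter((title) => rule.test(title));
console.log(blocked);
```

Because there is no trailing \b, the rule matches any word that starts with "wrestl", including the hyphenated "Arm-wrestling", since the hyphen counts as a word boundary.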

ryanbarillosofficial commented 6 months ago

by default, when you don't use regex, the rule/searched word is case-insensitive, but it needs to match the whole word.

I'll definitely re-learn RegEx to get my desired results. However, what do you mean by this?

matubu commented 6 months ago

For example, if you use the rule wrestl, it will hide WrEstl, but it won't hide wrestler because it's not the exact word.

Is it matching the entire video title?

If you put your rule in Video title, it will check against the video title, and if your rule matches, it will hide the video. If you put it in Channel name, it will try to match against the channel name, etc.
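
To make the wrestl / wrestler example concrete, here is one way whole-word matching could be written. This is a sketch under assumptions, not BlockTube's actual code: the helper name matchesWholeWord is hypothetical, and it assumes "whole word" means the keyword is delimited by regex word boundaries.

```javascript
// Hypothetical helper: case-insensitive, whole-word match.
// Assumption: "whole word" means bounded by \b on both sides.
function matchesWholeWord(keyword, text) {
  // Escape regex metacharacters so the keyword is treated literally.
  const escaped = keyword.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
  return new RegExp(`\\b${escaped}\\b`, "i").test(text);
}

console.log(matchesWholeWord("wrestl", "WrEstl"));        // true
console.log(matchesWholeWord("wrestl", "wrestler"));      // false
console.log("wrestler".toLowerCase().includes("wrestl")); // true
```

The last line shows the contrast with a plain substring check, which does match wrestler.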

ryanbarillosofficial commented 6 months ago

For example, if you use the rule wrestl, it will hide WrEstl, but it won't hide wrestler because it's not the exact word.

Is it matching the entire video title?

If you put your rule in Video title, it will check against the video title, and if your rule matches, it will hide the video. If you put it in Channel name, it will try to match against the channel name, etc.

Thanks for the clarification!

Now this follows up on my original question: could the extension's code itself simply use the standard string.includes() function from JavaScript to do the filtering, instead of using RegEx?

matubu commented 6 months ago

You can use JavaScript functions to filter videos if you prefer. Never tried though, but I think it would look something like this:

video => video.title.toLowerCase().includes("wrestl")

ryanbarillosofficial commented 6 months ago

I currently have this rule in advanced blocking, but it doesn't work as I wish; some videos slip through the cracks:

(video, objectType) => {
  // Add custom conditions below  
  const title = ['wrestl', 'toy'];
  title.forEach((t) => {
    if (video.title.toLowerCase().includes(t)) return true;
  })
  // Custom conditions did not match, do not block
  return false;
}

One challenge I find with this is that the browser console spits out errors with this simple loop.

matubu commented 6 months ago

When you pass an arrow function to forEach, any return statement within it only affects the arrow function's return value and not the output value of the parent function. As a result, it's ignored by forEach, causing the parent function to consistently return false.
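
A tiny standalone demonstration of this behavior, outside the extension. The video object and the function name blockWithForEach are hypothetical stand-ins; only the forEach semantics are the point.

```javascript
// Hypothetical stand-in for a video object; only `title` is used here.
const video = { title: "Pro Wrestling Recap" };
const keywords = ["wrestl", "toy"];

function blockWithForEach(video) {
  keywords.forEach((k) => {
    // This returns from the callback only; forEach discards the value.
    if (video.title.toLowerCase().includes(k)) return true;
  });
  return false; // always reached, whether or not a keyword matched
}

console.log(blockWithForEach(video)); // false, even though the title matches
```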

To address this issue, you'll need to employ a different approach that allows the function to exit correctly upon finding a match.

One solution is to use a for loop. With a for loop, you can directly return from the parent function when a match is found because you're not creating a new function or callback, enabling you to exit the function used for filtering:

(video, objectType) => {
  const titles = ['wrestl', 'toy'];
  for (let t of titles) {
    if (video.title.toLowerCase().includes(t)) return true;
  }
  return false;
}

Alternatively, you can utilize the some method. This method is specifically designed to determine if at least one element in an array meets a given condition:

(video, objectType) => {
  const titles = ['wrestl', 'toy'];
  return titles.some((t) => video.title.toLowerCase().includes(t));
}

ryanbarillosofficial commented 6 months ago

When you pass an arrow function to forEach, any return statement within it only affects the arrow function's return value and not the output value of the parent function. As a result, it's ignored by forEach, causing the parent function to consistently return false.

That makes sense. Thanks for your help! Honestly, I find it odd that this is not the default behavior of BlockTube and has to be implemented by the user.

ryanbarillosofficial commented 6 months ago

With some further additions, this rule becomes really excellent to use now!

(video, objectType) => {
  const titles = ['wrestl', 'wwe', 'mma', 'ufc', 'jim cornette', 'vince russo', 'tony khan'],
        channels=['wwe', 'wrestl', 'aew', 'jim cornette', 'vince russo', 'tony khan'];
  // Check if video title contains a keyword
  for (let t of titles) {
    if (video.title.toLowerCase().includes(t)) return true;
  }
  // If not, check if a video's channel name contains a keyword
  for (let c of channels) {
    if (video.channelName.toLowerCase().includes(c)) return true;
  }
  // Otherwise, show the video
  return false;
}

ryanbarillosofficial commented 6 months ago

I've renamed the title to better ask the question I've had in mind for the past few days.

matubu commented 6 months ago

I think it's simply to avoid getting too many false positives. For example, if you block the word "word", it's likely that you don't want to block other words that have nothing to do with it but include the word, like in this case: afterword, sword, swordfish...
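
The trade-off can be seen in a small comparison. This sketch uses invented titles and a plain \b-delimited regex as a stand-in for whole-word matching; it is not taken from the extension's code.

```javascript
// Invented titles for illustration.
const titles = [
  "Word of the day",
  "Sword fighting basics",
  "Swordfish recipes",
  "Afterword reading",
];

// Substring matching: blocks anything containing "word".
const bySubstring = titles.filter((t) => t.toLowerCase().includes("word"));

// Whole-word matching (assumed here to mean \b-delimited, case-insensitive).
const byWholeWord = titles.filter((t) => /\bword\b/i.test(t));

console.log(bySubstring.length); // 4
console.log(byWholeWord);        // [ 'Word of the day' ]
```

Substring matching hides all four titles, while whole-word matching hides only the one that actually uses the word on its own.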

ryanbarillosofficial commented 6 months ago

I think it's simply to avoid getting too many false positives. For example, if you block the word "word", it's likely that you don't want to block other words that have nothing to do with it but include the word, like in this case: afterword, sword, swordfish...

This makes sense. Thanks.