Implement editable blacklist / forgetlist -- work in progress

aquibm commented 7 years ago

Changes:

Adds a list of ignored sites to the settings page.
Users can add or remove sites
Sites are persisted in local storage for the time being
Preview: https://streamable.com/3sofe
Screenshot:

Todo:

[ ] Refactor components
[ ] Pull in material icons
[ ] Validate input for whitespace? correct regex?
[ ] Allow users to undo deletions

Fixes #20

Treora commented 7 years ago

Good start. I wonder what the reasons were for choosing (and switching) between redux-pouchdb and the storage API. If we are not happy with either could use whichever is easiest for now. I was recently looking at redux-persist, which may be useful here.

poltak commented 7 years ago

Hi guys. I'm looking to contribute to this feature/issue #20 to help out a bit. At the moment dev is being done on @aquibm's fork, and seems to be going well, but I don't have write access to add further commits to this PR. What do you suggest is the best way to go about this?

I could send a separate PR to @aquibm's master branch, which after being merged in would show up in this PR (kinda weird, and I would need to continually make sure I'm in-sync, but it would work I suppose :S). Or we could have a discussion about having mutual write access on a fork and make the PR from that?

aquibm commented 7 years ago

@poltak Thanks for helping out! I've added you as a collaborator on my fork, you should be able to update this PR now 👍

@Treora The switch was mainly to make it easier to read the list of blacklisted sites without having to setup a persistant store / reducer on the other end. We could also switch out the storage area type to be sync rather than local which is fairly limiting (~8kB / item) but would allow blacklist items and settings to be synced across instances of chrome.

poltak commented 7 years ago

What do you guys think about getting rid of that ADD button (top right) and just having the input row always shown? Just a suggestion (you could have other plans with it already).

I think it simplifies the possible states of this view, which is always nicer for the user, and we could make some style to visually re-enforce that its purpose is as input and not a data row (I'm terrible at coming up with visually-appealing styles; take the screenshot with a grain of salt).

screen shot 2017-04-14 at 18 34 59

Treora commented 7 years ago

@aquibm:

The switch was mainly to make it easier to read the list of blacklisted sites without having to setup a persistant store / reducer on the other end.

I see your point. I think we will anyway add redux state to the background process soon, which makes it a smaller step to take, but maybe we will be happy with this solution too. As said, I am happy to start with something and improve as we go. The important thing is to keep things modular enough to easily improve them later.

We could also switch out the storage area type to be sync rather than local which is fairly limiting (~8kB / item) but would allow blacklist items and settings to be synced across instances of chrome.

Remember that we do not only target Chrome (it seems Firefox support for sync storage will appear in v53 though). I would not invest much in using browser sync for settings though; more importantly I would like to access my memory contents itself from any device, but that will be another story.

@poltak: thanks for popping in and helping out!

I like the design example with the input visually mixed in as one of the entries. I actually wonder if the column with date added gives relevant information.

As a small heads-up: at some moment we may wish to configure more types of behaviour than just ignore/store (blacklist or not). For example, some sites I may wish to index the text content, while for others I want to also store the full page. Not sure if we will want all these 'site rules' in a single list, but it may be good to keep flexibility for extension in mind when coding this widget.

blackforestboi commented 7 years ago

@poltak

What do you guys think about getting rid of that ADD button (top right) and just having the input row always shown? Just a suggestion (you could have other plans with it already).

Indeed, I could imagine it simplifying the entry process, as it could remove a few steps of entering new items. Apart from the screencast, I haven't seen the complete work-flow of the current blacklist feature yet, but from what I understood it's the following. Correct?

Press add button,
new line would pop up
fill the field
press enter

What you, @poltak propose with removing the button, someone would just type in in the last row the regex and press enter?

If that is the case, indeed removing the add button would remove step 1 and 2. I think we could leave it out for now, if the current purpose is just adding a new line. Although for more options later on, the Add button could help.

@Treora

I actually wonder if the column with date added gives relevant information.

I would agree to that. What use case did you have in mind for that @aquibm?

Not sure if we will want all these 'site rules' in a single list, but it may be good to keep flexibility for extension in mind when coding this widget.

From my perspective, it think it is best to leave it simple for now with just a regex in a single list, so the basic feature works. I feels a bit like over engineering to optimise for more use cases than regex, as it covers already a good amount of them. (domain, url, expression) I can see the point of adding other use cases later on. Seems like the question "index or archive or ignore?" could be an additional option/list.

Treora commented 7 years ago

I feels a bit like over engineering to optimise for more use cases than regex, as it covers already a good amount of them. (domain, url, expression)

Not sure what you understood here. I meant that rules would still be regexes.

I can see the point of adding other use cases later on. Seems like the question "index or archive?" could be a separate list.

Could also be separate. Right now I tend to think they would make more sense together, as these are different answers to the same question (index, archive, or neither?). This was just one example though, more behaviour aspects may appear that people may wish to configure. Let's solve the problem when it is there though.

blackforestboi commented 7 years ago

Not sure what you understood here. I meant that rules would still be regexes.

I meant basically any other form of improvement except pure regex. As for example your index/archive/ignore option or things like whole domain (which then could be possible if you have the whole url and say "whole domain". An option which could extract the domain name and blacklist it. i.e. used in a quick blacklist function.

So yeah, lets solve it when its there.

blackforestboi commented 7 years ago

@poltak and me just had a chat in slack about how we could do it.

There was a question about if it makes sense to add simple logics (*, ?, ., etc) to the regex.

For the sake of simplicity, I think it is enough to just incorporate the "all urls containing the following string" use case and not think about regex rules. With this case people can blacklist a domain “mybank.com”, a path “mybank.com/dashboard” and a whole URL "mybank.com/dashboard/print.html". Also I see currently no other use case that would make regex necessary.

To not have to create a new work-flow of validating the spelling of the entry, a simple white-space removal would do the trick. At least there is currently no other use case that comes to my mind, that would make the entry unsuitable for enforcement. Any other that come to your mind?

Treora commented 7 years ago

@oliversauter:

There was a question about if it makes sense to add simple logics (*, ?, ., etc) to the regex.

How do you mean to 'add them to the regex'? Using '*' as a wildcard would conflict with regex semantics; better to drop regexes completely then, which is fine too.

For the sake of simplicity, I think it is enough to just incorporate the "all urls containing the following string" use case and not think about regex rules. With this case people can blacklist a domain “mybank.com”, a path “mybank.com/dashboard” and a whole URL "mybank.com/dashboard/print.html". Also I see currently no other use case that would make regex necessary.

We have been talking about using regex matching because it is expressive and easy to implement, but it would be totally fine to do some other type of matching. Some options:

Just literal substring matching (probably too crude).

Full string match, but interpreting a '*' as wildcard (easily understood, bit still somewhat crude).

Some URL-specific rule type, e.g. like url matching patterns in WebExtensions.

Regex (not so user-friendly, but easy to implement)

Choose whatever seems easy enough to implement good (easy) enough to use. If anyone wants to put time into it, a url-specific approach is probably best (is there prior art available for reuse?). Or at least, make it easy for users to just enter a domain name (e.g. mybank.com), and have it match both http and https.

To not have to create a new work-flow of validating the spelling of the entry, a simple white-space removal would do the trick. At least there is currently no other use case that comes to my mind, that would make the entry unsuitable for enforcement. Any other that come to your mind?

You mean spell-checking the domain names? :) If input validation is needed depends on which type of matching rules we use. Not sure if white-space has to be removed or just url-encoded like any special character.

blackforestboi commented 7 years ago

better to drop regexes completely then, which is fine too.

agreed.

Just literal substring matching (probably too crude). Full string match, but interpreting a '*' as wildcard

As mentioned, both of them would cover a good amount of use cases we need right now, even if crude. I agree that regex is not user friendly and also has no use case right now.

If there aren't any objections, Iets go with the simple substring matching as it is currently working in the worldbrain extension. We can still later improve (i.e. http/https check), but lets not waste time in the details of such specific use cases, before we dont have any data to support such a use case. Lets build something working first.

You mean spell-checking the domain names? :)

In that case it was rather meant for spelling in the sense of "are there characters that can't be used to match a URL", as for example a white space. (Any other characters that you have in mind? Chinese symbols? :) ) It seems as it would not make sense to encode a white space to match a url, as a literal whitespace never enters a URL. If he copy pastes a URL that encodes a white space, it's not a literal white space in the blacklist either. So if a user types in a white space in a blacklist field, then it's most likely an accident and thus should simply be removed.

poltak commented 7 years ago

In that case it was rather meant for spelling in the sense of "are there characters that can't be used to match a URL", as for example a white space. (Any other characters that you have in mind? Chinese symbols? :) )

URL validation seems to be a messy problem. Chinese characters are fine and even emoji are valid in URLs (all gets resolved to ASCII using punycode). Pasting 💩.ws into your URL bar should work in most browsers. So we should definitely allow special unicode characters.

Have a look at the "These URLs should fail" part of this matrix. My question is what should we do if the user puts in any of these (as an example)?

I think the most simple way to handle naughty input, without introducing a validation/error state, is just let them add whatever nonsense URLs like this and then later, in the part of the extension that is going to do the actual URL check against the ignorelist values, just ignore these values. Depending how the check will work, it may not even matter as something like //////// won't match a full URL or appear as a substring, so it'll just pass. There's probably some weird case though since there are so many possibilities...

poltak commented 7 years ago

Another question: If I enter and save test.com, then enter test.com again and try to save, should it check the existing list and not save, or just skip the check and duplicate it? Currently doesn't check and allows duplicates, but a check should be simple if needed

poltak commented 7 years ago

Preview of where it's at now: https://streamable.com/fcpbd Shows the auto-removal of spaces, disabled states of buttons, clear button, save event on save button click and enter press. The last bit, if not clear, is showing refusal to save when only whitespace is entered.

Also just realised an inconsistency in that gif: saving via enter key press keeps focus on the input, while saving via the save button click loses focus. Should probably choose one for consistency-sake (I'm for keeping focus; prevents swearing at the computer when trying to enter multiple things).

Just put the material iconfonts inside the webextension code and replaced the CDN link as well.

UI-wise, what more needs to be done here?

I can think of:

[x] change the input placeholder to reflect simple substring matching decision (maybe "Enter any text or domain to ignore matching websites"?)
[x] fix refocusing of input (always refocus?)
[ ] make a decision on how to handle duplicates (I think just refusing the save is simple enough; same as when trying to save only whitespace)
[x] make a decision on whether the "Added" column is needed
[ ] make deletes undoable

blackforestboi commented 7 years ago

change the input placeholder to reflect simple substring matching decision (maybe "Enter any text or domain to ignore matching websites"?)

We can also add some instructionary sentence above the list. Will think about one. For the input field, how about "Enter any text or domain or path to ignore matching URLs" ?

Fix refocusing of input (always refocus?)

+1 for refocus

make a decision on how to handle duplicates (I think just refusing the save is simple enough; same as when trying to save only whitespace)

+1 for making duplicates for now. I would file it as an improvement to make later.

make a decision on whether the "Added" column is needed

+1 for removing, doesn't add much value.

make deletes undoable

+1 for skipping and also filing for future improvements.

Treora commented 7 years ago

@poltak: Nice progress! I agree with @oliversauter's +4, no need to make everything perfect in the first iteration. I have been trying to take somewhat of a hands-off approach concerning the options page; so feel free to use your own judgement while developing, but give a shout if you would like some more eyes on your code.

A quick point already: designing for having multiple things that will be configurable, it would be neat to have the blacklist-related things grouped together (i.e. organise by feature).

Also, it looks like you tried to merge with master branch, but ended up also including some outdated (and some worldbrain-specific) commits from the in worldbrain's master branch. In general, you may want to use rebase instead of merge to get branches up to date again. I just rebased it to master, see branch blacklist. Feel free to reset your working branch to that one, or alternatively to work on that one instead (just invited you to repo contributors).

poltak commented 7 years ago

@Treora: Thanks for the feedback and pointers. Much appreciated. Yes, completely agree with the organise by feature design. Didn't want to alter the existing structure too much though, as I had just gotten on-board with this project, and felt these things should be discussed first before I went ahead and started refactoring stuff in this PR. Now that it's out there, I'll go ahead and organise it a bit more when I've worked through the main feature-related stuff I'm doing. Although, feel free to call me out if you think I'm diverging too much in any big changes I make in the future.

Also, it looks like you tried to merge with master branch, but ended up also including some outdated (and some worldbrain-specific) commits

Yeah the intention here was to get the wordbrain repo up-to-date with most of the changes it was missing upstream (this repo) while keeping all the UI changes that were since made in the wordbrain repo. However, I never intended that merge to come back to this PR...

Ended up doing that via an explicit merge as there were 3 main commits in the worldbrain repo that changed the overview UI, which ended up conflicting with a lot of changes from here that seemed to be related to fixing linting errors and refactoring. Dealing with resolving the conflicts in a single explicit merge commit seemed a lot less painful than rebasing the commits from worldbrain and resolving them one-by-one. But point taken, I'll try to stick to rebasing changes on top of branches to keep things up-to-date from now on.

Treora commented 7 years ago

to get the wordbrain repo up-to-date with most of the changes it was missing upstream (this repo) while keeping all the UI changes that were since made in the wordbrain repo

Check with Oliver, but as far as I know those UI changes are superseded as they were made for PR #82 (where he impractically made the PR from his master branch), which was replaced by (and partly incorporated in) PR #55.

In any case, to prevent headaches later on, I would really recommend to always rebase instead of merge if you want to keep branches and forks in line with each other. It is not trivial (no pun intended) to work on your own fork while still incorporating changes from upstream – assuming that is your intention (you may of course decide it is easier to just run your indepedent&disconnected project, though we would lose the synergy and would have to do many things twice).

poltak commented 7 years ago

@Treora I've refactored all blacklist code into its own feature module, based on the ideas in that link. Module entry point exposes:

main blacklist container (default export)
combined reducer (reducer export)
actions object (actions export)

No selectors in the code so far and haven't exposed the underlying view components, as I can't see a good reason to yet.

All in refactoring commit 41bd0b9fba290362009b62ca6e533ea2620cd807. See what you think of the general structure and we possibly can stick to something like this for all future feature modules, or at least UI code?

Also refactors the blacklist state from being located at state.settings to state.blacklist. Keeping each feature module in their own sub-key of state like this seems nicer than bundling everything that will appear under the "settings" nav item in state.settings key.

Furthermore, what do you want to do with this PR now that you made the blacklist branch? Either:

just close this PR and any further blacklist work to be done on blacklist branch (possibly another PR)
I reset this PR's branch before those two unintended merge commits and then rebase the newer stuff from blacklist branch on top of this

There is also some non-UI blacklist commits on WorldBrain implementing the actual blocking logic in import-history and activity-logger's background page tracker. Will bring them over either to this PR's branch or the blacklist branch, if wanted.

Treora commented 7 years ago

Great. Looks much better now, with the blacklist folder.

A few high level comments:

I would pull the main component out of index.jsx to keep things clear and separated; the task of the index file is just to import and export things.
I am not certain about the choice between state.blacklist and state.settings.blacklist.. It would be nice if settings is a clean module by itself, with blacklist as a submodule, so that the settings module is opaque to the top level app (the options page). I am not experienced enough with such modular Redux approaches however, and my feeling is that Redux is not really well designed for such clean module abstractions.. So I'll leave this one up to your judgement. If you leave it like this, the reducer and actions files from the settings folder could be removed.
configurePersistence is specific to the blacklist module, so logically it should go in the blacklist folder. It may not be worth the effort to modularise this neatly however, because when we have multiple settings modules we will probably want to persist all settings using a single generic approach instead.

Furthermore, what do you want to do with this PR now that you made the blacklist branch?

The blacklist branch is rebased on master, so better continue with that. Either as a new PR, or by doing a reset --hard blacklist on this (Aquib's) branch.

There is also some non-UI blacklist commits on WorldBrain implementing the actual blocking logic in import-history and activity-logger's background page tracker. Will bring them over either to this PR's branch or the blacklist branch, if wanted.

Nice; can be added, or can also be done as a separate PR to review and discuss things one step at a time.

Treora commented 7 years ago

This work was continued in #93, cleaning up this PR.

WebMemex / webmemex-extension

Implement editable blacklist / forgetlist -- work in progress #84