brackets-archive / bracketsIssues

Archive of issues in brackets.

Multiline regular expression selection support #13599

Open core-ai-bot opened 3 years ago

core-ai-bot commented 3 years ago

Issue by d-akara Saturday May 21, 2016 at 22:23 GMT Originally opened as https://github.com/adobe/brackets/pull/12442


#12281

This change is a new implementation of the search cursor which supports multiline regular expression selections. Notable changes:

  1. Multiline selections are now supported.
  2. The search cursor scans the entire document at once, versus the old implementation, which scanned line by line.
  3. The new search cursor can be used by other extensions that wish to take advantage of the performance of using the internal document line index.
  4. Performance improvements
    1. A copy of the entire document is no longer created for every key press when typing in the search box.
    2. Next and previous navigation is much faster as it now just uses the cached locations.
    3. Performance is no longer strongly related to document size. It is now more related to the number of matches.
    4. For comparison, the previous implementation took 6 seconds to find 3 million matches; the new implementation finds them in under 300 ms.
    5. Increased the highlight limit from 2000 to 10000, as it seems we can now highlight a greater number within the same time.
  5. Considerations
    1. Although performance is faster in most use cases, there is a trade-off for very large numbers of matches. With a very large document, using "." to match everything can create a pause while typing. We might want to limit total matches or require more than "." to be entered before starting the search.
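The whole-document scan in item 2 can be sketched roughly as follows. This is a minimal illustration, not the PR's actual API: `scanDocument`, the match-limit parameter, and the returned shape are all assumptions.

```javascript
// Sketch: scan the full document text once with a global regex, caching match
// offsets instead of re-searching line by line. Because the regex runs over the
// whole text, patterns spanning '\n' match naturally.
function scanDocument(text, query, matchLimit) {
    // Ensure the regex has the global flag so exec() advances through the text.
    const flags = query.flags.includes("g") ? query.flags : query.flags + "g";
    const re = new RegExp(query.source, flags);
    const matches = [];
    let m;
    while ((m = re.exec(text)) !== null && matches.length < matchLimit) {
        matches.push({ start: m.index, end: m.index + m[0].length });
        // Guard against infinite loops on zero-length matches.
        if (m.index === re.lastIndex) { re.lastIndex++; }
    }
    return matches;
}
```

Caching the resulting offsets is what makes next/previous navigation cheap: it becomes a lookup into this array rather than a fresh search.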

d-akara included the following code: https://github.com/adobe/brackets/pull/12442/commits

core-ai-bot commented 3 years ago

Comment by ficristo Monday May 23, 2016 at 08:28 GMT


Wow, the numbers are really promising! Out of curiosity, have you looked at improving the search cursor implementation directly in CodeMirror?

core-ai-bot commented 3 years ago

Comment by d-akara Monday May 23, 2016 at 12:27 GMT


It may be possible to improve CodeMirror's cursor in a compatible way; however, it will be more work to make it compatible with CodeMirror's use cases.

I made the new search cursor work the same as the Brackets search-in-files, in that it converts plain string queries to regular expressions for consistency, which allows the same code path to be used for strings and RegExps.

CodeMirror splits multiline string queries at '\n' and has special handling for those cases, including for code folding when it does the splitting. Also, the consideration I mentioned above about performance for extremely large numbers of matches could affect CodeMirror: its search cursor does not expose any way to limit the number of matches. Such a limit might prove necessary for that performance case, and adding it would affect other consumers of CodeMirror's search cursor, possibly requiring changes on their part.
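The string-to-regex normalization described above might look something like this (a minimal sketch; the helper name and escaping approach are assumptions, not the PR's code):

```javascript
// Sketch: convert a plain-string query into an equivalent RegExp so string
// and regex searches can share one code path.
function stringToRegExp(query, caseSensitive) {
    // Escape regex metacharacters so the string matches literally.
    const escaped = query.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
    return new RegExp(escaped, caseSensitive ? "g" : "gi");
}
```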

core-ai-bot commented 3 years ago

Comment by Denisov21 Thursday May 26, 2016 at 15:12 GMT


Great job @dakaraphi!

core-ai-bot commented 3 years ago

Comment by nethip Friday May 27, 2016 at 08:24 GMT


Awesome!

core-ai-bot commented 3 years ago

Comment by busykai Wednesday Jun 08, 2016 at 19:51 GMT


@dakaraphi: an idea for your performance concern with all-inclusive regexps. For files larger than a certain size, we can take a small window of text (e.g. pos+2KB) and scan only it for matches until the "match density" (in my example, 2KB divided by the number of matches) reaches a "reasonable" number as the user types. Once it does, the entire document is scanned on each regular expression edit by the user. If the file itself is smaller than, say, 100KB, it is simply scanned entirely.
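The windowed "match density" heuristic proposed above could be sketched as follows. The window size, density threshold, and function names are all assumptions for illustration, not code from the PR:

```javascript
// Sketch: scan only a small window near the cursor and fall back to a
// full-document scan once match density looks reasonable.
const WINDOW_SIZE = 2048;       // ~2KB window, per the comment above
const DENSITY_THRESHOLD = 64;   // assumed: min characters per match to call it "reasonable"

function countMatches(text, regex) {
    const flags = regex.flags.includes("g") ? regex.flags : regex.flags + "g";
    const re = new RegExp(regex.source, flags);
    let count = 0;
    let m;
    while ((m = re.exec(text)) !== null) {
        count++;
        if (m.index === re.lastIndex) { re.lastIndex++; } // zero-length match guard
    }
    return count;
}

function shouldScanWholeDocument(text, pos, regex) {
    const window = text.slice(pos, pos + WINDOW_SIZE);
    const matches = countMatches(window, regex);
    if (matches === 0) { return true; } // nothing nearby; a full scan is cheap
    // High density (many matches per character) means a full scan would be expensive.
    return (window.length / matches) >= DENSITY_THRESHOLD;
}
```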

core-ai-bot commented 3 years ago

Comment by d-akara Wednesday Jun 08, 2016 at 23:29 GMT


@busykai I think size is no longer as strongly correlated with performance as it was before. That said, others should give it a try, as my machine may not represent typical performance. If the following idea along the lines of match density works out well, then size will probably be mostly irrelevant.

I was thinking about something similar to your idea of match density. I will look into it further and see how it performs.

core-ai-bot commented 3 years ago

Comment by d-akara Friday Jun 24, 2016 at 00:25 GMT


@busykai Made numerous changes to address large documents:

Notable changes:

  1. Removed limit for document size. We are now able to effectively search anything Brackets can open.
  2. Set the default match limit to 1 million. I looked at possible progressive algorithms as mentioned before, but 1 million is still responsive and seems more than reasonable, as it is far past the capabilities of the other Node-built desktop editors at the moment. If there are more than 1 million possible matches, the matches captured will start at the cursor position.
  3. Highlighting now enabled up to 1 million matches. This is now possible due to dynamic highlighting. For large numbers of matches, we now dynamically highlight the viewport area on scroll.
  4. Adding scroll ticks could cause some typing lag when there were many. This is now asynchronous and debounced for more responsive typing in the search field.
  5. For large documents, we additionally debounce some search tasks for a more responsive feel.

Note, I performed most of these tests on a 14MB document. It is noteworthy that other editors either can't complete or even crash when attempting searches at the same scale.
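The debouncing mentioned in items 4 and 5 can be illustrated with a minimal generic debounce. This is a sketch, not the PR's actual utility; the specific delay values are whatever each operation needs:

```javascript
// Sketch: defer expensive work (scroll ticks, large-document search) until
// typing pauses; each new call cancels the previously scheduled one.
function debounce(fn, delayMs) {
    let timer = null;
    return function (...args) {
        if (timer !== null) { clearTimeout(timer); }
        timer = setTimeout(() => {
            timer = null;
            fn(...args);
        }, delayMs);
    };
}
```

Rapid keystrokes then collapse into a single deferred execution, which is what keeps the search field responsive.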

core-ai-bot commented 3 years ago

Comment by nethip Monday Jul 18, 2016 at 05:08 GMT


@busykai Did you get a chance to review this change? This would be a great addition to Brackets!

core-ai-bot commented 3 years ago

Comment by busykai Monday Jul 18, 2016 at 15:45 GMT


@dakaraphi, @nethip actually, I started to look at it this weekend and will be looking at it throughout this week. The feel with the change is just awesome, though!

core-ai-bot commented 3 years ago

Comment by busykai Thursday Jul 21, 2016 at 23:56 GMT


@dakaraphi, there are a couple of corner cases where I don't feel like the UI is intuitive:

  1. When searching for $\n:

[screenshot: screen shot 2016-07-21 at 16 30 48]

Note that there is no search highlighting for the match itself, but the scrollbar is fully ticked.

  2. When everything matches (e.g. .):

[screenshot: screen shot 2016-07-21 at 16 31 18]

Note that each match is highlighted, but there are no ticks on the scrollbar. I understand the difference is because FIND_HIGHLIGHT_MAX and FIND_SCROLLTICK_MAX are treated as mutually exclusive, but they probably should not be.

I also have a few general comments on the function documentation:

  1. In many places the function documentation uses verbs in the imperative rather than the third-person singular (e.g. Create instead of Creates). Function documentation should start with a verb conjugated in the third-person singular.
  2. Template placeholders for argument types are left over in many places.
  3. Argument description is omitted.
  4. Sentences and paragraphs should start with a capital letter.

I'll leave these out of the inline comments for now, just to be less verbose. Please make a re-pass over the docs.

core-ai-bot commented 3 years ago

Comment by d-akara Friday Jul 22, 2016 at 02:51 GMT


@busykai

  1. I'll look into the case for $\n
  2. I believe the highlight and tick behavior has not changed, just the number of highlights available. We previously dropped the tick marks when the number of highlights was too large. I'm happy to try something different, but I don't know what that should be. The main issues are:
    1. I'm not sure if more than 2000 tickmarks is meaningful. The bar just becomes saturated.
    2. The tickmarks cause a significant performance hit. Being able to handle more would require addressing the performance first if that's what we want.
  3. I'll work on making the recommended doc changes.

core-ai-bot commented 3 years ago

Comment by d-akara Saturday Jul 23, 2016 at 21:45 GMT


@busykai

  1. The issue for \n is that CodeMirror's markText does not support marking regions without actual text. I searched around a bit, but there appears to be no obvious solution. Ideally, markText would support this as a property flag.
  2. I think I now have a way of possibly supporting tickmarks efficiently for any number. Hopefully will be able to try this out soon.

core-ai-bot commented 3 years ago

Comment by busykai Sunday Jul 24, 2016 at 23:04 GMT


@dakaraphi, thanks! I assumed CM did not support the newline selection, but since I'm not an expert on it, I was not sure. As for 2), there is a lot of code in this PR yet to be reviewed, and I assume the improvement you mentioned will be somewhat localized. I'll continue with the review and will try your change out as soon as you push it.

core-ai-bot commented 3 years ago

Comment by d-akara Tuesday Jul 26, 2016 at 01:00 GMT


@busykai

  1. In the previous commit, I had inadvertently commented out the function call that does the early render in the viewport. I fixed this, so for large documents the initial highlighting of what is currently in the viewport should be faster.
  2. Tick marks no longer have a limit on the number we can display. I implemented something similar to video games that allow an LOD (level of detail) setting.
    1. The current LOD is set to 1000, so you can have up to 1000 physically rendered tickmarks.
    2. If there are more than 1000 actual line matches, each tickmark represents multiple lines. This allows the tickmarks to still display a visual pattern of matches regardless of their number.
  3. Documentation is still a work in progress. I made some updates, though I'm not exactly sure I have the syntax correct in some cases. I will follow up later with more detailed questions.
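The LOD idea in item 2 can be sketched as bucketing matched line numbers into at most N rendered tick positions. The function and parameter names here are illustrative, not the PR's actual code:

```javascript
// Sketch: collapse an arbitrary number of matched line numbers into at most
// `maxTicks` rendered tick positions, LOD-style. Each tick represents a band
// of lines, so the scrollbar still shows the pattern of matches.
function computeTickBuckets(matchedLines, totalLines, maxTicks) {
    if (matchedLines.length <= maxTicks) {
        return matchedLines.slice(); // few enough to render each match directly
    }
    const linesPerTick = Math.ceil(totalLines / maxTicks);
    const buckets = new Set();
    for (const line of matchedLines) {
        // Mark the first line of the band this match falls into.
        buckets.add(Math.floor(line / linesPerTick) * linesPerTick);
    }
    return Array.from(buckets).sort((a, b) => a - b);
}
```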

core-ai-bot commented 3 years ago

Comment by busykai Friday Jul 29, 2016 at 04:27 GMT


@dakaraphi thanks for addressing the issues, I will do my best to do another pass over the weekend. One general question I had: do we really need to keep two flows for "small" and "large" documents, in your opinion? Wouldn't it make the code simpler if we just had a single approach (using debouncing for all cases)?

core-ai-bot commented 3 years ago

Comment by d-akara Friday Jul 29, 2016 at 14:36 GMT


@busykai I probably type quite a bit faster than 200 characters per minute. I generally was thinking that the faster you type, the more noticeable and annoying would be any lagging feel.

So I chose the current values simply by trial and error, settling on values that seem to be a good compromise between typing feel and visual responsiveness.

I'm not exactly sure what you are describing in 'growing delays'.

I considered having only one flow for "small" and "large" documents, and I am still open to that, as I agree I would normally prefer keeping the code simpler. However, I also didn't want to noticeably regress the responsive feel; it is subtle, but there is faster visual feedback. The same is true for the dynamic highlighting: I would prefer doing it only one way, but for a small number of matches, if you scroll very fast, the static highlighting will look better.

core-ai-bot commented 3 years ago

Comment by busykai Friday Jul 29, 2016 at 20:00 GMT


I'm not exactly sure what you are describing in 'growing delays'.

You have longer debouncing delays on some operations (0 for performing the search and 300 ms for ticks, for example). I was wondering what you were trying to accomplish with the difference as opposed to debouncing them all to a fixed small interval.

core-ai-bot commented 3 years ago

Comment by d-akara Friday Jul 29, 2016 at 22:22 GMT


@busykai So the timings are different because each operation has a different cost. For example, ticks are generally expensive, so I went with having them occur less often by using a longer delay.

The 0 ms search debounce for large documents is to allow the search to start as soon as possible after the viewport highlighting. Without debouncing the search in this flow, the early highlighting of the viewport would not occur, as it would wait for the search of the full document. The zero delay lets the event queue empty so the viewport highlighting is applied first, and then the search of the whole document is performed.
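The zero-delay deferral described above relies on `setTimeout(fn, 0)` yielding to the event queue, so the already-queued viewport highlighting can be applied (and painted) before the full scan begins. A minimal illustration, with hypothetical function names:

```javascript
// Sketch: run the cheap viewport highlight synchronously, then defer the
// expensive full-document search with a zero-delay timeout so it runs only
// after the event queue (including rendering work) has been serviced.
function runSearch(highlightViewport, searchWholeDocument) {
    highlightViewport();                // fast: only matches currently visible
    setTimeout(searchWholeDocument, 0); // deferred: runs after the queue empties
}
```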

core-ai-bot commented 3 years ago

Comment by busykai Monday Aug 01, 2016 at 03:05 GMT


@dakaraphi

Without debouncing the search in this flow, the early highlighting of the viewport will not occur as it waits for the search of the full document to also occur.

This is why I asked. It does sound like the definition of "expensive" is relative. If the search causes a visual impact, then independently of when it happens (synchronously or immediately after), it will take the same amount of time. So why not give it a chance to cancel: the next character typed by the user will render the results of the operation useless. Which gets us back to this point:

I probably type quite a bit faster than 200 characters per minute. I generally was thinking that the faster you type, the more noticeable and annoying would be any lagging feel.

I was not talking about your typing speed, but the average (taken from a Google search for "average typing speed"). I assume a programmer would type slightly faster than that. However, that's not the point. The point is: we need to give the search a chance to cancel. If it has a 0 ms debounce, it will start anyway, and as the user types, time will be spent doing all these searches (most of which will be useless) over the entire document. I think a 200 ms debounce is a reasonable starting point.

core-ai-bot commented 3 years ago

Comment by busykai Monday Aug 01, 2016 at 04:49 GMT


General comment: we use Closure JSDoc notation. In it, the string datatype is string (lowercase). It seems to need correcting in all occurrences.
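For reference, Closure-style JSDoc writes primitive types lowercase, i.e. `{string}` rather than `{String}`. A small illustrative example (the function itself is hypothetical):

```javascript
/**
 * Returns the query normalized to lowercase.
 * @param {string} query The raw search query.
 * @return {string} The normalized query.
 */
function normalizeQuery(query) {
    return query.toLowerCase();
}
```

The distinction matters in Closure notation because `{String}` denotes the wrapper object type, while `{string}` is the primitive actually passed around.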

core-ai-bot commented 3 years ago

Comment by busykai Monday Aug 01, 2016 at 05:32 GMT


@dakaraphi, I think I'm done with the first pass. I strongly recommend refactoring the datatypes in BracketsSearchCursor.js as per my comments. We use this pattern elsewhere (as opposed to lodash utilities and anonymous types).

core-ai-bot commented 3 years ago

Comment by nethip Wednesday Sep 07, 2016 at 11:40 GMT


@dakaraphi Just wanted to check if you got a chance to work on the review comments.

core-ai-bot commented 3 years ago

Comment by d-akara Friday Sep 09, 2016 at 16:08 GMT


@nethip Started, but have had much less time available recently. Hopeful to make progress this weekend.

core-ai-bot commented 3 years ago

Comment by nethip Friday Sep 09, 2016 at 16:21 GMT


@dakaraphi No problem! Take your time.

core-ai-bot commented 3 years ago

Comment by d-akara Saturday Sep 10, 2016 at 16:45 GMT


@busykai Next changes are ready for review. BTW, after merging with the latest on master, I can no longer run any unit tests; the window just hangs. Not sure what the issue is.

core-ai-bot commented 3 years ago

Comment by zaggino Saturday Sep 10, 2016 at 23:04 GMT


@dakaraphi for the test issues, try running npm install in the brackets folder; if that doesn't help, try getting the latest brackets-shell master and rebuilding it.

core-ai-bot commented 3 years ago

Comment by adityapaturi Monday Sep 19, 2016 at 13:42 GMT


@dakaraphi : The match-case find is not working as expected. It always returns results as if the match-case button is not pressed.

core-ai-bot commented 3 years ago

Comment by nethip Monday Sep 19, 2016 at 13:45 GMT


@adityapaturi It would be great if you could point out the piece of code where the case matching is not working.

core-ai-bot commented 3 years ago

Comment by busykai Monday Sep 19, 2016 at 22:18 GMT


@dakaraphi, I'll review this by the end of the week. Thank you for taking your time to address the review comments!

core-ai-bot commented 3 years ago

Comment by d-akara Monday Sep 19, 2016 at 23:53 GMT


@adityapaturi This should now be fixed. Thanks.