Closed stephan-herrmann closed 1 week ago
Some notes on top of https://github.com/eclipse-mylyn/org.eclipse.mylyn/pull/562#issuecomment-2187555735 :
org.eclipse.mylyn.wikitext.util.Preconditions
org.eclipse.mylyn.wikitext.util.Strings
(caveat: null != empty != blank)org.eclipse.mylyn.wikitext.internal.util.XmlUtil
, see https://github.com/eclipse-mylyn/org.eclipse.mylyn/pull/562#issuecomment-2187555735org.eclipse.mylyn.wikitext.util.IDStrategies
generalized from my previous take at GfmIdGenerationStrategy.generateId()
toLowerCase()
, intentionally?org.eclipse.mylyn.wikitext.util.UrlUtil
- here be dragons, see belowtoString()
methods I let JDT propose a default implementation and manually fine tuned it to produce the output expected from testsString.repeat()
does not require any library :)PotentialEmphasisSpan.isWhitespace()
: caveat: whitespace != whitespace (Java vs. unicode)MarkdownDocumentBuilder.emitContent()
caveat: expects ' ' but no other whitespace, intentionally?For Url escaping, I had hoped that java.net.URLEncoder
would satisfy all requirements and needs, but alas, it escapes more than what we expect in CommonMarkSpecTest
.
org.eclipse.mylyn.wikitext.util.UrlUtil.escapeUrlFormParameters(String)
seems to be suitableorg.eclipse.mylyn.wikitext.util.UrlUtil.escapeUrlFragment(String)
heuristically tries to mimic the expected behaviour
ALLOWED_IN_FRAGMENT
captures legal chars, so we scan to find the first char needing an escape, but from that char onwards, we might escape more than desiredCommonMarkSpecTest
: [foo<http://example.com/?search=](uri)>
href="http://example.com/?search=%5D(uri)"
-- (
and )
expected unescaped but current impl will escape them :warning:@stephan-herrmann The ticket is supposed to remove guava references in wikitext, but it removes a lot of non-guava code as well which to me is outside the scope of the ticket.
It should only concentrate on guava, not muddy the waters with removing apache calls which does not need to be removed.
Should only remove guava uses. Should not remove uses of apache just because it is apache
I think "we" should reconsider that. As I understand it, the more 3rd party dependency one drags in, the more problematic it is to consume because more baggage is dragged in. One might argue the water is currently very muddy currently. And it's probably safe to assume @stephan-herrmann didn't do this for new reason at all...
The apache library is already in the build so I don't consider that adding a library, as opposed to adding something to the target file.
Either way, these kind of changes should be in their own issue, not one for guava
This will be the third time @stephan-herrmann is being asked to rework a contribution. I don't speak for him, but in general eventually folks tend to give up when hurdles appears in the way. I know Stephan is very persistent though...
One could just as well rename the ticket to change in scope:
Improve the consumablity and reusability of wikitext by reducing 3rd party dependencies.
I suppose it's none of my business. Somehow it feels poorly to me....
Removing guava is an very different beast than replacing third party library calls with "inline"/copied equivalents (and when I suggested importing the guava classes into wikitext for the same reason I was shot down along the lines of it creating a maintenance issue tracking the original version).
And as @stephan-herrmann points out in the description, there are already issues with the changes and I see several other issues already.
I could go along with a new ticket for replacing the apache uses (missing Strings.repeat() was a 'my bad' on my part) but I have bad feelings about making this issue a catch all one.
Yes, I understand your concerns about scope creating and have even more things to review and more risk subtle behavior changes.
Thanks, @gnl42 and @merks for commenting.
First things first: we have a definite need for improvement around UrlUtil.escapeUrlFragment()
. The current implementation does not pass one particular test of CommonMarkSpecTest
(see above for disablement).
UrlEscapers.urlFragmentEscaper().escape(link)
is used to encode entire URIs, not just a URL fragment. Can anyone explain this?URL
to first parse the string into parts, which can then be encoded with their specific rules?Since I'm not an expert in web-programming, I'd highly welcome some input here.
As for replacing usage of apache utils:
Validate.isTrue()
changed some IllegalArgumentException
to IllegalStateException
, which I thought best to revert immediately, rather than later.org.eclipse.mylyn.wikitext
I only reverted some additions from #565Asking pragmatically, which road poses least obstacles?
StringUtils.repeat()
I guess :) ).Is there any specific problem with any of my replacements of apache utilities, or is the concern only about the volume of the change? I don't recall any of the apache replacements as particularly tricky. Changes to toString()
methods have a high volume, but very low complexity. @gnl42 which parts are you worried about?
And please don't let the discussion about apache distract us from the necessary work around URL escaping!
Thank you for your efforts and for your optimistic attitude @stephan-herrmann ! Let me add my two cents to this discussion.
1) The scope of the task to remove guava is big enough and we shouldn't extend it to other topics if we want to converge within a reasonable time. So, if we want to discuss apache replacement or troubles with particular guava type replacement, let's postpone questionable items for other PRs so that we can merge the rest of the work. I believe that this is how we can make faster progress.
2) Nobody says that we must literally reuse existing Mylyn Docs API/implementation in its current state. We may introduce other bundle(s) to be clean from 3rd party libraries and have sufficient interface/implementation to be suitable for JDT , LSP4E and other potential consumers. In this case tasks to remove guava, apache, etc are no longer coupled with a task to have reusable markdown support. So, instead of discussing how to replace the particular 3rd party type we can focus on defining a good interface for markdown parsing suitable for JDT needs. If you think this would be helpful you are very welcome to create a dedicated issue to discuss expected markdown interface capabilities @stephan-herrmann
add commons.lang3 to eclipse-sdk-prereqs
@merks with your experience from simrel do you want to comment on the feasibility of this?
Nobody says that we must literally reuse existing Mylyn Docs API/implementation in its current state. We may introduce other bundle(s) to be clean from 3rd party libraries and have sufficient interface/implementation to be suitable for JDT , LSP4E and other potential consumers. In this case tasks to remove guava, apache, etc are no longer coupled with a task to have reusable markdown support. So, instead of discussing how to replace the particular 3rd party type we can focus on defining a good interface for markdown parsing suitable for JDT needs. If you think this would be helpful you are very welcome to create a dedicated issue to discuss expected markdown interface capabilities @stephan-herrmann
Thanks @ruspl-afed for considering the big picture. I haven't yet started to work on the actual interface from JDT, but from other use of wikitext I'm confident that just the core plugins should be useful without new API. That's why I'm focusing on these three bundles:
If you look at this PR, those 3 plugins are all I intended to modify (plus corresponding tests).
The only change outside that scope is in ScreenshotOnTimeoutRule.java where I was driven by my (overzealous?) removal of WikiToStringStyle.
toString()
!) really isn't rocket science and shouldn't require more then 10 sec to review.I foresee no problem at all adding this to the Platform's *.target
<dependency>
<groupId>org.apache.commons</groupId>
<artifactId>commons-lang3</artifactId>
<version>3.14.0</version>
<type>jar</type>
</dependency>
The following appears to be there for tests only:
<dependency>
<groupId>commons-lang</groupId>
<artifactId>commons-lang</artifactId>
<version>2.6</version>
<type>jar</type>
</dependency>
It's not actually in the p2 repository:
I foresee no problem at all adding this to the Platform's *.target
Thanks @merks, I really was afraid that would cause more trouble, but your statement is encouraging.
Please find in bb20f51 a proposed compromise where I reduced the surface area by reinstating some use of commons-lang3.
Note that some of my remaining changes were done before I even saw #565, where certain use of commons.lang3 was introduced. Specifically I believe it is useful to maintain the distinction between checkState vs. checkArgument (throwing different exceptions - as it was before #565), rather than mapping both to the same Validate.isTrue()
. (btw, why not org.eclipse.core.runtime.Assert.isTrue()
?). Also replacing guava's trivial isNullOrEmpty()
with a trivial same-named implementation of our own didn't look like causing pain for any reviewer. We need our own Strings
class anyway for replacements of CharMatcher
functionality like isBlank()
.
@gnl42 if you still see issues, please be specific, or: feel free to add more reverts to the PR.
At this point I can only repeat:
And please don't let the discussion about apache distract us from the necessary work around URL escaping!
I would have loved to spend time on that rather on reverting changes.
@stephan-herrmann I still think it should be two issues. but ... I don't understand the desire to remove the third party libraries but it sounds like it's something to do with the wikitext setup in which case I apologize. Still not happy about adding our own version of the library versions.
Thanks @gnl42 for approval.
Incidentally just now we found a leaner solution in orbit: org.commonmark_0.22.0.v20240316-0700.jar.
Unless we find some deficiencies in that library (which is unlikely, the javadoc reference implementation uses the same), JDT no longer needs mylyn wikitext for Java 23 support.
I hope this PR didn't stir up too much unnecessary dust, but perhaps it still helps other potential consumers like lsp4e. Feel free to modify the PR as needed.
Maybe, if so:
private static final Pattern NOT_XXX_NNN = Pattern.compile("[^a-z0-9_]"); //$NON-NLS-1$
private static final Pattern HYPHEN = Pattern.compile("-+"); //$NON-NLS-1$
private static final Pattern HYPHEN_START_END = Pattern.compile("^-|-$"); //$NON-NLS-1$
@Override
public String generateId(String id) {
id = id.toLowerCase(Locale.ENGLISH);
id = NOT_XXX_NNN.matcher(id).replaceAll("-"); //$NON-NLS-1$
id = HYPHEN.matcher(id).replaceAll("-"); //$NON-NLS-1$
id = HYPHEN_START_END.matcher(id).replaceAll(""); //$NON-NLS-1$
return id;
}
@gnl42
I've recently been doing performance tuning on a client application and have done changes exactly like this. It's less elegant and pretty, but definitely performs better!
Since JDT no longer needs this change and there are no special requirement for 3rd party libraries anymore from JDT side, please feel free to finalize and merge this PR @gnl42
I've recently been doing performance tuning on a client application and have done changes exactly like this. It's less elegant and pretty, but definitely performs better!
I compared the runtime of the various versions (10000000 calls each version):
Id="--e4bc9bef-0238-4f87-813d-f0b78bdb5c21--"
Guava took 443ns average per call
State machine took 286ns average per call
Regex took 1041ns average per call
Compiled regex took 788ns average per call
In the end it basically boils down to how often is the method called vs. clarity of code
There are still some references to guava in the pom files and friends.
Easier to create a mylyn branch to clean up
Also avoid new apache dependencies
Fully covers wikitext core
Tests:
fixes https://github.com/eclipse-mylyn/org.eclipse.mylyn/issues/521
Replaces #562 to build on #565