eclipse-platform / .github

Common contribution content for eclipse-platform repositories
https://www.eclipse.org/eclipse/
5 stars 10 forks source link

Stop shipping prebuilt indexes in doc bundles #130

Closed akurtakov closed 1 year ago

akurtakov commented 1 year ago

As identified in https://github.com/eclipse-platform/eclipse.platform.releng/issues/230 :

Thus the plan here is:

merks commented 1 year ago

Awesome. Thank you!

This is definitely still on one of my burners, as is the growing number of bundles that have minor updates:

https://github.com/merks/simrel-maven/blob/main/platform/REPORT.md

iloveeclipse commented 1 year ago

I believe this is too simplistic and only looks at the problem from the "Lucene version" point of view. The original idea to provide prebuilt indexes is to save time for first time use on large help bundles is still valid, independently of Lucene version used.

The problem is that Eclipse can be used to host huge help contents, where waiting for a index build is not an option. We (Advantest) have such Eclipse based help center, that hosts LOT of documentation, and I believe the proposal to get rid of prebuilt indexes would make it basically unusable. @howlger: would you please check?

So clear -1 from my side to original plan.

Instead of original plan that throw the baby out with the bathwater by deprecating/removing prebuilt indexes support, not just address "stop shipping prebuilt indexes in doc bundles" for SDK?

akurtakov commented 1 year ago

Why do you think it would make it unusable? It would be indexed only once and even when you ship prebuilt indexes you have to make sure they are all generated with same Lucene version or they will be regenerated at first access and used that. IIRC they are even used to generate a combined master index of all prebuilt and not content and used after that.

akurtakov commented 1 year ago

Regarding why not just "stop shipping prebuild indexes" - because with faster dependency updates these indexes just become unused and generated by Help center (Lucene) on the go without anyone care (except for the visible errors that shippped index is for old version). Let's see what @howlger will find out but I doubt there would be problem. AFAIK EF help center is not pregenerating indexes at all, right? @merks

laeubi commented 1 year ago

I think it would make more sense to have a way to index a product (currently we ship it with the bundle), similar to the -initialize argument of the launcher. So it would be good to have an IndexHelpApplication shipped with the helpcenter, then one can build the product, call the index app and then package the product.

merks commented 1 year ago

I wonder, is there anything that ensures that the version of Lucene used to generate the index is the same one that's actually in the target platform, which is presumably the one that the help provider is assuming/presuming will be used in the install installation? Sorry for my ignorance on the subject; I'm not really sure how the pre-indexing is actually implemented and in particular what determines the version of Lucene that does the indexing. Is that something that Tycho does?

Is there any empirical data that suggests not pre-indexing is a problem on today's modern machines with today's fine-tuned Lucene?

It's just seems odd to be preindexing anything and shipping that with the help bundle itself because in general one does not know the version of Lucene that will be in the final installation and many of us build helpful parts that will be shipped by 3rd parties...

merks commented 1 year ago

BTW, unfortunately I know nothing about the how the EF help center is implemented...

iloveeclipse commented 1 year ago

I wonder, is there anything that ensures that the version of Lucene used to generate the index is the same one that's actually in the target platform,

Sure. If we are talking about a product (not SDK), there is a clear fixed TP and fixed runtime that product uses and no Lucene issues whatever.

Once again: why stop providing a feature if one can simply stop ship indexes with SDK? The releng problem in SDK with different Lucene versions doesn't exist in custom products at all.

laeubi commented 1 year ago

Is that something that Tycho does?

I think that's what we are talking about: https://github.com/eclipse-platform/eclipse.platform.common/blob/e551660a73ead6ce00cf9d49b845328d6f9c4faa/pom.xml#L72-L87

what is implemented here: https://github.com/eclipse-tycho/tycho/blob/master/tycho-extras/tycho-document-bundle-plugin/src/main/java/org/eclipse/tycho/extras/docbundle/BuildHelpIndexMojo.java

this uses a buildToolsRepository (usually an eclipse release) where it resolves the application bundles from, I think it would technically be possible to reference a build product here as well.

It's just seems odd to be preindexing anything and shipping that with the help bundle itself because in general one does not know the version of Lucene that will be in the final installation

Given that one builds a product that can't modify the install the version of lucene is known in advance as it is usually pulled in by the target platform.

iloveeclipse commented 1 year ago

Why do you think it would make it unusable? It would be indexed only once

We ship a HUGE documentation package because we have a HUGE system with LOT of things we document. If the user has to wait for index some minutes it will be unusable. Go explain him "please just wait for 10 minutes and ask same question again please if you still knows what you wanted to search".

merks commented 1 year ago

I still don't quite see a definitive answer that says yes when Tycho builds this

https://github.com/eclipse-platform/eclipse.platform.common/blob/e551660a73ead6ce00cf9d49b845328d6f9c4faa/pom.xml#L72-L87

it uses precisely this version of Lucene defined here:

https://github.com/eclipse-platform/eclipse.platform.releng.aggregator/blob/9853a21f4f270cc5ad0da15cfd22236c3e4d05b1/eclipse.platform.releng.prereqs.sdk/eclipse-sdk-prereqs.target#L29-L34

I think more to the contrary that @laeubi is saying Tycho uses a version that does not depend on what's in the current target platform and @iloveeclipse that's why it's quite likely that the index that is built will not actually work with the Lucene that's shipped because it's likely that we build with 9.4.0 but will ship 9.5.0 or 9.6.0. I'm not sure what you are doing in your products that ensures this is not the case. I guess that's none of my business though...

I do tend to agree with @iloveeclipse that if there are people out there that are ensuring that the indexing Lucene and the consuming Lucene are the same version why shouldn't they be able to continue using that? I.e., why not just stop doing it ourselves?

howlger commented 1 year ago

The problem is that Eclipse can be used to host huge help contents, where waiting for a index build is not an option. We (Advantest) have such Eclipse based help center, that hosts LOT of documentation, and I believe the proposal to get rid of prebuilt indexes would make it basically unusable. @howlger: would you please check?

Yes, unfortunately it would not work for Advantest without pre-built indexes. Indexing from scratch would take far too long due to the large number of documents (several minutes, in some cases more than half an hour).

Some notes:

laeubi commented 1 year ago

I still don't quite see a definitive answer that says yes when Tycho builds this ... it uses precisely this version of Lucene defined here

I never claimed that :-)

As nothing is configured as far as I can see, Tycho uses the default what is https://download.eclipse.org/releases/2023-03/ so whatever lucene version is there will be used, but of course one can configure any other update-site to use and it will resolve the highest version the help-system constraints allows.

So given that I build my product with a given Eclipse Release and configure this in the target and the mojo they will likely use the same lucene version ... the main problem with Eclipse Build is that everything is mangled in one big build instead of independent builds with clear versioned dependencies, most products simply don't work that way so one should not assume this being "standard".

akurtakov commented 1 year ago

@howlger Can you share numbers from your tests how much indexing in the help system (without prebuilt indexes) takes for all the docs in your product?

akurtakov commented 1 year ago

At least for now there is agreement to stop prebuilding indexes for the SDK which would already be an improvement so I would act on that.

iloveeclipse commented 1 year ago

so I would act on that.

I would then change the title of this ticket to Stop prebuilding indexes for the SDK, otherwise it is says exact the one thing we won't do.

akurtakov commented 1 year ago

I would still wait a bit for some numbers as I am not yet convinced indexing can take that much time.

iloveeclipse commented 1 year ago

I would still wait a bit for some numbers

15 minutes should be a good reason?

akurtakov commented 1 year ago

For sure. What I want to know is how much indexing (if any) takes with prebuilt indexes and without on the first run for you if possible to share.

iloveeclipse commented 1 year ago

That is 15+ minutes without prebuilt indexes.

iloveeclipse commented 1 year ago

I'm not sure why do you eagerly want to remove a useful platform feature were the users clear and loud say that the feature is in use and without it the product will have a major problem? How many times should we confirm that it is is use and we really really need that?

akurtakov commented 1 year ago

I fully understood you use that and I'm not eager to remove a useful feature, what I'm eager to achieve is make daily maintenance simpler and fully understanding how/why/when something happens is crucial for that. Second part of my question was do you see indexing (aka merging of indexes) happening on the first run with prebuilt indexes and if yes - how much time does it take.

iloveeclipse commented 1 year ago

Here the data from my colleague (thanks Harald!) who maintains our dedicated help product that serves as offline documentation package for our main product that allows customers create & run semiconductor tests.

Quote taken literally from our internal tracker where I've requested to provide the data:


The documentation package for one application version

Customers have often several versions of our application preinstalled creating the initial index without the pre-built index would take them n-times ~30 min.

The help packages that come with Eclipse like the platform guide are very small (org.eclipse.platform.doc.user_4.27.0.*.jar is 4MB) and there is no need for small guides to provide a pre-built help index as it can be created on the fly is a few seconds. I guess this is where the discussion comes from. With bigger documentation packages it is completely different, we depend on the pre-built help index to provide a usable help solution to our customers.


So is this enough data now to stop trying deprecate things that are essential for customers?

akurtakov commented 1 year ago

Thanks for the info! You seem to be quite lucky to have such control over the installed things to manage to get this optimization.

iloveeclipse commented 1 year ago

You seem to be quite lucky to have such control over the installed things

We are not lucky. We just do our job right.

merks commented 1 year ago

@iloveeclipse

Thanks for the excellent detailed empirical data!

howlger commented 1 year ago

Thanks a lot for not deprecating the extension point!

Thanks @iloveeclipse/@hhermann for the numbers which shows how pretty well the help system scales and how much Advantest relies on it.

Special thanks to @akurtakov for maintaining this for several years now. You are doing an amazing job! We at Advantest appreciate this very much. I wish I could contribute more (btw Advantest is still looking for someone to work with @hhermann and me on this and other things).

iloveeclipse commented 1 year ago

@akurtakov : there are 6 fails after the PR on each platform, see for example https://download.eclipse.org/eclipse/downloads/drops4/I20230613-1800/testresults/html/org.eclipse.ua.tests.doc_ep429I-unit-cen64-gtk3-java17_linux.gtk.x86_64_17.html

Could you please check this?

akurtakov commented 1 year ago

Last test failure comes from https://github.com/eclipse-platform/eclipse.platform.common/commit/988e87e7683fb7cb2cab9ca20bb66d38b5ef4cc6 where broken link to Platform.php has been added but Help system can't handle php at all. I would need help ( @alshamams ? ) to point me where this link was supposed to point to at all.

alshamams commented 1 year ago

@akurtakov The content in the eclipse.platform.common repo is copied intelligently from the news repo by running certain scripts. I currently do not have a mechanism to convert php to html format.

merks commented 1 year ago

@alshamams

see https://github.com/eclipse-platform/eclipse.platform.ua/pull/151#issuecomment-1591312237 for what should be linked

in the original content the php is just kind of a simple wrapper so no need to convert it.

akurtakov commented 1 year ago

@alshamams Issue is fixed now via https://github.com/eclipse-platform/eclipse.platform.common/commit/8f15e5740f2fb13c1421fcb41144ad37eb5f6de6 , please fix the scripts coping things to do the proper thing.

I consider this one done.