Directly use platform content type API for finding content type

gamerson commented 9 years ago

fixed #38

So the previous implementation used the model handler API, which didn't work for adopters because not all content types have their own model handlers. So we researched how the IStructuredModel loader gets its contentType, and it turns out all of the smarts live in one place: org.eclipse.wst.sse.core.internal.FileBufferModelManager.detectContentType(IFileBuffer)

If you look in that class you can see it has 8 or 9 different contentType lookup methods and failovers. However, when the content is an IFile, as it is in XML search's scenario, there are only two ways to find the file's contentType: a direct lookup and one platform fallback.

So I've replicated that logic in this new pull request, and it no longer calls into the ModelManager to get the content type.
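A rough sketch of that two-step lookup, assuming the direct lookup corresponds to IFile.getContentDescription() and the fallback to the platform's file-name lookup, Platform.getContentTypeManager().findContentTypeFor(fileName). To keep it runnable without an Eclipse runtime, both steps are passed in as plain functions; the class name, method name, and content type ids are hypothetical and this is not the code in the pull request.

```java
import java.util.Optional;
import java.util.function.Function;

/**
 * Hypothetical model of a two-step content type lookup for a workspace file.
 * Real code would call Eclipse APIs; here both steps are passed in as functions
 * so the failover order can be shown (and run) without an Eclipse runtime.
 */
final class ContentTypeLookup {

    private ContentTypeLookup() {
    }

    /**
     * Returns the content type id for a file, trying the direct lookup first and the
     * name-based platform fallback second. No structured model is loaded at any point.
     *
     * @param fileName       the file's name, e.g. "web.xml"
     * @param directLookup   stand-in for IFile.getContentDescription().getContentType()
     * @param fileNameLookup stand-in for IContentTypeManager.findContentTypeFor(String)
     */
    static Optional<String> detectContentType(String fileName,
                                              Function<String, Optional<String>> directLookup,
                                              Function<String, Optional<String>> fileNameLookup) {
        // Step 1: ask the file itself. Optional.or only runs the fallback when this is empty.
        return directLookup.apply(fileName)
                // Step 2: platform fallback based on the file name alone.
                .or(() -> fileNameLookup.apply(fileName));
    }

    public static void main(String[] args) {
        // Hypothetical content type ids, for illustration only.
        Function<String, Optional<String>> direct =
                name -> name.equals("web.xml") ? Optional.of("example.webxml") : Optional.empty();
        Function<String, Optional<String>> byName =
                name -> name.endsWith(".xml") ? Optional.of("example.xml") : Optional.empty();

        System.out.println(detectContentType("web.xml", direct, byName));     // found by the direct lookup
        System.out.println(detectContentType("javadoc.xml", direct, byName)); // found by the fallback
        System.out.println(detectContentType("notes.txt", direct, byName));   // found by neither
    }
}
```

The point of the design is that neither step goes through the model manager, so no IStructuredModel is ever built just to answer "what content type is this file?".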

This doesn't break Liferay use cases, and I hope it doesn't break any of your own extensions either; only testing will tell.

angelozerr commented 9 years ago

@gamerson your idea seems very good, and I have accepted your PR. We will see in practice whether we run into problems with your fix. Thanks a lot!

Just one question: have you seen any memory improvement?

gamerson commented 9 years ago

So in the Liferay portal source we have some XML files (javadoc) that have over 1,000,000 lines.

It was these files that were crushing memory when XML search tried to check the content type. It would load the entire model, which in turn loaded all of the ITextRegion[] arrays and so on, basically all of the infrastructure for an editor, into memory. In our case, calling getModelForRead() on a 1,000,000-line XML file took nearly 2GB of memory.

Now, with this pull request, the whole model is never loaded, so the searcher finishes without error.

However, if you had a searcher for content that resolved into that javadoc file, it seems XML search would crash again as soon as it called getModelForRead(). But in the same way, double-clicking a file that large to "edit" it would also crash Eclipse.

I think users expect "editing" a 1,000,000-line XML file to be slow, but they wouldn't want it to break "global search".

The use case where I broke it was this: right-click a Java class and choose "References > Workspace", i.e. a global search for all references to a Java class. Since the web.xml searcher has Java -> XML reference extensions, at some point, if the content types match, XML Search will call getModelForRead(). In that case it seems nothing can be done unless XML Search finds a way to search other than getModelForRead().
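A hypothetical sketch of that gating: the searcher checks the cheap content type first and only calls the expensive loader (a stand-in for getModelForRead()) when the type matches. All names and content type ids here are made up for illustration and are not XML Search's real API. It also shows the limitation described in this thread: once the content types match, the full model is still loaded.

```java
import java.util.Optional;
import java.util.function.Function;

/**
 * Hypothetical sketch of a searcher that gates an expensive model load on a cheap
 * content type check. Names are illustrative, not XML Search's real API.
 */
final class GatedModelLoad {

    private GatedModelLoad() {
    }

    /**
     * @param fileName      file being considered by the search
     * @param wantedType    content type id this searcher handles (must not be null)
     * @param contentTypeOf cheap lookup that never builds a model
     * @param loadModel     stand-in for getModelForRead(); may be very costly for huge files
     * @return the loaded model, or empty if the file's content type does not match
     */
    static <M> Optional<M> modelIfMatching(String fileName,
                                           String wantedType,
                                           Function<String, Optional<String>> contentTypeOf,
                                           Function<String, M> loadModel) {
        boolean matches = contentTypeOf.apply(fileName)
                .map(wantedType::equals)
                .orElse(false);
        if (!matches) {
            return Optional.empty(); // skipped: the model is never built for this file
        }
        // Matching files still pay the full cost; this gate does not help for them.
        return Optional.ofNullable(loadModel.apply(fileName));
    }

    public static void main(String[] args) {
        int[] loads = {0};
        Function<String, Optional<String>> types = name ->
                name.equals("web.xml") ? Optional.of("example.webxml") : Optional.of("example.xml");
        Function<String, String> load = name -> {
            loads[0]++;
            return "model of " + name;
        };

        System.out.println(modelIfMatching("javadoc.xml", "example.webxml", types, load));
        System.out.println(modelIfMatching("web.xml", "example.webxml", types, load));
        System.out.println("models loaded: " + loads[0]);
    }
}
```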

gamerson commented 9 years ago

So for now this fixes our problems. None of our searchers, such as those for portlet.xml and liferay-hook.xml, would ever index into a 1,000,000-line file. The built-in searchers for web.xml and JSPs would also be OK.

So I think we are safe at a framework level. If we ever had a 1,000,000-line JSP or portlet.xml file, things would break again, and I don't think the solution would be so straightforward.

angelozerr / eclipse-wtp-xml-search

Directly use platform content type API for finding content type #40