apache / lucene

Apache Lucene open-source search software
https://lucene.apache.org/
Apache License 2.0

clean up FST storage abstractions [LUCENE-4593] #5658

Open asfimport opened 11 years ago

asfimport commented 11 years ago

I was looking at James's patch for #4371, and it struck me that FST almost abstracts its underlying "i/o" (storage) via reader/writer abstractions.

It would be good to try to work on this more, e.g. we can imagine a little abstraction, much like Lucene's store abstraction (Directory).

This way maybe we could clean up the packed vs. non-packed handling, allow for > 2GB FSTs without slowing down small ones, and so on.

I have a patch that is like an amoeba-step towards this.
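To make the idea concrete, here is a minimal sketch of what such a storage abstraction might look like. Every name in it (`FstStorageSketch`, `ByteSink`, `ByteSource`, `ArrayStore`) is hypothetical and is not taken from the patch or from Lucene's API. It assumes addresses are `long` so that a different backing store could exceed 2GB while small FSTs keep a plain `byte[]`.

```java
import java.util.Arrays;

// Hypothetical sketch: none of these names are Lucene's actual API.
final class FstStorageSketch {

    /** Write side: what a builder would need from its storage. */
    interface ByteSink {
        void writeByte(byte b);
        long position();
    }

    /** Read side: random access by a long address, so > 2GB stays possible. */
    interface ByteSource {
        byte readByte(long address);
        long size();
    }

    /** Simplest backing store: one growable byte[] (fine for small FSTs). */
    static final class ArrayStore implements ByteSink, ByteSource {
        private byte[] bytes = new byte[16];
        private int length;

        @Override
        public void writeByte(byte b) {
            if (length == bytes.length) {
                // Doubling copies the whole array; a paged store would avoid this.
                bytes = Arrays.copyOf(bytes, bytes.length * 2);
            }
            bytes[length++] = b;
        }

        @Override
        public long position() {
            return length;
        }

        @Override
        public long size() {
            return length;
        }

        @Override
        public byte readByte(long address) {
            if (address < 0 || address >= length) {
                throw new IndexOutOfBoundsException("address=" + address);
            }
            return bytes[(int) address];
        }
    }
}
```

With interfaces like these, the FST code would only see `ByteSink` and `ByteSource`, and a large-FST store could be swapped in without touching the small-FST path.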


Migrated from LUCENE-4593 by Robert Muir (@rmuir), updated Jan 08 2013
Attachments: LUCENE-4593.patch (versions: 2)

asfimport commented 11 years ago

Michael McCandless (@mikemccand) (migrated from JIRA)

+1, this is a good amoeba step!

I think this would be a useful abstraction.

E.g., maybe we could write directly to disk ... or improve the RAM buffering to use growing/appending/paged buffers instead of one massive byte[] (which causes huge RAM spikes when we do ArrayUtil.grow) ... actually, once we fix RAMFile it could just use that.
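As an illustration of the paged-buffer idea, the following is a rough sketch of an append-only byte store that grows one fixed-size page at a time. The class name `PagedBytesSketch` and its methods are hypothetical and are not Lucene code. The point is only that growing allocates a new page and never copies the bytes already written.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, not Lucene code: append-only bytes kept in fixed-size pages.
final class PagedBytesSketch {
    private final int pageBits;
    private final int pageSize;
    private final int pageMask;
    private final List<byte[]> pages = new ArrayList<>();
    private long length;

    PagedBytesSketch(int pageBits) {
        if (pageBits < 1 || pageBits > 30) {
            throw new IllegalArgumentException("pageBits must be in [1, 30]");
        }
        this.pageBits = pageBits;
        this.pageSize = 1 << pageBits;
        this.pageMask = pageSize - 1;
    }

    void writeByte(byte b) {
        int offset = (int) (length & pageMask);
        if (offset == 0) {
            // Current page is full (or there is none yet): add one page, copy nothing.
            pages.add(new byte[pageSize]);
        }
        pages.get(pages.size() - 1)[offset] = b;
        length++;
    }

    byte readByte(long address) {
        if (address < 0 || address >= length) {
            throw new IndexOutOfBoundsException("address=" + address);
        }
        // High bits select the page, low bits select the offset inside it.
        return pages.get((int) (address >>> pageBits))[(int) (address & pageMask)];
    }

    long length() {
        return length;
    }

    int pageCount() {
        return pages.size();
    }
}
```

Compared with doubling a single array, the peak memory here is the data written so far plus at most one partially filled page, at the cost of an extra shift and mask on every read.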

asfimport commented 11 years ago

Robert Muir (@rmuir) (migrated from JIRA)

I actually noticed the spike stuff in finish() too, because that's where we currently take the whole grow()'ed byte[] used during construction and shrink it to the actual size we need. We are doing this copy anyway, so we could just use something else for intermediate buffering instead.

One confusing thing is that FST is like an immutable concept from the outside, but from the code on the inside it's mutable. I really wish the buffering and stuff was instead encapsulated in Builder or somewhere else so that FST was simpler and immutable.
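A small sketch of the separation being wished for here, under the assumption that all construction-time buffering moves into the builder and the finished structure only holds final data. The names `ImmutableFstSketch` and its nested `Builder` are hypothetical and do not match Lucene's FST or Builder classes.

```java
import java.io.ByteArrayOutputStream;

// Hypothetical sketch; names do not match Lucene's FST/Builder classes.
final class ImmutableFstSketch {
    private final byte[] bytes; // never modified after construction

    private ImmutableFstSketch(byte[] bytes) {
        this.bytes = bytes;
    }

    byte byteAt(int address) {
        return bytes[address];
    }

    int sizeInBytes() {
        return bytes.length;
    }

    /** All mutable state used during construction lives here, not in the result. */
    static final class Builder {
        private final ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        private boolean finished;

        Builder append(int b) {
            if (finished) {
                throw new IllegalStateException("already finished");
            }
            buffer.write(b); // writes the low 8 bits of b
            return this;
        }

        ImmutableFstSketch finish() {
            if (finished) {
                throw new IllegalStateException("already finished");
            }
            finished = true;
            // toByteArray() returns a right-sized copy, so the result owns its bytes.
            return new ImmutableFstSketch(buffer.toByteArray());
        }
    }
}
```

In this shape the read-side class has no write methods at all, so there is no way to confuse the writing phase with the reading phase.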

asfimport commented 11 years ago

Michael McCandless (@mikemccand) (migrated from JIRA)

"I really wish the buffering and stuff was instead encapsulated in Builder or somewhere else so that FST was simpler and immutable."

+1

We now use the same class for writing as for reading, which is very confusing.

asfimport commented 11 years ago

Commit Tag Bot (migrated from JIRA)

[trunk commit] Robert Muir http://svn.apache.org/viewvc?view=revision&revision=1420014

LUCENE-4593: first step towards FST storage abstraction

asfimport commented 11 years ago

Commit Tag Bot (migrated from JIRA)

[branch_4x commit] Robert Muir http://svn.apache.org/viewvc?view=revision&revision=1420017

LUCENE-4593: first step towards FST storage abstraction

asfimport commented 11 years ago

Commit Tag Bot (migrated from JIRA)

[branch_4x commit] Michael McCandless http://svn.apache.org/viewvc?view=revision&revision=1430334

LUCENE-4593: clean up how FST saves/loads the empty string output

asfimport commented 11 years ago

Commit Tag Bot (migrated from JIRA)

[trunk commit] Michael McCandless http://svn.apache.org/viewvc?view=revision&revision=1430333

LUCENE-4593: clean up how FST saves/loads the empty string output

asfimport commented 11 years ago

Commit Tag Bot (migrated from JIRA)

[branch_4x commit] Michael McCandless http://svn.apache.org/viewvc?view=revision&revision=1430342

LUCENE-4593: remove bogus true ||

asfimport commented 11 years ago

Commit Tag Bot (migrated from JIRA)

[trunk commit] Michael McCandless http://svn.apache.org/viewvc?view=revision&revision=1430341

LUCENE-4593: remove bogus true ||

asfimport commented 11 years ago

Michael McCandless (@mikemccand) (migrated from JIRA)

Minor improvements, but an API change for the uber-Builder-ctor (the API is experimental): I changed allowArrayArcs from a setter to a ctor param (it doesn't make sense to change this while you are building).

Also added comment for lastFrozenNode ...
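The setter-to-ctor change can be sketched roughly as below. This is an illustration only: the class `ArcOptionsSketch`, the `encodingFor` method, and the arc-count threshold of 5 are all made up for the example, and the real Builder constructor takes many more parameters.

```java
// Hypothetical sketch; the real Builder constructor takes many more parameters.
final class ArcOptionsSketch {
    // final: the choice is fixed for the whole build and cannot flip midway.
    private final boolean allowArrayArcs;

    ArcOptionsSketch(boolean allowArrayArcs) {
        this.allowArrayArcs = allowArrayArcs;
    }

    boolean allowArrayArcs() {
        return allowArrayArcs;
    }

    /** Picks an arc encoding for a node; the threshold of 5 is illustrative only. */
    String encodingFor(int arcCount) {
        return (allowArrayArcs && arcCount >= 5) ? "array" : "list";
    }
}
```

Because the field is final and there is no setter, every node written during one build sees the same setting.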

asfimport commented 11 years ago

Robert Muir (@rmuir) (migrated from JIRA)

Nuke this setter!

asfimport commented 11 years ago

Commit Tag Bot (migrated from JIRA)

[trunk commit] Michael McCandless http://svn.apache.org/viewvc?view=revision&revision=1430477

LUCENE-4593: move allowArrayArcs to ctor

asfimport commented 11 years ago

Commit Tag Bot (migrated from JIRA)

[branch_4x commit] Michael McCandless http://svn.apache.org/viewvc?view=revision&revision=1430480

LUCENE-4593: move allowArrayArcs to ctor