apache / lucene

Apache Lucene open-source search software
https://lucene.apache.org/
Apache License 2.0

FSDirectory stuck at open(Path path) method when run from .jar file #12968

Closed setokk closed 9 months ago

setokk commented 9 months ago

Description

Hello!

I use the FSDirectory.open(Path path) method when building the index.

When I run this method outside of a jar, it works as intended. But as soon as I package the application into a .jar file, it gets stuck at FSDirectory.open(Path path).

I printed the absolute paths of the files (dataFile and idxDir) and they were correct.

The weird thing is that no exception is thrown; it just gets stuck. The method is called in the background from a SwingWorker, but that shouldn't really matter, since it works fine when I run it from my IDE.

I'm not 100% sure this is a bug, but it seems weird that it just gets stuck and nothing is thrown even though the file paths are correct.

Thanks for reading!

Here is my search engine init method:

public static void init(boolean clearIndex) {
        File dataFile = new File(System.getProperty("user.dir") + File.separator + "data.txt");
        File idxDir = new File(System.getProperty("user.dir") + File.separator + "index");

        // Set up analyzer
        SearchEngine.analyzer = new EnglishAnalyzer();

        try {
            System.out.println(idxDir.getAbsolutePath());
            System.out.println(dataFile.getAbsolutePath());
            SearchEngine.indexDir = FSDirectory.open(idxDir.toPath());

            if (clearIndex) {
                System.out.println("Rebuilding index...");

                IndexWriterConfig idxConfig = new IndexWriterConfig(SearchEngine.analyzer);
                idxConfig.setOpenMode(IndexWriterConfig.OpenMode.CREATE);
                IndexWriter idxWriter = new IndexWriter(indexDir, idxConfig);

                // Now, populate the index
                int docs = 0;
                JsonParser jParser = new JsonParser();

                for (String line : Files.readAllLines(dataFile.toPath(), StandardCharsets.UTF_8)) {
                    // On large amounts of data, this can take a while
                    if (docs % 10000 == 0) {
                        System.out.println(docs);
                    }
                    docs++;

                    // Parse JSON
                    // Each line of the input file is a serialized JSON object
                    Map j = jParser.parse(line);

                    // Get title
                    String title = (String) j.get("title");

                    // Get description
                    String ab = (String) j.get("abstract");

                    // Iterate through each author object and get "surname" and "given_names"
                    List<Map<String, String>> authors = (List<Map<String, String>>) j.get("authors");
                    StringBuilder authorsConcat = new StringBuilder();
                    String prefix = "";
                    for (Map<String, String> author : authors) {
                        String surname = author.get("surname");
                        String givenNames = author.get("given_names");

                        // Concatenate surname and given names to create author's full name
                        String fullName = prefix + givenNames + " " + surname;
                        prefix = ", ";
                        authorsConcat.append(fullName);
                    }

                    // Get pmid
                    String pmid = (String) j.get("pmid");

                    // Get biblio
                    Map biblio = (Map) j.get("biblio");

                    // Get biblio->year
                    String year = (String) biblio.get("year");
                    // Get biblio->volume
                    String volume = (String) biblio.get("volume");
                    // Get biblio->issue
                    String issue = (String) biblio.get("issue");
                    // Get biblio->fpage
                    String fpage = (String) biblio.get("fpage");
                    // Get biblio-lpage
                    String lpage = (String) biblio.get("lpage");

                    // Get journal
                    Map journal = (Map) biblio.get("journal");

                    // Get journal->title
                    String jTitle = (String) journal.get("title");
                    // Get journal->issn
                    String issn = (String) journal.get("issn");

                    // Indexed fields
                    Field tiField = new Field("title", title, TextField.TYPE_STORED);
                    Field abField = new Field("abstract", ab, TextField.TYPE_STORED);

                    // Not indexed fields
                    StoredField auField = new StoredField("authors", authorsConcat.toString());
                    StoredField pmidField = new StoredField("pmid", pmid);
                    StoredField volumeField = new StoredField("volume", volume);
                    StoredField issueField = new StoredField("issue", issue);
                    StoredField fpageField = new StoredField("fpage", fpage);
                    StoredField lpageField = new StoredField("lpage", lpage);
                    StoredField jTitleField = new StoredField("jTitle", jTitle);
                    StoredField issnField = new StoredField("issn", issn);

                    Document thisDoc = new Document();
                    thisDoc.add(tiField);
                    thisDoc.add(abField);
                    thisDoc.add(auField);
                    thisDoc.add(pmidField);
                    thisDoc.add(volumeField);
                    thisDoc.add(issueField);
                    thisDoc.add(fpageField);
                    thisDoc.add(lpageField);
                    thisDoc.add(jTitleField);
                    thisDoc.add(issnField);

                    idxWriter.addDocument(thisDoc);

                }

                System.out.println("Done!");
                System.out.println(docs + " documents indexed.");
                idxWriter.close();
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
    }

Version and environment details

OS: Windows 10, Arch Linux
Lucene Version: Lucene 9.8.0
JDK Version: openjdk 21.0.1

bajibalu commented 9 months ago

Hi @setokk, I am a new contributor to this repo. Unfortunately, I don't have access to either Windows 10 or Arch Linux. However, I ran the sample code above in the setup below and got the expected output; it did not behave the way you described for me. Try moving the data and index locations into the project folder and see if that resolves the issue.

OS: Debian 12
Lucene Version: Lucene 9.8.0
JDK Version: openjdk 21.0.1

Command: java -classpath test.jar org.example.Main

Output

/temp/index
/temp/data.txt
Dec 25, 2023 1:46:15 AM org.apache.lucene.store.MemorySegmentIndexInputProvider <init>
INFO: Using MemorySegmentIndexInput with Java 21; to disable start with -Dorg.apache.lucene.store.MMapDirectory.enableMemorySegments=false
Rebuilding index...
0
Done!
1 documents indexed.
setokk commented 9 months ago

I tried, but to no avail. I'm using Maven to package the app into a .jar.

Folder structure: [screenshot]

Output: [screenshot]

I ran mvn clean to see if there were any problems related to that, but the output is still the same.

FSDirectory should create the index folder automatically.
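
For what it's worth, a bare call like the sketch below (just an illustration, with a hypothetical OpenCheck class; not my actual code) is expected to create the index directory on a fresh path:

import java.nio.file.Path;

import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;

public class OpenCheck {
    public static void main(String[] args) throws Exception {
        // "index" does not have to exist beforehand; FSDirectory.open creates it.
        try (Directory dir = FSDirectory.open(Path.of("index"))) {
            System.out.println("Opened: " + dir);
        }
    }
}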

uschindler commented 9 months ago

The problem is that your JAR packaging does not preserve all required files.

See here for instructions: #12307, especially this comment: https://github.com/apache/lucene/issues/12307#issuecomment-1803320694

It looks like you are suppressing some exceptions. The problem is that not all classes needed to support Java 21 can be found, and/or the index codecs are not found.

Make sure that all META-INF files are preserved and that the flag "Multi-Release: true" is part of your manifest.
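
If you want to check what actually ends up in the shaded jar, a small diagnostic like the sketch below (a hypothetical SpiCheck class, not something shipped with Lucene) lists the codecs and postings formats that Lucene's service loader can see; if the META-INF/services entries were dropped during packaging, the defaults will be missing:

import org.apache.lucene.codecs.Codec;
import org.apache.lucene.codecs.PostingsFormat;

public class SpiCheck {
    public static void main(String[] args) {
        // Both lookups go through META-INF/services; an empty or incomplete
        // result points at the jar packaging rather than at FSDirectory.
        System.out.println("Codecs: " + Codec.availableCodecs());
        System.out.println("Postings formats: " + PostingsFormat.availablePostingsFormats());
    }
}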

uschindler commented 9 months ago

Most likely you see this issue due to broken Maven tooling: https://issues.apache.org/jira/browse/MSHADE-385

setokk commented 9 months ago

Thanks! Adding the line "Multi-Release: true" to the MANIFEST.MF file and creating an "org.apache.lucene.codecs.PostingsFormat" file in the "META-INF/services" directory solved the issue. Happy Holidays!

uschindler commented 9 months ago

creating an "org.apache.lucene.codecs.PostingsFormat" file in the "META-INF/services" directory solved the issue.

You shouldn't manually create those files. Use https://maven.apache.org/plugins/maven-shade-plugin/examples/resource-transformers.html#ServicesResourceTransformer for that.
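
For illustration, a shade-plugin setup along those lines could look roughly like this pom.xml fragment (a sketch, not taken from this thread; adapt it to your build). The ServicesResourceTransformer merges the META-INF/services files from all dependencies, and the ManifestResourceTransformer adds the Multi-Release entry discussed above:

<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-shade-plugin</artifactId>
  <executions>
    <execution>
      <phase>package</phase>
      <goals>
        <goal>shade</goal>
      </goals>
      <configuration>
        <transformers>
          <!-- merge META-INF/services entries (Lucene codecs, postings formats, ...) -->
          <transformer implementation="org.apache.maven.plugins.shade.resource.ServicesResourceTransformer"/>
          <!-- mark the shaded jar as multi-release so the Java 21 classes are used -->
          <transformer implementation="org.apache.maven.plugins.shade.resource.ManifestResourceTransformer">
            <manifestEntries>
              <Multi-Release>true</Multi-Release>
            </manifestEntries>
          </transformer>
        </transformers>
      </configuration>
    </execution>
  </executions>
</plugin>

With that in place there is no need to hand-edit MANIFEST.MF or create the service files yourself.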

Happy holidays.