Audiveris / audiveris

Latest generation of Audiveris OMR engine
https://audiveris.github.io/audiveris
GNU Affero General Public License v3.0

audiveris cannot see tessdata #758

Open vinodkrishnanr opened 3 weeks ago

vinodkrishnanr commented 3 weeks ago

I'm using Windows and Vue.js.

Audiveris stderr:
Error opening data file C:\Program Files\tesseract-ocr\tessdata/deu.traineddata
Please make sure the TESSDATA_PREFIX environment variable is set to your "tessdata" directory.
Failed loading language 'deu'
Error opening data file C:\Program Files\tesseract-ocr\tessdata/eng.traineddata
Please make sure the TESSDATA_PREFIX environment variable is set to your "tessdata" directory.
Failed loading language 'eng'
Error opening data file C:\Program Files\tesseract-ocr\tessdata/fra.traineddata
Please make sure the TESSDATA_PREFIX environment variable is set to your "tessdata" directory.
Failed loading language 'fra'
Tesseract couldn't load any languages!

That's the error I'm getting. The funny part is that it does recognize the environment variable I set, C:\Program Files\tesseract-ocr\tessdata, but then it appends a wrong '/' when looking for the language files, where it should use '\'.
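The stderr above follows a fixed pattern, so a server could report which languages failed instead of only logging the raw text. A minimal sketch, assuming the message wording stays exactly as shown; `failedTesseractLanguages` is a hypothetical helper, not an Audiveris or Tesseract API:

```javascript
// Hypothetical helper: collect the language codes from every
// "Failed loading language 'xxx'" message in Tesseract's stderr output.
function failedTesseractLanguages(stderr) {
  const pattern = /Failed loading language '([^']+)'/g;
  return [...stderr.matchAll(pattern)].map((match) => match[1]);
}

// Usage with an excerpt of the stderr above
const sample =
  "Failed loading language 'deu' Error opening data file ... " +
  "Failed loading language 'eng' Error opening data file ... " +
  "Failed loading language 'fra' Tesseract couldn't load any languages!";
console.log(failedTesseractLanguages(sample)); // [ 'deu', 'eng', 'fra' ]
```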

Annoyed with TESSDATA_PREFIX, I deleted it and created C:\Program Files\tesseract-ocr\tessdata, since according to the docs that seems to be the default directory Audiveris looks in.

Here's the part of the docs that talks about the default: https://audiveris.github.io/audiveris/_pages/install/languages/

I have a Server.js setup with a POST route:

```javascript
const express = require('express');
const multer = require('multer');
const path = require('path');
const fs = require('fs');
const { exec } = require('child_process');

const app = express();
// Assumed setup: multer (with "dest") saves uploads under uploads/ with a random name and no extension
const upload = multer({ dest: 'uploads/' });

app.post('/upload', upload.single('pdf'), (req, res) => {
  if (!req.file) {
    return res.status(400).json({ message: 'No file uploaded' });
  }

  // Give the uploaded file a .pdf extension if it lacks one
  let pdfPath = path.resolve(req.file.path);
  if (!pdfPath.endsWith('.pdf')) {
    const pdfPathWithExtension = pdfPath + '.pdf';
    fs.renameSync(pdfPath, pdfPathWithExtension);
    pdfPath = pdfPathWithExtension;
  }

  const outputDir = path.resolve('outputs');
  // Quote both paths so that a space in either one does not split the argument
  const command = `audiveris -batch -export "${pdfPath}" -output "${outputDir}"`;

  exec(command, (error, stdout, stderr) => {
    console.log("Audiveris stdout:", stdout);
    console.log("Audiveris stderr:", stderr);

    if (error) {
      console.error(`Error running Audiveris: ${error.message}`);
      return res.status(500).json({ message: 'Error processing PDF' });
    }

    // Audiveris exports the .mxl (compressed MusicXML) file with the same base name as the PDF
    const xmlOutputPath = path.join(outputDir, `${path.basename(pdfPath, '.pdf')}.mxl`);
    console.log("Expected XML output path:", xmlOutputPath); // Debugging line

    if (fs.existsSync(xmlOutputPath)) {
      res.json({ message: 'File converted successfully', xmlPath: `/outputs/${path.basename(xmlOutputPath)}` });
    } else {
      res.status(500).json({ message: 'Conversion failed, XML not created' });
    }
  });
});
```

So no matter what I do, I end up with the same error, whether I create the environment variable or not. The path is wrong, of course, but why is Audiveris doing that?
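Independently of the data files, two details of this handler can be hardened. A minimal sketch with a hypothetical helper, `buildAudiverisCall`: it double-quotes both paths so a space in either one is not split by the shell, and it can pass TESSDATA_PREFIX explicitly through the `env` option of Node's `exec`. A child process inherits the environment the Node server had when it started, so a variable created or deleted afterwards in Windows settings is only seen once the server (and the terminal that launched it) is restarted. It assumes, as the original handler does, that `audiveris` is on the PATH.

```javascript
// Hypothetical helper: build the command line and exec() options for an
// Audiveris batch export.
function buildAudiverisCall(pdfPath, outputDir, tessdataDir) {
  // Quote both paths so a space (e.g. "C:\Program Files\...") is not split
  const command = `audiveris -batch -export "${pdfPath}" -output "${outputDir}"`;
  // Copy the server's own environment, then optionally pin TESSDATA_PREFIX
  const env = { ...process.env };
  if (tessdataDir) {
    env.TESSDATA_PREFIX = tessdataDir;
  }
  return { command, options: { env } };
}

// Usage inside the POST handler:
// const { command, options } = buildAudiverisCall(pdfPath, outputDir,
//   'C:\\Program Files\\tesseract-ocr\\tessdata');
// exec(command, options, (error, stdout, stderr) => { /* ... */ });
```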

hbitteur commented 3 weeks ago

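I don't really understand what is happening. Could you please tell us:

- Where your Tesseract language files are located (the files are named like 'eng.traineddata' etc.)
- The value of your TESSDATA_PREFIX environment variable (it should point to an existing folder)
- The content of the folder pointed to by TESSDATA_PREFIX (it should contain the language files)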

vinodkrishnanr commented 3 weeks ago

Apologies if it was not clear.

Where your Tesseract language files are located (the files are named like 'eng.traineddata' etc.)

C:\Program Files\tesseract-ocr\tessdata

The value of your TESSDATA_PREFIX environment variable (it should point to an existing folder)

C:\Program Files\tesseract-ocr\tessdata

The content of the folder pointed to by TESSDATA_PREFIX (it should contain the language files)

eng.traineddata, fra.traineddata, deu.traineddata
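A listing like this, including file sizes, can also be produced from Node itself, for example from the same server. A sketch with a hypothetical `listTraineddata` helper; sizes are rounded up to whole KB:

```javascript
const fs = require('fs');
const path = require('path');

// Hypothetical helper: list the *.traineddata files in a tessdata folder,
// sorted by name, with sizes rounded up to whole KB.
function listTraineddata(dir) {
  return fs.readdirSync(dir)
    .filter((name) => name.endsWith('.traineddata'))
    .sort()
    .map((name) => ({
      name,
      sizeKB: Math.ceil(fs.statSync(path.join(dir, name)).size / 1024),
    }));
}

// Usage on Windows:
// console.table(listTraineddata('C:\\Program Files\\tesseract-ocr\\tessdata'));
```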

hbitteur commented 3 weeks ago

Your data looks OK.

Additional questions:

hbitteur commented 3 weeks ago

Could you post the first lines of the last Audiveris log file?

These log files are located (on Windows) in the folder %APPDATA%\AudiverisLtd\audiveris\log. (I suppose you used the Windows installer for Audiveris; please correct me if I'm wrong.)

More information about log files is available here and there.
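Finding the most recent log programmatically takes only a few lines. A sketch assuming the default Windows location quoted above; both helper names are hypothetical, and the newest file is picked by modification time rather than by name:

```javascript
const fs = require('fs');
const path = require('path');

// Default Windows log folder quoted above; null when APPDATA is unset
// (it is normally only defined on Windows).
function defaultAudiverisLogDir() {
  const appData = process.env.APPDATA;
  return appData ? path.join(appData, 'AudiverisLtd', 'audiveris', 'log') : null;
}

// Return the full path of the most recently modified *.log file in logDir,
// or null if there is none.
function latestLogFile(logDir) {
  let latest = null;
  for (const name of fs.readdirSync(logDir)) {
    if (!name.endsWith('.log')) continue;
    const full = path.join(logDir, name);
    const mtimeMs = fs.statSync(full).mtimeMs;
    if (latest === null || mtimeMs > latest.mtimeMs) {
      latest = { full, mtimeMs };
    }
  }
  return latest ? latest.full : null;
}

// Usage (Windows): print the first 20 lines of the last log
// const dir = defaultAudiverisLogDir();
// const file = dir && latestLogFile(dir);
// if (file) console.log(fs.readFileSync(file, 'utf8').split(/\r?\n/).slice(0, 20).join('\n'));
```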

This kind of file is key, because it displays important information seen from Audiveris. As an example, here is the beginning of a log file of mine:

2024-10-31 19:46:41,098 INFO  []                       CLI 281  | CLI args: [@../data/args/default.txt]
2024-10-31 19:46:41,219 INFO  []              TesseractOCR 116  | TESSDATA_PREFIX value: C:\Users\herve\AppData\Roaming\AudiverisLtd\audiveris\config\tessdata
2024-10-31 19:46:41,221 INFO  []              TesseractOCR 241  | OCR folder: C:\Users\herve\AppData\Roaming\AudiverisLtd\audiveris\config\tessdata
2024-10-31 19:46:42,750 INFO  []                      Main 402  | Environment:
- Audiveris:    5.4-alpha:82ef3255a
- OS:           Windows 10 10.0
- Architecture: amd64
- Java VM:      Java HotSpot(TM) 64-Bit Server VM (build 21.0.2+13-LTS-58, mixed mode, sharing)
- OCR Engine:   Tesseract OCR, version 5.3.1
2024-10-31 19:46:43,070 INFO  []             AliasPatterns 134  | Alias patterns: [(IMSLP[0-9]*)-.*]
2024-10-31 19:46:43,215 INFO  []                   MainGui 564  | Audiveris version 5.4-alpha
2024-10-31 19:46:43,216 INFO  []                   MainGui 565  | 
LogUtil. Property logback.configurationFile not defined, skipped.
LogUtil. Configuration found C:\Users\herve\AppData\Roaming\AudiverisLtd\audiveris\config\logback.xml
LogUtil. Logging to C:\Users\herve\AppData\Roaming\AudiverisLtd\audiveris\log\20241031T194641.log

As you can see, it displays the values of TESSDATA_PREFIX and of the OCR folder, as seen by the Audiveris application. I'm insisting on this because I'm not at all familiar with the way you launch Audiveris, especially the role of the Server.js file you posted in a previous message.
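Since these two lines are what matters here, they can be pulled out of a log automatically. A sketch assuming the exact `TESSDATA_PREFIX value:` and `OCR folder:` labels shown above; `ocrSettingsFromLog` is hypothetical and returns null for any label it cannot find:

```javascript
// Hypothetical helper: extract the OCR-related values Audiveris logs at
// startup. Each value is null when its line is absent from the log.
function ocrSettingsFromLog(logText) {
  const find = (label) => {
    // The labels contain no regex metacharacters, so they are used as-is.
    // "m" makes $ match at each line end; trim() drops a trailing \r (CRLF logs).
    const match = logText.match(new RegExp(`${label}: (.*)$`, 'm'));
    return match ? match[1].trim() : null;
  };
  return {
    tessdataPrefix: find('TESSDATA_PREFIX value'),
    ocrFolder: find('OCR folder'),
  };
}
```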

vinodkrishnanr commented 3 weeks ago

Here's the complete log file:

2024-10-31 16:04:04,025 INFO  []                       CLI 282  | CLI args: [-batch, -export, C:\vuejs\music-sheet-converter\server\uploads\0ee69b4258de8acc8e5ea2872605499f.pdf, -output, C:\vuejs\music-sheet-converter\server\outputs]
2024-10-31 16:04:04,082 INFO  []              TesseractOCR 116  | TESSDATA_PREFIX value: C:/Program Files/tesseract-ocr/tessdata/
2024-10-31 16:04:04,344 INFO  []                      Main 402  | Environment:
- Audiveris:    5.3.1:5aa4d06dc
- OS:           Windows 11 10.0
- Architecture: amd64
- Java VM:      OpenJDK 64-Bit Server VM (build 21.0.5+11-LTS, mixed mode, sharing)
- OCR Engine:   Tesseract OCR, version 5.3.1
2024-10-31 16:04:04,471 INFO  []             AliasPatterns 134  | Alias patterns: [(IMSLP[0-9]*)-.*]
2024-10-31 16:04:04,473 INFO  []                      Main 242  | Running in batch mode
2024-10-31 16:04:04,521 INFO  []                      Main 190  | Submitting 1 task(s) in sequence:
    Input "C:\vuejs\music-sheet-converter\server\uploads\0ee69b4258de8acc8e5ea2872605499f.pdf"
2024-10-31 16:04:04,588 INFO  [0ee69b4258de8acc8e5ea2872605499f]                      Book 601  | 1 sheet in C:\vuejs\music-sheet-converter\server\uploads\0ee69b4258de8acc8e5ea2872605499f.pdf
2024-10-31 16:04:04,909 INFO  []                      Book 2515 | Stored /book.xml
2024-10-31 16:04:04,910 INFO  []                      Book 2476 | Book stored as C:\vuejs\music-sheet-converter\server\outputs\0ee69b4258de8acc8e5ea2872605499f.omr
2024-10-31 16:04:04,928 INFO  []                      Book 1963 | Book reaching PAGE on sheets:[#1]
2024-10-31 16:04:04,931 INFO  [0ee69b4258de8acc8e5ea2872605499f]            StepMonitoring 98   | LOAD
2024-10-31 16:04:05,284 INFO  [0ee69b4258de8acc8e5ea2872605499f]                      Book 1845 | Loaded image 1 2549x3299 from C:\vuejs\music-sheet-converter\server\uploads\0ee69b4258de8acc8e5ea2872605499f.pdf
2024-10-31 16:04:05,288 INFO  [0ee69b4258de8acc8e5ea2872605499f]            StepMonitoring 98   | BINARY
2024-10-31 16:04:05,825 INFO  [0ee69b4258de8acc8e5ea2872605499f]            StepMonitoring 98   | SCALE
2024-10-31 16:04:05,845 INFO  [0ee69b4258de8acc8e5ea2872605499f]              ScaleBuilder 234  | Beam  guessed height: 10 -- 0.50 of 21 interline
2024-10-31 16:04:05,846 INFO  [0ee69b4258de8acc8e5ea2872605499f]              ScaleBuilder 257  | Beam measured height: 10 -- 0.27 of [6..21] range at 245% of needed quorum
2024-10-31 16:04:05,846 INFO  [0ee69b4258de8acc8e5ea2872605499f]                 ScaleStep 65   | Scale{ interline(20,21,21) line(2,2,3) beam(10)}
2024-10-31 16:04:05,847 INFO  [0ee69b4258de8acc8e5ea2872605499f]            StepMonitoring 98   | GRID
2024-10-31 16:04:06,102 INFO  [0ee69b4258de8acc8e5ea2872605499f]            LinesRetriever 1490 | Global slope: 0.00000
2024-10-31 16:04:06,119 INFO  [0ee69b4258de8acc8e5ea2872605499f]         ClustersRetriever 334  | Retrieved line clusters: 3 of sizes [5] with interline(20,21,21)
2024-10-31 16:04:06,179 INFO  [0ee69b4258de8acc8e5ea2872605499f]                 PeakGraph 310  | Systems: #1[1] #2[2] #3[3]
2024-10-31 16:04:06,402 INFO  [0ee69b4258de8acc8e5ea2872605499f]             SystemManager 149  | Indentation detected for system#1
2024-10-31 16:04:06,402 INFO  [0ee69b4258de8acc8e5ea2872605499f]             SystemManager 736  | 1 part along 3 systems
2024-10-31 16:04:06,405 INFO  [0ee69b4258de8acc8e5ea2872605499f]                      Book 584  | Created scores: [{Score 1}]
2024-10-31 16:04:06,405 INFO  [0ee69b4258de8acc8e5ea2872605499f]            StepMonitoring 98   | HEADERS
2024-10-31 16:04:06,655 INFO  [0ee69b4258de8acc8e5ea2872605499f]           BasicClassifier 277  | Classifier loaded XML norms.
2024-10-31 16:04:06,659 INFO  [0ee69b4258de8acc8e5ea2872605499f]        AbstractClassifier 396  | Classifier data loaded from default uri jar:file:/C:/Program%20Files/Audiveris/lib/audiveris.jar!/res/basic-classifier.zip
2024-10-31 16:04:07,204 INFO  [0ee69b4258de8acc8e5ea2872605499f]            StepMonitoring 98   | STEM_SEEDS
2024-10-31 16:04:07,295 INFO  [0ee69b4258de8acc8e5ea2872605499f]                 ImageUtil 204  | Discarding alpha band ...
2024-10-31 16:04:07,469 INFO  [0ee69b4258de8acc8e5ea2872605499f]                 ImageUtil 122  | Converting max RGB to gray
2024-10-31 16:04:07,696 INFO  [0ee69b4258de8acc8e5ea2872605499f]             StemSeedsStep 86   | stem(2 max:3)
2024-10-31 16:04:07,730 INFO  [0ee69b4258de8acc8e5ea2872605499f]            StepMonitoring 98   | BEAMS
2024-10-31 16:04:10,111 INFO  [0ee69b4258de8acc8e5ea2872605499f]            StepMonitoring 98   | LEDGERS
2024-10-31 16:04:10,267 INFO  [0ee69b4258de8acc8e5ea2872605499f]            StepMonitoring 98   | HEADS
2024-10-31 16:04:11,897 INFO  [0ee69b4258de8acc8e5ea2872605499f]             HeadSeedTally 235  | Scale information: HeadSeeds{NOTEHEAD_BLACK[R:-0.5]}
2024-10-31 16:04:11,898 INFO  [0ee69b4258de8acc8e5ea2872605499f]            StepMonitoring 98   | STEMS
2024-10-31 16:04:12,144 INFO  [0ee69b4258de8acc8e5ea2872605499f]            StepMonitoring 98   | REDUCTION
2024-10-31 16:04:12,185 INFO  [0ee69b4258de8acc8e5ea2872605499f]             ReductionStep 93   | Stems free length median value: 62 pixels, 3.0 interlines
2024-10-31 16:04:12,185 INFO  [0ee69b4258de8acc8e5ea2872605499f]            StepMonitoring 98   | CUE_BEAMS
2024-10-31 16:04:12,186 INFO  [0ee69b4258de8acc8e5ea2872605499f]              CueBeamsStep 81   | Step CUE_BEAMS is skipped because small heads switch is off
2024-10-31 16:04:12,186 INFO  [0ee69b4258de8acc8e5ea2872605499f]            StepMonitoring 98   | TEXTS
2024-10-31 16:04:12,464 WARN  [0ee69b4258de8acc8e5ea2872605499f]            TesseractOrder 420  | Could not initialize Tesseract lang: deu+eng+fra result: -1
2024-10-31 16:04:12,466 INFO  [0ee69b4258de8acc8e5ea2872605499f]                   OcrUtil 127  | No OCR'ed lines
2024-10-31 16:04:12,479 INFO  [0ee69b4258de8acc8e5ea2872605499f]            StepMonitoring 98   | MEASURES
2024-10-31 16:04:12,493 INFO  [0ee69b4258de8acc8e5ea2872605499f]                      Page 308  | 18 raw measures: [6 in system#1, 6 in system#2, 6 in system#3]
2024-10-31 16:04:12,494 INFO  [0ee69b4258de8acc8e5ea2872605499f]            StepMonitoring 98   | CHORDS
2024-10-31 16:04:12,505 INFO  [0ee69b4258de8acc8e5ea2872605499f]            StepMonitoring 98   | CURVES
2024-10-31 16:04:13,151 INFO  [0ee69b4258de8acc8e5ea2872605499f]              SlursBuilder 240  | Slurs: 0
2024-10-31 16:04:13,192 INFO  [0ee69b4258de8acc8e5ea2872605499f]           SegmentsBuilder 155  | Segments: 33
2024-10-31 16:04:13,200 INFO  [0ee69b4258de8acc8e5ea2872605499f]            StepMonitoring 98   | SYMBOLS
2024-10-31 16:04:13,788 INFO  [0ee69b4258de8acc8e5ea2872605499f]            StepMonitoring 98   | LINKS
2024-10-31 16:04:13,812 INFO  [0ee69b4258de8acc8e5ea2872605499f]            StepMonitoring 98   | RHYTHMS
2024-10-31 16:04:13,856 INFO  [0ee69b4258de8acc8e5ea2872605499f]            StepMonitoring 98   | PAGE
2024-10-31 16:04:13,865 INFO  [0ee69b4258de8acc8e5ea2872605499f]                      Book 2535 | Book{0ee69b4258de8acc8e5ea2872605499f} storing
2024-10-31 16:04:13,898 INFO  [0ee69b4258de8acc8e5ea2872605499f]                      Book 2515 | Stored /book.xml
2024-10-31 16:04:13,998 INFO  [0ee69b4258de8acc8e5ea2872605499f]                DataHolder 348  | Stored /sheet#1/BINARY.png
2024-10-31 16:04:14,348 INFO  [0ee69b4258de8acc8e5ea2872605499f]                     Sheet 1529 | Stored /sheet#1/sheet#1.xml
2024-10-31 16:04:14,348 INFO  [0ee69b4258de8acc8e5ea2872605499f]                      Book 2476 | Book stored as C:\vuejs\music-sheet-converter\server\outputs\0ee69b4258de8acc8e5ea2872605499f.omr
2024-10-31 16:04:14,352 INFO  [0ee69b4258de8acc8e5ea2872605499f]                 SheetStub 1605 | Disposed sheet
2024-10-31 16:04:14,407 INFO  [0ee69b4258de8acc8e5ea2872605499f]                      Book 2035 | End of Stub#1 memory: 31,232,728
2024-10-31 16:04:14,737 INFO  [0ee69b4258de8acc8e5ea2872605499f]                 SheetStub 1044 | Loaded /sheet#1/sheet#1.xml
2024-10-31 16:04:15,065 INFO  [0ee69b4258de8acc8e5ea2872605499f]           PartwiseBuilder 2645 | Exporting sheet(s): [#1]
2024-10-31 16:04:15,114 INFO  [0ee69b4258de8acc8e5ea2872605499f]             ScoreExporter 164  | Score 0ee69b4258de8acc8e5ea2872605499f exported to C:\vuejs\music-sheet-converter\server\outputs\0ee69b4258de8acc8e5ea2872605499f.mxl
2024-10-31 16:04:15,169 INFO  [0ee69b4258de8acc8e5ea2872605499f]                     Score 187  | Closing {Score 1}

vinodkrishnanr commented 3 weeks ago

Server.js runs Audiveris in batch mode on the PDFs uploaded from the front end.

hbitteur commented 3 weeks ago

In my log, I can read:

2024-10-31 19:46:41,098 INFO  []                       CLI 281  | CLI args: [@../data/args/default.txt]
2024-10-31 19:46:41,219 INFO  []              TesseractOCR 116  | TESSDATA_PREFIX value: C:\Users\herve\AppData\Roaming\AudiverisLtd\audiveris\config\tessdata
2024-10-31 19:46:41,221 INFO  []              TesseractOCR 241  | OCR folder: C:\Users\herve\AppData\Roaming\AudiverisLtd\audiveris\config\tessdata
2024-10-31 19:46:42,750 INFO  []                      Main 402  | Environment:

In yours, just:

2024-10-31 16:04:04,025 INFO  []                       CLI 282  | CLI args: [-batch, -export, C:\vuejs\music-sheet-converter\server\uploads\0ee69b4258de8acc8e5ea2872605499f.pdf, -output, C:\vuejs\music-sheet-converter\server\outputs]
2024-10-31 16:04:04,082 INFO  []              TesseractOCR 116  | TESSDATA_PREFIX value: C:/Program Files/tesseract-ocr/tessdata/
2024-10-31 16:04:04,344 INFO  []                      Main 402  | Environment:

So a line is missing from your log file; it is supposed to read "OCR folder: ...". Since you have TESSDATA_PREFIX defined, I guess the location its value points to does not exist or is not a directory.

Please post the (Windows) listing of this folder.

hbitteur commented 3 weeks ago

Update: the "OCR folder: ..." line only appears in 5.4-alpha, not in the released 5.3.1.

But I just checked the old 5.3.1 code: if the folder exists and really is a directory containing eng.traineddata and the other files, it should work. The more I think about it, the more I suspect this spurious location.
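These conditions can be checked up front, before launching Audiveris. A sketch with a hypothetical `checkTessdataFolder` helper: it verifies that the folder exists, is a directory and contains eng.traineddata, but it does not check whether the files themselves are valid.

```javascript
const fs = require('fs');
const path = require('path');

// Hypothetical pre-flight check; returns a problem description, or null.
function checkTessdataFolder(dir) {
  if (!dir) return 'TESSDATA_PREFIX is not set';
  if (!fs.existsSync(dir)) return `${dir} does not exist`;
  if (!fs.statSync(dir).isDirectory()) return `${dir} is not a directory`;
  if (!fs.existsSync(path.join(dir, 'eng.traineddata'))) {
    return `${dir} has no eng.traineddata`;
  }
  return null; // no problem found
}

// Usage: console.log(checkTessdataFolder(process.env.TESSDATA_PREFIX) ?? 'looks OK');
```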

vinodkrishnanr commented 3 weeks ago

The problem is the way Audiveris sees the location: C:\Program Files\tesseract-ocr\tessdata/fra.traineddata

The path is correct, as defined in the environment variable, up to tessdata, and then Audiveris appends a '/'.

Here's the directory screenshot.


hbitteur commented 3 weeks ago

These data files are NOT the needed ones.

Here is my own listing:

Compare the sizes of my files with yours. For example, eng.traineddata weighs 478 KB on your side and 22917 KB on mine.

Please download the correct data files, as explained in this section of the Audiveris handbook.
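A size gap like 478 KB versus 22917 KB can be flagged automatically. A heuristic sketch, not an official Tesseract check: the 1 MB floor is an arbitrary assumption (valid models for some languages or model sets may be smaller), and the leading-`<` test assumes that a mistakenly saved web page would start with HTML markup. With this threshold, a 478 KB eng.traineddata would be flagged.

```javascript
const fs = require('fs');

// Assumed threshold, not an official limit: some valid models are smaller.
const MIN_EXPECTED_BYTES = 1024 * 1024;

// Heuristic check of a .traineddata file; returns a list of reasons
// (an empty list means no red flag was found).
function suspiciousTraineddata(file) {
  const reasons = [];
  const { size } = fs.statSync(file);
  if (size < MIN_EXPECTED_BYTES) {
    reasons.push(`only ${Math.round(size / 1024)} KB`);
  }
  // Peek at the first bytes: a saved web page would start with "<" (e.g. "<!DOCTYPE html>")
  const head = Buffer.alloc(64);
  const fd = fs.openSync(file, 'r');
  let bytesRead;
  try {
    bytesRead = fs.readSync(fd, head, 0, head.length, 0);
  } finally {
    fs.closeSync(fd);
  }
  if (head.toString('latin1', 0, bytesRead).trimStart().startsWith('<')) {
    reasons.push('starts like an HTML page, not a binary model');
  }
  return reasons;
}
```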

Note: I don't think the "/" separator is wrong. This separator is needed; it is inserted by a call to Path.resolve(location), a standard Java method.
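Windows accepts both `/` and `\` as path separators, so the mixed path in the error message names the same file as an all-backslash one. Node's `path.win32` functions, which apply Windows path rules on any OS, illustrate this:

```javascript
const path = require('path');

// The mixed-separator path from the Tesseract error message
const mixed = 'C:\\Program Files\\tesseract-ocr\\tessdata/eng.traineddata';

// path.win32.normalize rewrites every separator to the Windows-preferred "\"
console.log(path.win32.normalize(mixed));
// C:\Program Files\tesseract-ocr\tessdata\eng.traineddata

// Joining the folder and the file name with Windows rules gives the same path
console.log(path.win32.join('C:\\Program Files\\tesseract-ocr\\tessdata', 'eng.traineddata'));
```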

vinodkrishnanr commented 3 weeks ago

Thank you. Yes, it was the files. I had downloaded them from the same place before, but instead of downloading the files I was saving the link; I'm not sure why that caused the issue. But thank you, sir, the issue is resolved.

hbitteur commented 3 weeks ago

Wrong data files are a recurring topic... In version 5.4 (not released yet, but you can build it yourself), the Windows installer now places 4 language data files in the Audiveris resources folder.

The same problem can still occur if the user wants to manually add other languages, but it's already a step forward.

I'd like to offer a way to download languages from within the Audiveris application. However, the tessdata target folder (though writable by the Windows installer) may not be writable by the end user, unless I find a way to elevate user privileges for this. If someone knows how to do this, I'm interested!
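On the writability question, a simple and fairly robust way for a program to learn whether the current user can write to a folder is to try it. A sketch in Node (the language of the server code in this thread; Audiveris itself is Java) with a hypothetical `canWriteTo` helper. Node's documentation notes that on Windows `fs.access()` does not check ACLs, so a `W_OK` test may report a folder as writable when it is not; creating and removing a probe file avoids that. This only detects the situation; it does not elevate privileges.

```javascript
const fs = require('fs');
const path = require('path');

// Hypothetical helper: returns true if a new file can be created in dir.
function canWriteTo(dir) {
  const probe = path.join(dir, `.write-probe-${process.pid}-${Date.now()}`);
  try {
    // 'wx' fails if the file already exists, so nothing is ever overwritten
    fs.writeFileSync(probe, '', { flag: 'wx' });
  } catch {
    return false; // e.g. EACCES, EPERM, ENOENT
  }
  fs.unlinkSync(probe); // clean up the probe file
  return true;
}
```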