AlexPoint / LemmaGenerator

Generator of rule-based lemmatizers (based on examples) for several European languages.
GNU General Public License v2.0

Deserialization error on reading stream #2

Open fredfourie opened 9 years ago

fredfourie commented 9 years ago

Although the stream is read correctly (and lemmatization works), the following is reported on the console:

Exception when deserializing Lemmatizer: System.IO.EndOfStreamException: Unable to read beyond the end of the stream.
   at System.IO.MemoryStream.InternalReadInt32()
   at System.IO.BinaryReader.ReadInt32()
   at LemmaSharp.Classes.Lemmatizer.Deserialize(BinaryReader binRead)

LemmaSharp version: 4.12.5287.29676
Data file: full7z-mlteast-en.lem
C#, .NET version 4.5

Code used:

var path = @"..\..\data\full7z-mlteast-en.lem";
var stream = File.OpenRead(path);
Lemmatizer lemmatizer = new Lemmatizer(stream);

The last line results in the console output reported above.

AlexPoint commented 9 years ago

Your error message is weird. This exception should be caught in Lemmatizer.cs at line 272, and you should only see a warning message about deserializing the exceptions. Also, I tried to reproduce it, but without any luck. Do you have any additional info?

Thanks

Almeonamy commented 8 years ago

I have the same error, but I don't have any additional info. What can I give you to help solve this problem? Sorry for my English if this message contains mistakes.

Almeonamy commented 8 years ago

The exception is thrown at line 262:

int num = binRead.ReadInt32();

Apparently the file contains no more data at that point.
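To illustrate the failure mode described here, a minimal sketch follows. It is not LemmaSharp code: the class name EndOfStreamDemo and the helper TryReadInt32 are hypothetical, and the four-byte buffer only stands in for a data file that ends earlier than the reader expects. It shows BinaryReader.ReadInt32 throwing EndOfStreamException once the stream is exhausted, and one possible way to check for remaining bytes before reading.

```csharp
using System;
using System.IO;

public static class EndOfStreamDemo
{
    // Hypothetical helper: reads an Int32 only when at least 4 bytes remain.
    // It relies on Length and Position, so the stream must be seekable
    // (MemoryStream and FileStream normally are).
    public static bool TryReadInt32(BinaryReader reader, out int value)
    {
        Stream s = reader.BaseStream;
        if (s.CanSeek && s.Length - s.Position >= sizeof(int))
        {
            value = reader.ReadInt32();
            return true;
        }
        value = 0;
        return false;
    }

    public static void Main()
    {
        // Four bytes of payload (little-endian 42) and nothing after them.
        var bytes = new byte[] { 42, 0, 0, 0 };
        using (var reader = new BinaryReader(new MemoryStream(bytes)))
        {
            // The first read consumes the only four bytes available.
            Console.WriteLine(reader.ReadInt32());

            try
            {
                // No bytes are left, so this read throws.
                reader.ReadInt32();
            }
            catch (EndOfStreamException ex)
            {
                Console.WriteLine("Caught: " + ex.GetType().Name);
            }

            // The guarded read reports failure instead of throwing.
            int value;
            Console.WriteLine(TryReadInt32(reader, out value));
        }
    }
}
```

Whether a length check or a try/catch is the better fit depends on how the real Deserialize method treats a missing trailing section; the thread does not settle that.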

AlexPoint commented 8 years ago

Hi,

Are you using the file in \data\full7z-mlteast-en.lem? If so, I'll investigate this weekend (I don't have time at the moment).

Almeonamy commented 8 years ago

Yes, this file.

MirzaSikander commented 8 years ago

I am running into the same issue. Using full7z-multext-en.lem

AlexPoint commented 8 years ago

Sorry, but I cannot reproduce this issue on my machine. I'm using the .lem files committed in the project and the following code:

var dataFilePath = @"..\..\data\full7z-mlteast-en.lem"; // or full7z-multext-en.lem
var stream = File.OpenRead(dataFilePath);
Lemmatizer lemmatizer = new Lemmatizer(stream);

which gives me a working Lemmatizer and just the following warning: "Couldn't deserialize exceptions in Lemmatizer file". Are you doing something differently?

pdelmundo commented 7 years ago

I'm also getting the same error, using english.lem

Raphhhhh commented 6 years ago

Hi, same issue here with full7z-mlteast-fr.lem

BenMakesGames commented 6 years ago

just thought I'd say: I get the same thing!

Exception when deserializing Lemmatizer: System.IO.EndOfStreamException: Unable to read beyond the end of the stream.
   at System.IO.MemoryStream.InternalReadInt32()
   at System.IO.BinaryReader.ReadInt32()
   at LemmaSharp.Classes.Lemmatizer.Deserialize(BinaryReader binRead)

here's all the configuration I can possibly imagine:

code used to initialize the lemmatizer:

            ILemmatizer lemmatizer = new Lemmatizer(File.OpenRead("full7z-mlteast-en.lem"));

finally, in case it matters, my code's using statements (I currently have a single .cs file, Program.cs):

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Threading.Tasks;
using Accord.MachineLearning;
using System.IO;
using LemmaSharp;
using LemmaSharp.Classes;

(System.Text and System.Threading.Tasks aren't actually being used.)

pmahend1 commented 6 years ago

It's because of the end-of-file bytes: EndOfStreamException, "Unable to read beyond the end of the stream".

pmahend1 commented 6 years ago

Is there a fix? It also runs too slowly.

pmahend1 commented 6 years ago

Solution: use the full7z-mlteast-en-modified.lem file from the Test\Data\Custom folder.

full7z-mlteast-en.lem has an issue with its EOF bytes. But it still runs too slowly.

oscar-o-oneill commented 3 years ago

Hi @AlexPoint, nice work on this. I'm just trying LemmaGenerator out and I also experienced this issue. The issue does not occur when I use the "full7z-mlteast-en-modified.lem" file as mentioned by @pmahend1.

When I use other files (like "english.lem" or "full7z-mlteast-en.lem") I get this error that was already reported by the above posters:

Exception when deserializing Lemmatizer: System.IO.EndOfStreamException: Unable to read beyond the end of the stream.
   at System.IO.MemoryStream.InternalReadInt32()
   at System.IO.BinaryReader.ReadInt32()
   at LemmaSharp.Classes.Lemmatizer.Deserialize(BinaryReader binRead)

AlexPoint commented 3 years ago

Hi @oscar-o-oneill, thanks for your message. I can't quite get my head around this bug. I'll need more time to investigate what's going on here. I'll try to get back to you in a few days. Anything urgent on your side?

oscar-o-oneill commented 2 years ago

Hey @AlexPoint, nothing urgent here. Did you make any progress?

Also, is it possible for me to look at the dictionary that you created for "full7z-mlteast-en-modified.lem"? I am unable to inspect the contents of the file. What format is the .lem file?

Appreciate your work on this!

AlexPoint commented 2 years ago

No luck yet, @oscar-o-oneill! But I'm not accepting defeat... ;)

The .lem files are binary files, if you want to have a look at them. Check the Lemmatizer class for more information.
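Since the files are binary, one way to get a first look is to dump the leading bytes as hex. The sketch below is only that: the class HexPeek and the method DumpHead are hypothetical names, the default path is a placeholder, and the code does not decode the actual field layout, which is defined by the Lemmatizer class. If the data is compressed (the "7z" in the file names hints at this, but that is an assumption), the raw bytes will not be directly readable.

```csharp
using System;
using System.IO;
using System.Text;

public static class HexPeek
{
    // Formats up to maxBytes from the current stream position
    // as space-separated two-digit hex values.
    public static string DumpHead(Stream stream, int maxBytes)
    {
        var buffer = new byte[maxBytes];
        int total = 0;
        int read;
        // Stream.Read may return fewer bytes than requested,
        // so loop until the buffer is full or the stream ends.
        while (total < maxBytes &&
               (read = stream.Read(buffer, total, maxBytes - total)) > 0)
        {
            total += read;
        }

        var sb = new StringBuilder();
        for (int i = 0; i < total; i++)
        {
            if (i > 0) sb.Append(' ');
            sb.Append(buffer[i].ToString("X2"));
        }
        return sb.ToString();
    }

    public static void Main(string[] args)
    {
        // Placeholder path: pass the .lem file to inspect as the first argument.
        string path = args.Length > 0 ? args[0] : "full7z-mlteast-en-modified.lem";
        using (var stream = File.OpenRead(path))
        {
            Console.WriteLine("Length: " + stream.Length + " bytes");
            Console.WriteLine(DumpHead(stream, 32));
        }
    }
}
```

Comparing the length and leading bytes of a file that triggers the exception with those of the modified file might show where the two diverge, though interpreting the difference still requires reading the Deserialize code.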