AnantLabs / morelinq

Automatically exported from code.google.com/p/morelinq
Apache License 2.0
0 stars 0 forks source link

Batch returns an extra empty 'bucket'. #61

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
I'm having an issue with the Batch() extension method.

When the windows service (an containing DLL) is deployed on the production 
server, the Batch() method is return an extra, empty 'bucket'/batch.

For example, if an array of strings: ["1","2","3","4","5","6","7","8"] has 
.Batch(2) performed (so batches of size 2), then the subsequent foreach loop 
was returning:

["1","2"]
["3","4"]
["5","6"]
["7","8"]
["",""]

I want to stress that this was ONLY occurring on the production system (we used 
the EventLog to trace the loop). When debugging in Visual Studio 2010, using 
the exact same input array, the Batch() performs correctly.

Below is a snippet of the actual code I was running. The comments have been 
added to explain some of the things.

var splitData = contentSting.Split(new char[] { ',' }, 
StringSplitOptions.RemoveEmptyEntries);

//tags is an array of either size 1 or 2, thus giving a batch size of 2 or 3
//We are skipping 7 positions as the actual data to be batched is offset by 7 
positions

foreach (var record in splitData.Skip(7).Batch(tags.Length + 1))
{
    var rArray = record.ToArray();

    //Code using the rArray goes here.
}

Apart from that there is not a whole lot more I can tell you.

Hope this helps!

Original issue reported on code.google.com by alastair...@gmail.com on 21 Dec 2010 at 10:51

GoogleCodeExporter commented 9 years ago
looking for the cause of this, I looked at the source, and took a stab at it. 
The attached patch should fix it. However, I pretty much completely rewrote it, 
which I found sort of intimidating (I hear Tony the Pony bites). I also don't 
really understand why you would want to return the buckets in a steaming 
fashion, as they will be only returned when filled anyway, so streaming will 
not reduce latency. At the risk of missing something, I still conformed to the 
current implementation. It is easily converted to a non-streaming version, 
which I have a feeling would be better.

Original comment by MartijnH...@gmail.com on 1 May 2013 at 9:33

Attachments:

GoogleCodeExporter commented 9 years ago
> The attached patch should fix it.
It should first be proven through a failing unit test that there indeed is a 
bug to be fixed. I never managed to reproduce the reported issue.

The buckets are streamed so that results are yielded as soon as they are 
available and only as many buckets are computed (their items buffered) as 
necessary.

Original comment by azizatif on 2 May 2013 at 7:36

GoogleCodeExporter commented 9 years ago
I have seen this happen in the wild (though not with this replacement), but 
havent been able to make a reproducible case, so far I have put it down as a 
jit bug. Re the streaming, I fully agree that the buckets should be streamed, 
just not their content. Is there an example of a case where the streaming 
behavior of the contents of buckets has any advantage?

Original comment by MartijnH...@gmail.com on 2 May 2013 at 8:17

GoogleCodeExporter commented 9 years ago
>>
Is there an example of a case where the streaming behavior of the contents of 
buckets has any advantage?
<<

I'm guessing you're referring to the following line?
  yield return resultSelector(bucket.Select(x => x));

The idea here is not so much streaming but to avoid giving a direct reference 
to the underlying array that is mutable. Arguably, one could render the array 
read-only with Array.AsReadOnly and it would have performance benefits for the 
downstream user but then the last bucket would have to be copied out if it 
turned out to be uneven. However that's an implementation detail that can be 
changed because it wouldn't break the contract. In the initial and current 
implementation, correctness was the first goal.

Original comment by azizatif on 2 May 2013 at 5:08

GoogleCodeExporter commented 9 years ago
well, that does have some merit. If this is indeed a JIT bug of some .NET 
version, there is probably no way there will ever be a chance to get a reliable 
test case. In that case the current implementation is at least correct, and 
this issue should be closed as irreproducible

Original comment by MartijnH...@gmail.com on 2 May 2013 at 7:18

GoogleCodeExporter commented 9 years ago
I would be comfortable with this being resolved as non-reproducible.

Original comment by alastair...@gmail.com on 2 May 2013 at 10:44

GoogleCodeExporter commented 9 years ago
@alastair.pitts, @MartijnHoekstra: thanks for the feedback

Issue closed as wont-fix. It can always be re-opened at a time when it can be 
demonstrated through a repeatable and failing test on any of the supported 
target platforms. Also considering it's been over 2 years without an 
*overwhelming* starring/voting of the issue.

Original comment by azizatif on 3 May 2013 at 6:30