Thanks for reporting this. I will take a look.
This is a major blocker for some of my company's ETL pipelines.
I also think that the DateType.FromInternal method suffers from the same bug (inspected using ILSpy).
I have added the null checks locally and can verify that it fixes the bug.
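For context, the crash comes from the unconditional unboxing cast in FromInternal. A minimal sketch of the failing path (simplified from the DateType code in SimpleTypes.cs, visible in the context lines of the patch below):

// Simplified sketch of the unpatched DateType.FromInternal: obj arrives as a
// boxed int day-count, or as null when the column value is SQL NULL.
internal override object FromInternal(object obj)
{
    // Unboxing null with (int)obj throws NullReferenceException, which is
    // exactly the crash seen when reading a nullable date/timestamp column.
    return new Date(new DateTime((int)obj * TimeSpan.TicksPerDay + s_unixTimeEpoch.Ticks));
}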
Below, please find a git patch series (one commit) that fixes the issue:
From c8f422f3fb7de6bf3e4e0b8b7dc8a463928cf832 Mon Sep 17 00:00:00 2001
From: Dylan Borg <borgdylan@hotmail.com>
Date: Wed, 20 May 2020 11:21:51 +0200
Subject: [PATCH] Handle nulls in (DateType|TimestampType).FromInternal - fixes
#527
---
src/csharp/Microsoft.Spark/Sql/Types/SimpleTypes.cs | 12 ++++++++++++
1 file changed, 12 insertions(+)
diff --git a/src/csharp/Microsoft.Spark/Sql/Types/SimpleTypes.cs b/src/csharp/Microsoft.Spark/Sql/Types/SimpleTypes.cs
index 7b9bd7a..c26a407 100644
--- a/src/csharp/Microsoft.Spark/Sql/Types/SimpleTypes.cs
+++ b/src/csharp/Microsoft.Spark/Sql/Types/SimpleTypes.cs
@@ -81,6 +81,12 @@ namespace Microsoft.Spark.Sql.Types
/// </summary>
internal override object FromInternal(object obj)
{
+ //null pass-through
+ if (obj == null)
+ {
+ return null;
+ }
+
return new Date(new DateTime((int)obj * TimeSpan.TicksPerDay + s_unixTimeEpoch.Ticks));
}
}
@@ -101,6 +107,12 @@ namespace Microsoft.Spark.Sql.Types
/// </summary>
internal override object FromInternal(object obj)
{
+ //null pass-through
+ if (obj == null)
+ {
+ return null;
+ }
+
// Known issue that if the original type is "long" and its value can be fit into the
// "int", Pickler will serialize the value as int.
if (obj is long val)
--
2.25.1
Thanks @borgdylan. We are also tracking this issue here: #436
@borgdylan Since you already have a fix, do you want to create a PR (with proper tests, etc.)? Otherwise, I can take care of it. Thanks.
I do not have enough time to go through the whole process as I have work to carry out. Feel free to use my patch though.
My patch can be applied with git am < file.patch, where file.patch is the file containing the patch. That will apply a commit equivalent to the one in my local copy. Do note that you will need a shell that supports the < redirection operator, such as bash or cmd; PowerShell does not yet support it.
ping
@elvaliuliuliu do you want to work on this?
Sure, I will work on this.
Thanks a million!
Describe the bug
When iterating over a DataFrame with nullable timestamp columns, a NullReferenceException is thrown. The issue is in TimestampType.FromInternal, which does not contain a null check.
To Reproduce
Steps to reproduce the behavior:
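A minimal repro sketch (hypothetical; it assumes the usual Microsoft.Spark session and DataFrame APIs, and the app name and query are illustrative only):

using Microsoft.Spark.Sql;

class Repro
{
    static void Main()
    {
        SparkSession spark = SparkSession
            .Builder()
            .AppName("nullable-timestamp-repro")
            .GetOrCreate();

        // One row whose TIMESTAMP value is NULL.
        DataFrame df = spark.Sql("SELECT CAST(NULL AS TIMESTAMP) AS ts");

        // Collect() deserializes each value through TimestampType.FromInternal;
        // without the null check this throws NullReferenceException.
        foreach (Row row in df.Collect())
        {
            System.Console.WriteLine(row.Get("ts"));
        }
    }
}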
Expected behavior
.NET for Spark should support nullable timestamp columns, just as Spark itself already does.