Closed Deduction42 closed 3 years ago
I'm trying to read a parquet file and select a range of rows. In the first case, I read from the beginning up to row 40000:
    cursor = BatchedColumnsCursor(parFile, rows=1:40000, reusebuffer=false, use_threads=false)
    DataFrame.(collect(cursor))[1].Date_Time

    40000-element Array{Union{Missing, DateTime},1}:
     2020-10-29T01:16:34
     2020-10-29T01:16:35
     2020-10-29T01:16:36
     2020-10-29T01:16:37
     2020-10-29T01:16:38
     2020-10-29T01:16:39
     2020-10-29T01:16:40
     2020-10-29T01:16:41
     ⋮
     2020-10-29T12:23:06
     2020-10-29T12:23:07
     2020-10-29T12:23:08
     2020-10-29T12:23:09
     2020-10-29T12:23:10
     2020-10-29T12:23:11
     2020-10-29T12:23:12
     2020-10-29T12:23:13
In the second case, I start at row 20000 and read to row 40000:
    cursor = BatchedColumnsCursor(parFile, rows=20000:40000, reusebuffer=false, use_threads=false)
    DataFrame.(collect(cursor))[1].Date_Time

    20001-element Array{Union{Missing, DateTime},1}:
     2020-10-29T01:16:35
     2020-10-29T01:16:36
     2020-10-29T01:16:37
     2020-10-29T01:16:38
     2020-10-29T01:16:39
     2020-10-29T01:16:40
     2020-10-29T01:16:41
     2020-10-29T01:16:42
     ⋮
     2020-10-29T06:49:48
     2020-10-29T06:49:49
     2020-10-29T06:49:50
     2020-10-29T06:49:51
     2020-10-29T06:49:52
     2020-10-29T06:49:53
     2020-10-29T06:49:54
     2020-10-29T06:49:55
The original file has timestamps in ascending order. It looks like the 20000:40000 read starts at almost the same place as the 1:40000 read, yet the two end in entirely different places.
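The timestamps above allow a quick sanity check on which rows came back. This is a minimal sketch that assumes the file holds exactly one sample per second with no gaps. The endpoints of the 1:40000 read are consistent with that, but it remains an assumption. `timestamp_at` is a hypothetical helper for illustration, not part of any library.

```julia
using Dates

# First timestamp returned by the rows=1:40000 read.
t0 = DateTime(2020, 10, 29, 1, 16, 34)

# Assumption: one sample per second, no gaps, so row n sits (n - 1) seconds after t0.
timestamp_at(row) = t0 + Second(row - 1)

# The full read ends where this model predicts, so the spacing assumption is plausible.
@assert timestamp_at(40000) == DateTime(2020, 10, 29, 12, 23, 13)

# Under this model, a correct rows=20000:40000 read would start here ...
@assert timestamp_at(20000) == DateTime(2020, 10, 29, 6, 49, 53)

# ... but the values actually returned match rows 2 and 20002 instead.
@assert timestamp_at(2) == DateTime(2020, 10, 29, 1, 16, 35)
@assert timestamp_at(20002) == DateTime(2020, 10, 29, 6, 49, 55)
```

If the one-sample-per-second assumption holds, the 20000:40000 request returned the right number of rows (20001) but from rows 2:20002, which would mean the start offset is not being applied as requested.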