brycefrank / pyfor

Tools for analyzing aerial point clouds of forest data.
MIT License
93 stars 19 forks

Writing a custom field -> python R portability. #42

Closed bw4sz closed 5 years ago

bw4sz commented 5 years ago

I'm 99.9% sure this isn't a pyfor problem, but I saw some commits, so I thought I'd check with you first.

During my segmentation pipeline I make a new column "Tree" for individual tree ID. I like the visualization tools in the R lidR package, so I went to load it up there.

In python

pc.data.points.columns
Index(['x', 'y', 'z', 'intensity', 'return_num', 'classification', 'flag_byte',
       'scan_angle_rank', 'user_data', 'pt_src_id', 'bins_x', 'bins_y',
       'Tree'],
      dtype='object')

In R, that column is missing. I'm trying to figure out if it got written.

> colnames(r@data)
 [1] "X"                          "Y"                          "Z"                         
 [4] "gpstime"                    "Intensity"                  "ReturnNumber"              
 [7] "NumberOfReturns"            "ScanDirectionFlag"          "EdgeOfFlightline"          
[10] "Classification"             "Synthetic_flag"             "Keypoint_flag"             
[13] "Withheld_flag"              "ScanAngle"                  "UserData"                  
[16] "PointSourceID"              "R"                          "G"                         
[19] "B"                          "reversible index (lastile)"

I'm pretty sure this is the lidR package dropping the column, but let me know if I'm missing something.

Your code for writing is pretty unambiguous.

https://github.com/brycefrank/pyfor/blob/2108ac9008243842d9c90f41a5b9409a97fd22db/pyfor/cloud.py#L70

So I think pyfor is safe there. I'm going to check in with lidR.

bw4sz commented 5 years ago

I might have spoken too soon here. If you reload that file back into Python, you don't see the column.

a=pyfor.cloud.Cloud("/Users/Ben/Desktop/test.laz")
a.data.points.columns
Index(['x', 'y', 'z', 'intensity', 'return_num', 'classification', 'flag_byte',
       'scan_angle_rank', 'user_data', 'pt_src_id'],
      dtype='object')

brycefrank commented 5 years ago

The issue is likely further up in laspy

The writer object is a file I/O handler that is essentially a laspy object. I think the problem is that it only supports writing specific columns out to .las files: those defined in the various LAS specifications. My guess is that, even if the column is there, laspy simply ignores it. I usually use the user_data column for things like this, but it may already hold information you need.

I can't speak to how Jean-Romain handles custom columns; he tends to be more involved with the particulars of reading and writing than I am (after all, he wrote rlas if I recall correctly, which is the R equivalent of laspy). I think the general wisdom is to stick to the specification, especially if you are writing to .las files.

Check this document, pages 10-21 or so, for different point format columns:

https://www.asprs.org/a/society/committees/standards/LAS_1_4_r13.pdf

If you find a solution that is consistent with the specifications, I can take it up with the laspy folks for you. laspy may or may not handle all of these; I can't recall.
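
One way to see which columns are at risk is to compare a cloud's columns against the ones that came back when the file was reloaded earlier in this thread. This is only a sketch: `ROUND_TRIP_COLUMNS` and `columns_at_risk` are hypothetical names, and the set reflects what this thread observed after a reload, not a guarantee about what laspy writes.

```python
# Columns that survived the write/reload round trip reported in this thread.
# Assumption: this list is specific to the file and point format used here.
ROUND_TRIP_COLUMNS = {
    "x", "y", "z", "intensity", "return_num", "classification",
    "flag_byte", "scan_angle_rank", "user_data", "pt_src_id",
}


def columns_at_risk(columns):
    """Return the columns, in their original order, that are not in the round-trip set."""
    return [name for name in columns if name not in ROUND_TRIP_COLUMNS]


# Column names copied from pc.data.points.columns in the original report.
reported = [
    "x", "y", "z", "intensity", "return_num", "classification", "flag_byte",
    "scan_angle_rank", "user_data", "pt_src_id", "bins_x", "bins_y", "Tree",
]

print(columns_at_risk(reported))  # ['bins_x', 'bins_y', 'Tree']
```

With a real cloud, `list(pc.data.points.columns)` could be passed in place of `reported`.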

bw4sz commented 5 years ago

Got it. I didn't realize there were standards. I'll stick to user_data.

bw4sz commented 5 years ago

Just posting here in case anyone comes back and looks. I had been happily assigning each tree a value in user_data. But be warned: I just exported to lidR, and it looks like the standard limits that field to integers from 0-255; I assume the values get scaled somewhere. As a result, many individual trees share the same user_data value (there are thousands of trees in a tile). I'm not sure how the community normally handles this, and I can't find much online. I am working on releasing a benchmark dataset from NEON and want it to be in the most intuitive format.

All I can find is: https://www.cs.unc.edu/~isenburg/lastools/download/lasheight_README.txt

  Alternatively - to avoid quantizing and clamping - you can
  '-replace_z' the elevation value of each point with the computed
  height. That means that afterwards all ground points will have
  an elevation of zero and all other points will have an elevation
  that equals their relative height above (or below) the ground TIN
  at their x and y location. In a sense this will "normalize" the
  elevations of points in respect to their surrounding ground truth.
  If you add the '-replace_z' option the resulting heights are *not*
  scaled with a factor of 10.0, quantized & clamped into an unsigned
  char between 0 and 255, and stored in the "user data" field of each
  point ... unless you add the explicit '-store_in_user_data' option.
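
The collision described above follows from the field being a single unsigned byte: only 256 distinct values fit, however the IDs are mapped into it. A small sketch, assuming the IDs are simply truncated to their low 8 bits (the thread does not confirm whether laspy wraps, clamps, or scales; `to_user_data` and the 3000-tree tile are hypothetical):

```python
from collections import Counter


def to_user_data(tree_id: int) -> int:
    """Keep only the low 8 bits, as a plain cast to an unsigned byte would."""
    return tree_id & 0xFF


# Hypothetical tile with 3000 segmented trees, IDs 0..2999.
tree_ids = range(3000)
stored = [to_user_data(tree_id) for tree_id in tree_ids]

# Count how many trees end up sharing each stored value.
trees_per_value = Counter(stored)

print(len(trees_per_value))           # 256 distinct values survive
print(max(trees_per_value.values()))  # 12 trees share the most crowded value
```

Clamping or scaling would change which trees collide, but not the 256-value ceiling.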

bw4sz commented 5 years ago

Confirmed here: https://stackoverflow.com/questions/50815580/appending-an-index-to-laspy-file-las. This is laspy. I'm going to see if I can write a new dimension manually. I'll report back.
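
For anyone landing here later, a per-point tree ID can be stored as an "extra bytes" dimension instead of user_data. The following is a minimal sketch assuming laspy 2.x rather than the version in use when this thread was written, and following the pattern in the laspy documentation. The function names, the `Tree` dimension name, and the choice of point format 3 with LAS 1.2 are illustrative; this is not something pyfor does, and whether lidR displays the extra dimension is not verified here.

```python
import numpy as np


def write_with_tree_ids(path, xyz, tree_ids):
    """Write an (N, 3) coordinate array plus one uint32 tree ID per point."""
    import laspy  # imported here so the sketch only needs laspy when called

    header = laspy.LasHeader(point_format=3, version="1.2")
    # Declare the custom dimension before creating the point record.
    header.add_extra_dim(
        laspy.ExtraBytesParams(name="Tree", type=np.uint32,
                               description="Individual tree ID")
    )
    header.offsets = xyz.min(axis=0)
    header.scales = np.array([0.01, 0.01, 0.01])

    las = laspy.LasData(header)
    las.x = xyz[:, 0]
    las.y = xyz[:, 1]
    las.z = xyz[:, 2]
    las["Tree"] = np.asarray(tree_ids, dtype=np.uint32)
    las.write(path)


def read_tree_ids(path):
    """Read the file back and return the 'Tree' dimension as a NumPy array."""
    import laspy

    las = laspy.read(path)
    return np.asarray(las["Tree"])
```

A 32-bit unsigned dimension leaves room for far more IDs than the 0-255 range of user_data, which is the reason for choosing it here.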